[Middle-end] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Message ID 4ADA2BFC-DE6C-449E-84F7-2FFED4AF0789@ORACLE.COM
State New

Commit Message

Qing Zhao July 14, 2020, 2:45 p.m. UTC
Hi, GCC team,

This patch is a follow-up to the previous patch and the corresponding discussion:
https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html

From the previous round of discussion, the major issues raised were:

A. The patch should be rewritten to use the regsets infrastructure.
B. The patch should go into the middle-end instead of the x86 backend.

This new patch has been rewritten based on the above two comments.  The major changes compared to the previous patch are:

1. The option and attribute are renamed from
-mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
to:
-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all").
Both are now added as a general (target-independent) option and attribute.
2. The main code generation part is moved from the i386 backend to the middle-end.
3. Four target hooks are added.
4. These four target hooks are implemented in the i386 backend.
5. On a target that does not implement the target hook, the new option is rejected with an error and the new attribute is ignored with a warning.

The patch is as follows:

[PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
command-line option and
zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:

  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")

  Don't zero call-used registers upon function return.

  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")

  Zero used call-used general purpose registers upon function return.

  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")

  Zero all call-used general purpose registers upon function return.

  4. -fzero-call-used-regs=used and zero_call_used_regs("used")

  Zero used call-used registers upon function return.

  5. -fzero-call-used-regs=all and zero_call_used_regs("all")

  Zero all call-used registers upon function return.
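
The five choices above can be read as two independent switches: general-purpose registers only versus all registers, and used registers only versus every call-used register. The following is only an illustrative model of that selection, not the code in this patch. The names zero_mode, reg_info and should_zero are hypothetical, and the exclusion of fixed registers and registers live at function exit is an assumption modelled on the conditions checked in the i386 hook of this patch.

```c
#include <stdbool.h>

/* Mirrors the five choices listed above; the names are illustrative only. */
enum zero_mode { ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL };

/* Hypothetical per-register facts; these are not GCC's data structures. */
struct reg_info {
  bool call_used;     /* may be clobbered by a call under the ABI */
  bool fixed;         /* reserved register, e.g. the stack pointer */
  bool live_at_exit;  /* still needed at return, e.g. the return value */
  bool ever_used;     /* touched somewhere in the function body */
  bool is_gpr;        /* general-purpose rather than vector register */
};

/* Return true if register R would be zeroed at return under MODE. */
static bool should_zero(enum zero_mode mode, const struct reg_info *r)
{
  if (mode == ZERO_SKIP)
    return false;

  /* Assumption: only call-used, non-fixed registers that are dead at
     the function exit are ever candidates.  */
  if (!r->call_used || r->fixed || r->live_at_exit)
    return false;

  bool gpr_only = (mode == ZERO_USED_GPR || mode == ZERO_ALL_GPR);
  bool used_only = (mode == ZERO_USED_GPR || mode == ZERO_USED);

  if (gpr_only && !r->is_gpr)
    return false;
  if (used_only && !r->ever_used)
    return false;
  return true;
}
```

Under this model, an untouched scratch general-purpose register is left alone by "used-gpr" but cleared by "all-gpr", and a register holding the return value is never cleared.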

The feature is implemented in the middle-end, but is currently only supported on x86.

Tested on x86-64 and aarch64 by bootstrapping GCC trunk, with
-fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr,
-fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
by default on x86-64.
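
For anyone who wants to try the attribute on a single function, a minimal sketch follows. The function check_pin and the macro ZERO_REGS are hypothetical names used only for illustration. The __has_attribute guard is there so the file still builds with a compiler that does not know the attribute, in which case the macro expands to nothing.

```c
#include <string.h>

/* Fall back to no attribute on compilers that do not know it. */
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define ZERO_REGS(choice) __attribute__((zero_call_used_regs(choice)))
#  endif
#endif
#ifndef ZERO_REGS
#  define ZERO_REGS(choice)
#endif

/* The attribute is placed on the declaration, ahead of the definition,
   since the handler rejects it once the function has been defined.  */
ZERO_REGS("used-gpr") int check_pin(const char *pin);

int check_pin(const char *pin)
{
  /* Hypothetical secret, hard-coded only to keep the example short. */
  return strcmp(pin, "1234") == 0;
}
```

To apply one choice to a whole translation unit instead, the file can be compiled with the option, for example gcc -O2 -fzero-call-used-regs=used-gpr -S file.c, and the generated assembly inspected for the zeroing instructions before each return.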

Please take a look and let me know if you have any further comments.

thanks.

Qing


====================================

gcc/ChangeLog:

2020-07-13  qing zhao  <qing.zhao@oracle.com>
2020-07-13  H.J. Lu <hjl.tools@gmail.com>

	* common.opt: Add new option -fzero-call-used-regs.
	* config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
	(ix86_zero_call_used_regno_mode): Likewise.
	(ix86_zero_all_vector_registers): Likewise.
	(ix86_expand_prologue): Replace gen_prologue_use with
	gen_pro_epilogue_use.
	(TARGET_ZERO_CALL_USED_REGNO_P): Define.
	(TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
	(TARGET_PRO_EPILOGUE_USE): Define.
	(TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
	* config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
	with UNSPECV_PRO_EPILOGUE_USE.
	* coretypes.h (enum zero_call_used_regs): New type.
	* doc/extend.texi: Document the new zero_call_used_regs attribute.
	* doc/invoke.texi: Document the new -fzero-call-used-regs option.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
	(TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
	(TARGET_PRO_EPILOGUE_USE): Likewise.
	(TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
	* function.c (is_live_reg_at_exit): New function.
	(gen_call_used_regs_seq): Likewise.
	(make_epilogue_seq): Call gen_call_used_regs_seq.
	* function.h (is_live_reg_at_exit): Declare.
	* target.def (zero_call_used_regno_p): New hook.
	(zero_call_used_regno_mode): Likewise.
	(pro_epilogue_use): Likewise.
	(zero_all_vector_registers): Likewise.
	* targhooks.c (default_zero_call_used_regno_p): New function.
	(default_zero_call_used_regno_mode): Likewise.
	* targhooks.h (default_zero_call_used_regno_p): Declare.
	(default_zero_call_used_regno_mode): Declare.
	* toplev.c (process_options): Issue errors when -fzero-call-used-regs
	is used on targets that do not support it.
	* tree-core.h (struct tree_decl_with_vis): New field 
	zero_call_used_regs_type.
	* tree.h (DECL_ZERO_CALL_USED_REGS): New macro.

gcc/c-family/ChangeLog:

2020-07-13  qing zhao  <qing.zhao@oracle.com>
2020-07-13  H.J. Lu <hjl.tools@gmail.com>

	* c-attribs.c (c_common_attribute_table): Add new attribute
	zero_call_used_regs.
	(handle_zero_call_used_regs_attribute): New function.

gcc/c/ChangeLog:

2020-07-13  qing zhao  <qing.zhao@oracle.com>
2020-07-13  H.J. Lu <hjl.tools@gmail.com>

	* c-decl.c (merge_decls): Merge zero_call_used_regs_type.

gcc/testsuite/ChangeLog:

2020-07-13  qing zhao  <qing.zhao@oracle.com>
2020-07-13  H.J. Lu <hjl.tools@gmail.com>

	* c-c++-common/zero-scratch-regs-1.c: New test.
	* c-c++-common/zero-scratch-regs-2.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-1.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-10.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-11.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-12.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-13.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-14.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-15.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-16.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-17.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-18.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-19.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-2.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-20.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-21.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-22.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-23.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-3.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-4.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-5.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-6.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-7.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-8.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-9.c: Likewise.

---
gcc/c-family/c-attribs.c                           |  68 ++++++++++
gcc/c/c-decl.c                                     |   4 +
gcc/common.opt                                     |  23 ++++
gcc/config/i386/i386.c                             |  58 ++++++++-
gcc/config/i386/i386.md                            |   6 +-
gcc/coretypes.h                                    |  10 ++
gcc/doc/extend.texi                                |  11 ++
gcc/doc/invoke.texi                                |  13 +-
gcc/doc/tm.texi                                    |  27 ++++
gcc/doc/tm.texi.in                                 |   8 ++
gcc/function.c                                     | 145 +++++++++++++++++++++
gcc/function.h                                     |   2 +
gcc/target.def                                     |  33 +++++
gcc/targhooks.c                                    |  17 +++
gcc/targhooks.h                                    |   3 +
gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
.../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
.../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
.../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
.../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
.../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
.../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
.../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
.../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
.../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
.../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
.../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
.../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
.../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
.../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
.../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
.../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
.../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
.../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
.../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
.../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
.../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
.../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
.../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
gcc/toplev.c                                       |   9 ++
gcc/tree-core.h                                    |   6 +-
gcc/tree.h                                         |   5 +
43 files changed, 866 insertions(+), 7 deletions(-)
create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c

Comments

Victor Rodriguez July 16, 2020, 1:17 p.m. UTC | #1
On Tue, Jul 14, 2020 at 9:52 AM Qing Zhao via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 3721483..cc93d6f 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> static tree ignore_attribute (tree *, tree, tree, int, bool *);
> static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> +                                                bool *);
> static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
>                               ignore_attribute, NULL },
>   { "no_split_stack",         0, 0, true,  false, false, false,
>                               handle_no_split_stack_attribute, NULL },
> +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> +                             handle_zero_call_used_regs_attribute, NULL },
> +
>   /* For internal use (marking of builtins and runtime functions) only.
>      The name contains space to prevent its usage in source code.  */
>   { "fn spec",                1, 1, false, true, true, false,
> @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
>   return NULL_TREE;
> }
>
> +/* Handle a "zero_call_used_regs" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> +                                     int ARG_UNUSED (flags),
> +                                     bool *no_add_attris)
> +{
> +  tree decl = *node;
> +  tree id = TREE_VALUE (args);
> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> +
> +  if (TREE_CODE (decl) != FUNCTION_DECL)
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +               "%qE attribute applies only to functions", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +  else if (DECL_INITIAL (decl))
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +               "cannot set %qE attribute after definition", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (TREE_CODE (id) != STRING_CST)
> +    {
> +      error ("attribute %qE arguments not a string", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (!targetm.calls.pro_epilogue_use)
> +    {
> +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> +      return NULL_TREE;
> +    }
> +
> +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_skip;
> +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_used;
> +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_all;
> +  else
> +    {
> +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> +            name, "skip", "used-gpr", "all-gpr", "used", "all");
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> +
> +  return NULL_TREE;
> +}
> +
> /* Handle a "returns_nonnull" attribute; arguments as in
>    struct attribute_spec.handler.  */
>
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index 81bd2ee..ded1880 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
>           DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
>         }
>
> +      /* Merge the zero_call_used_regs_type information.  */
> +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> +       DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> +
>       /* Merge the storage class information.  */
>       merge_weak (newdecl, olddecl);
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index df8af36..19900f9 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> Common Report Var(flag_zero_initialized_in_bss) Init(1)
> Put zero initialized data in the bss section.
>
> +fzero-call-used-regs=
> +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> +Clear call-used registers upon function return.
> +
> +Enum
> +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> +
> g
> Common Driver RejectNegative JoinedOrMissing
> Generate debug information in default format.
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 5c373c0..fd1aa9c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
>   return false;
> }
>
> +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> +
> +static bool
> +ix86_zero_call_used_regno_p (const unsigned int regno,
> +                            bool gpr_only)
> +{
> +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> +}
> +
> +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> +
> +static machine_mode
> +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> +{
> +  /* NB: We only need to zero the lower 32 bits for integer registers
> +     and the lower 128 bits for vector registers since destination are
> +     zero-extended to the full register width.  */
> +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> +}
> +
> +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> +
> +static rtx
> +ix86_zero_all_vector_registers (bool used_only)
> +{
> +  if (!TARGET_AVX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> +        || (TARGET_64BIT
> +            && (REX_SSE_REGNO_P (regno)
> +                || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> +       && (!this_target_hard_regs->x_call_used_regs[regno]
> +           || fixed_regs[regno]
> +           || is_live_reg_at_exit (regno)
> +           || (used_only && !df_regs_ever_live_p (regno))))
> +      return NULL;
> +
> +  return gen_avx_vzeroall ();
> +}
> +
> /* Define how to find the value returned by a function.
>    VALTYPE is the data type of the value (as a tree).
>    If the precise function being called is known, FUNC is its FUNCTION_DECL;
> @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
>       insn = emit_insn (gen_set_got (pic));
>       RTX_FRAME_RELATED_P (insn) = 1;
>       add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> -      emit_insn (gen_prologue_use (pic));
> +      emit_insn (gen_pro_epilogue_use (pic));
>       /* Deleting already emmitted SET_GOT if exist and allocated to
>          REAL_PIC_OFFSET_TABLE_REGNUM.  */
>       ix86_elim_entry_set_got (pic);
> @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
>      Further, prevent alloca modifications to the stack pointer from being
>      combined with prologue modifications.  */
>   if (TARGET_SEH)
> -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> }
>
> /* Emit code to restore REG using a POP insn.  */
> @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> #undef TARGET_FUNCTION_VALUE_REGNO_P
> #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
>
> +#undef TARGET_ZERO_CALL_USED_REGNO_P
> +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> +
> +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> +
> +#undef TARGET_PRO_EPILOGUE_USE
> +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> +
> +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> +
> #undef TARGET_PROMOTE_FUNCTION_MODE
> #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index d0ecd9e..e7df59f 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -194,7 +194,7 @@
>   UNSPECV_STACK_PROBE
>   UNSPECV_PROBE_STACK_RANGE
>   UNSPECV_ALIGN
> -  UNSPECV_PROLOGUE_USE
> +  UNSPECV_PRO_EPILOGUE_USE
>   UNSPECV_SPLIT_STACK_RETURN
>   UNSPECV_CLD
>   UNSPECV_NOPS
> @@ -13525,8 +13525,8 @@
>
> ;; As USE insns aren't meaningful after reload, this is used instead
> ;; to prevent deleting instructions setting registers for PIC code
> -(define_insn "prologue_use"
> -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> +(define_insn "pro_epilogue_use"
> +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
>   ""
>   ""
>   [(set_attr "length" "0")])
> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> index 6b6cfcd..e56d6ec 100644
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -418,6 +418,16 @@ enum symbol_visibility
>   VISIBILITY_INTERNAL
> };
>
> +/* Zero call-used registers type.  */
> +enum zero_call_used_regs {
> +  zero_call_used_regs_unset = 0,
> +  zero_call_used_regs_skip,
> +  zero_call_used_regs_used_gpr,
> +  zero_call_used_regs_all_gpr,
> +  zero_call_used_regs_used,
> +  zero_call_used_regs_all
> +};
> +
> /* enums used by the targetm.excess_precision hook.  */
>
> enum flt_eval_method
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c800b74..b32c55f 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> A declaration to which @code{weakref} is attached and that is associated
> with a named @code{target} must be @code{static}.
>
> +@item zero_call_used_regs ("@var{choice}")
> +@cindex @code{zero_call_used_regs} function attribute
> +The @code{zero_call_used_regs} attribute causes the compiler to zero
> +call-used registers at function return according to @var{choice}.
> +@samp{skip} doesn't zero call-used registers.  @samp{used-gpr} zeros
> +call-used general purpose registers which are used in the function.
> +@samp{all-gpr} zeros all call-used general purpose registers.
> +@samp{used} zeros call-used registers which are used in the function.
> +@samp{all} zeros all call-used registers.  The default for the
> +attribute is controlled by @option{-fzero-call-used-regs}.
> +
> @end table
>
> @c This is the end of the target-independent attribute table
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 09bcc5b..da02686 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> -funsafe-math-optimizations  -funswitch-loops @gol
> -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin  -fzero-call-used-regs @gol
> --param @var{name}=@var{value}
> -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
>
> @@ -12273,6 +12273,17 @@ int foo (void)
>
> Not all targets support this option.
>
> +@item -fzero-call-used-regs=@var{choice}
> +@opindex fzero-call-used-regs
> +Zero call-used registers at function return according to
> +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> +registers which are used in the function.  @samp{all-gpr} zeros all call-used
> +general purpose registers.  @samp{used} zeros call-used registers which
> +are used in the function.  @samp{all} zeros all call-used registers.  You
> +can control this behavior for a specific function by using the function
> +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> +
> @item --param @var{name}=@var{value}
> @opindex param
> In some places, GCC uses various constants to control the amount of
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 6e7d9dc..43dddd3 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> @end deftypefn
>
> +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> +A target hook that returns @code{true} if @var{regno} is the number of a
> +call used register.  If @var{general_reg_only_p} is @code{true},
> +@var{regno} must be the number of a hard general register.
> +
> +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> +A target hook that returns a mode suitable for zeroing the call-used
> +register @var{regno} in @var{mode}.
> +
> +If this hook is not defined, then default_zero_call_used_regno_mode will be
> +used.
> +@end deftypefn
> +
> @defmac APPLY_RESULT_SIZE
> Define this macro if @samp{untyped_call} and @samp{untyped_return}
> need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> is needed.
> @end deftypefn
>
> +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> +This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to
> +prevent deleting register setting instructions in the prologue and epilogue.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> +This hook should return an rtx to zero all vector registers at function
> +exit.  If @var{used_only} is @code{true}, only used vector registers should
> +be zeroed.  Return @code{NULL} if this is not possible.
> +@end deftypefn
> +
> @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> When optimization is disabled, this hook indicates whether or not
> arguments should be allocated to stack slots.  Normally, GCC allocates
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 3be984b..bee917a 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -3430,6 +3430,10 @@ for a new target instead.
>
> @hook TARGET_FUNCTION_VALUE_REGNO_P
>
> +@hook TARGET_ZERO_CALL_USED_REGNO_P
> +
> +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> +
> @defmac APPLY_RESULT_SIZE
> Define this macro if @samp{untyped_call} and @samp{untyped_return}
> need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
>
> @hook TARGET_GET_DRAP_RTX
>
> +@hook TARGET_PRO_EPILOGUE_USE
> +
> +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> +
> @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
>
> @hook TARGET_CONST_ANCHOR
> diff --git a/gcc/function.c b/gcc/function.c
> index 9eee9b5..9908530 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "emit-rtl.h"
> #include "recog.h"
> #include "rtl-error.h"
> +#include "hard-reg-set.h"
> #include "alias.h"
> #include "fold-const.h"
> #include "stor-layout.h"
> @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
>   return seq;
> }
>
> +/* Check whether the hard register REGNO is live at the exit block
> + * of the current routine.  */
> +bool
> +is_live_reg_at_exit (unsigned int regno)
> +{
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +    {
> +      bitmap live_out = df_get_live_out (e->src);
> +      if (REGNO_REG_SET_P (live_out, regno))
> +       return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Emit a sequence of insns to zero the call-used-registers for the current
> + * function.  */
> +
> +static void
> +gen_call_used_regs_seq (void)
> +{
> +  if (!targetm.calls.pro_epilogue_use)
> +    return;
> +
> +  bool gpr_only = true;
> +  bool used_only = true;
> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> +
> +  if (flag_zero_call_used_regs)
> +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> +       == zero_call_used_regs_unset)
> +      zero_call_used_regs_type = flag_zero_call_used_regs;
> +    else
> +      zero_call_used_regs_type
> +       = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> +  else
> +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> +
> +  /* No need to zero call-used-regs when no user request is present.  */
> +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> +    return;
> +
> +  /* No need to zero call-used-regs in main ().  */
> +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> +    return;
> +
> +  /* No need to zero call-used-regs if __builtin_eh_return is called
> +     since it isn't a normal function return.  */
> +  if (crtl->calls_eh_return)
> +    return;
> +
> +  /* If gpr_only is true, only zero call-used-registers that are
> +     general-purpose registers; if used_only is true, only zero
> +     call-used-registers that are used in the current function.  */
> +  switch (zero_call_used_regs_type)
> +    {
> +      case zero_call_used_regs_all_gpr:
> +       used_only = false;
> +       break;
> +      case zero_call_used_regs_used:
> +       gpr_only = false;
> +       break;
> +      case zero_call_used_regs_all:
> +       gpr_only = false;
> +       used_only = false;
> +       break;
> +      default:
> +       break;
> +    }
> +
> +  /* As an optimization, use a single hard insn to zero all vector registers
> +     on targets that provide such an insn.  */
> +  if (!gpr_only
> +      && targetm.calls.zero_all_vector_registers)
> +    {
> +      rtx zero_all_vec_insn
> +       = targetm.calls.zero_all_vector_registers (used_only);
> +      if (zero_all_vec_insn)
> +       {
> +         emit_insn (zero_all_vec_insn);
> +         gpr_only = true;
> +       }
> +    }
> +
> +  /* For each hard register, zero it only if:
> +     1. it is a call-used register;
> + and 2. it is not a fixed register;
> + and 3. it is not live at the end of the routine;
> + and 4. it is a general-purpose register if gpr_only is true;
> + and 5. it is used in the routine if used_only is true.
> +   */
> +
> +  /* This array holds the zero rtx with the corresponding machine mode.  */
> +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> +    zero_rtx[i] = NULL_RTX;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> +       continue;
> +      if (fixed_regs[regno])
> +       continue;
> +      if (is_live_reg_at_exit (regno))
> +       continue;
> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> +       continue;
> +      if (used_only && !df_regs_ever_live_p (regno))
> +       continue;
> +
> +      /* Now we can emit insn to zero this register.  */
> +      rtx reg, tmp;
> +
> +      machine_mode mode
> +       = targetm.calls.zero_call_used_regno_mode (regno,
> +                                                  reg_raw_mode[regno]);
> +      if (mode == VOIDmode)
> +       continue;
> +      if (!have_regs_of_mode[mode])
> +       continue;
> +
> +      reg = gen_rtx_REG (mode, regno);
> +      if (zero_rtx[(int)mode] == NULL_RTX)
> +       {
> +         zero_rtx[(int)mode] = reg;
> +         tmp = gen_rtx_SET (reg, const0_rtx);
> +         emit_insn (tmp);
> +       }
> +      else
> +       emit_move_insn (reg, zero_rtx[(int)mode]);
> +
> +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> +    }
> +
> +  return;
> +}
> +
> +
> /* Return a sequence to be used as the epilogue for the current function,
>    or NULL.  */
>
> @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
>
>   start_sequence ();
>   emit_note (NOTE_INSN_EPILOGUE_BEG);
> +
> +  gen_call_used_regs_seq ();
> +
>   rtx_insn *seq = targetm.gen_epilogue ();
>   if (seq)
>     emit_jump_insn (seq);
> diff --git a/gcc/function.h b/gcc/function.h
> index d55cbdd..fc36c3e 100644
> --- a/gcc/function.h
> +++ b/gcc/function.h
> @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
>
> extern void used_types_insert (tree);
>
> +extern bool is_live_reg_at_exit (unsigned int);
> +
> #endif  /* GCC_FUNCTION_H */
> diff --git a/gcc/target.def b/gcc/target.def
> index 07059a8..8aab63e 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
>  default_function_value_regno_p)
>
> DEFHOOK
> +(zero_call_used_regno_p,
> + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> +@var{regno} must be the number of a hard general register.\n\
> +\n\
> +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> + bool, (const unsigned int regno, bool general_reg_only_p),
> + default_zero_call_used_regno_p)
> +
> +DEFHOOK
> +(zero_call_used_regno_mode,
> + "A target hook that returns a mode suitable for zeroing the call-used\n\
> +register @var{regno} in @var{mode}.\n\
> +\n\
> +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> +used.",
> + machine_mode, (const unsigned int regno, machine_mode mode),
> + default_zero_call_used_regno_mode)
> +
> +DEFHOOK
> (fntype_abi,
>  "Return the ABI used by a function with type @var{type}; see the\n\
> definition of @code{predefined_function_abi} for details of the ABI\n\
> @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> is needed.",
>  rtx, (void), NULL)
>
> +DEFHOOK
> +(pro_epilogue_use,
> + "This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to\n\
> +prevent deleting register setting instructions in the prologue and epilogue.",
> + rtx, (rtx reg), NULL)
> +
> +DEFHOOK
> +(zero_all_vector_registers,
> + "This hook should return an rtx to zero all vector registers at function\n\
> +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> +be zeroed.  Return @code{NULL} if this is not possible.",
> + rtx, (bool used_only), NULL)
> +
> /* Return true if all function parameters should be spilled to the
>    stack.  */
> DEFHOOK
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 0113c7b..ed02173 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> #endif
> }
>
> +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> +
> +bool
> +default_zero_call_used_regno_p (const unsigned int,
> +                               bool)
> +{
> +  return false;
> +}
> +
> +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> +
> +machine_mode
> +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> +{
> +  return mode;
> +}
> +
> rtx
> default_internal_arg_pointer (void)
> {
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index b572a36..370df19 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> extern rtx default_function_value (const_tree, const_tree, bool);
> extern rtx default_libcall_value (machine_mode, const_rtx);
> extern bool default_function_value_regno_p (const unsigned int);
> +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> +                                                      machine_mode);
> extern rtx default_internal_arg_pointer (void);
> extern rtx default_static_chain (const_tree, bool);
> extern void default_trampoline_init (rtx, tree, rtx);
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..3c2ac72
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> @@ -0,0 +1,3 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..acf48c4
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..9f61dc4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> new file mode 100644
> index 0000000..09048e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> new file mode 100644
> index 0000000..4862688
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> new file mode 100644
> index 0000000..500251b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> new file mode 100644
> index 0000000..8b058e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> new file mode 100644
> index 0000000..d4eaaf7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> new file mode 100644
> index 0000000..dd3bb90
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> new file mode 100644
> index 0000000..e2274f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> new file mode 100644
> index 0000000..7f5d153
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> new file mode 100644
> index 0000000..fe13d2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> new file mode 100644
> index 0000000..205a532
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..e046684
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> new file mode 100644
> index 0000000..4be8ff6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> new file mode 100644
> index 0000000..0eb34e0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> +
> +__attribute__ ((zero_call_used_regs("used")))
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> new file mode 100644
> index 0000000..cbb63a4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> new file mode 100644
> index 0000000..7573197
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> new file mode 100644
> index 0000000..de71223
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> new file mode 100644
> index 0000000..ccfa441
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> new file mode 100644
> index 0000000..6b46ca3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +__attribute__ ((zero_call_used_regs("all-gpr")))
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> new file mode 100644
> index 0000000..0680f38
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> new file mode 100644
> index 0000000..534defa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> new file mode 100644
> index 0000000..477bb19
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> new file mode 100644
> index 0000000..a305a60
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 95eea63..01a1f24 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1464,6 +1464,15 @@ process_options (void)
>         }
>     }
>
> +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> +      && !targetm.calls.pro_epilogue_use)
> +    {
> +      error_at (UNKNOWN_LOCATION,
> +               "%<-fzero-call-used-regs=%> is not supported for this "
> +               "target");
> +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> +    }
> +
>   /* One region RA really helps to decrease the code size.  */
>   if (flag_ira_region == IRA_REGION_AUTODETECT)
>     flag_ira_region
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index 8c5a2e3..71badbd 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
>  unsigned final : 1;
>  /* Belong to FUNCTION_DECL exclusively.  */
>  unsigned regdecl_flag : 1;
> - /* 14 unused bits. */
> +
> + /* How to clear call-used registers upon function return.  */
> + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> +
> + /* 11 unused bits.  */
> };
>
> struct GTY(()) tree_var_decl {
> diff --git a/gcc/tree.h b/gcc/tree.h
> index cf546ed..d378a88 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> #define DECL_VISIBILITY(NODE) \
>   (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
>
> +/* Value of the function decl's type of zeroing the call used
> +   registers upon return from function.  */
> +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> +
> /* Nonzero means that the decl (or an enclosing scope) had its
>    visibility specified rather than being inferred.  */
> #define DECL_VISIBILITY_SPECIFIED(NODE) \
> --
> 1.9.1
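
On the tree-core.h hunk above: the new field takes 3 of the 14 previously unused bits, and the enum added to coretypes.h has six enumerators (values 0 to 5), so 3 bits should be enough. Below is a minimal standalone sketch of that check, not GCC code. The struct name, the `unused` member, and `roundtrip_mode` are made up for illustration; only `regdecl_flag`, `zero_call_used_regs_type`, and the enumerators come from the patch.

```c
#include <assert.h>

/* Copy of the enum the patch adds to coretypes.h.  */
enum zero_call_used_regs {
  zero_call_used_regs_unset = 0,
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* Hypothetical stand-in for the tail of struct tree_decl_with_vis.  */
struct decl_flags_sketch {
  unsigned regdecl_flag : 1;
  unsigned zero_call_used_regs_type : 3;  /* the new 3-bit field */
  unsigned unused : 11;                   /* hypothetical filler */
};

/* Store MODE in the 3-bit field and read it back.  */
enum zero_call_used_regs
roundtrip_mode (enum zero_call_used_regs mode)
{
  struct decl_flags_sketch flags = { 0, 0, 0 };

  /* Every enumerator must be representable in 3 bits.  */
  assert ((unsigned) mode < (1u << 3));
  flags.zero_call_used_regs_type = (unsigned) mode;
  return (enum zero_call_used_regs) flags.zero_call_used_regs_type;
}
```

The largest enumerator, zero_call_used_regs_all, is 5, which leaves two spare encodings in the field if more choices are added later.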

+1. Tested on x86
Qing Zhao July 28, 2020, 8:05 p.m. UTC | #2
Richard and Uros,

Could you please review the change that H.J. and I rewrote based on your comments from the previous round of discussion?

This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.  

Thanks a lot for your time.

Qing

> On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> Hi, Gcc team,
> 
> This patch is a follow-up on the previous patch and corresponding discussion:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> 
> From the previous round of discussion, the major issues raised were:
> 
> A. The patch should be rewritten using the regsets infrastructure.
> B. The patch should go into the middle-end instead of the x86 backend.
> 
> This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> 
> 1. Change the names of the option and attribute from
> -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> to:
> -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all"),
> i.e. add the new option and the new attribute as general (target-independent) ones.
> 2. The main code generation part is moved from i386 backend to middle-end;
> 3. Add 4 target-hooks;
> 4. Implement these 4 target-hooks on i386 backend. 
> 5. On a target that does not implement the target hook, issue an error for the new option and a warning for the new attribute.
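
To make item 5 and the option/attribute interaction concrete, here is a toy model of how the effective mode for one function could be chosen. It is a sketch under stated assumptions, not the code in function.c (which is not quoted in this part of the thread): `effective_zero_mode` and its `target_has_hook` parameter are hypothetical names, and the "attribute wins over the option" rule is inferred from the zero-scratch-regs-5.c and zero-scratch-regs-6.c tests. The fallback to skip on unsupported targets follows the toplev.c hunk and the early return in the attribute handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Copy of the enum the patch adds to coretypes.h.  */
enum zero_call_used_regs {
  zero_call_used_regs_unset = 0,
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* Hypothetical helper: pick the mode that applies to one function.
   OPTION is the -fzero-call-used-regs= value, ATTRIBUTE is the value
   recorded on the declaration (unset if no attribute was given).  */
enum zero_call_used_regs
effective_zero_mode (enum zero_call_used_regs option,
                     enum zero_call_used_regs attribute,
                     bool target_has_hook)
{
  /* The option defaults to skip, so it is never unset.  */
  assert (option != zero_call_used_regs_unset);

  /* Unsupported target: the option is reset to skip after the error,
     and the attribute is ignored with a warning.  */
  if (!target_has_hook)
    return zero_call_used_regs_skip;

  /* Assumed precedence: a per-function attribute overrides the option.  */
  if (attribute != zero_call_used_regs_unset)
    return attribute;

  return option;
}
```

For example, with -fzero-call-used-regs=all-gpr and zero_call_used_regs("skip") on the function, this model yields skip, which matches what zero-scratch-regs-6.c expects.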
> 
> The patch is as follows:
> 
> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> command-line option and
> zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> 
>  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> 
>  Don't zero call-used registers upon function return.
> 
>  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> 
>  Zero used call-used general purpose registers upon function return.
> 
>  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> 
>  Zero all call-used general purpose registers upon function return.
> 
>  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> 
>  Zero used call-used registers upon function return.
> 
>  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> 
>  Zero all call-used registers upon function return.
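
For readers who want to try the attribute form of the five choices listed above, a small usage sketch follows. The `ZERO_REGS` macro and the `checksum_key` function are hypothetical and exist only for illustration; the macro expands to nothing on compilers that do not report the attribute, so the file still builds there, just without the zeroing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro: use the attribute only when the compiler
   reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_REGS(choice) __attribute__ ((zero_call_used_regs (choice)))
# endif
#endif
#ifndef ZERO_REGS
# define ZERO_REGS(choice)
#endif

/* Ask for the used call-used general purpose registers to be zeroed
   when this function returns, whatever the command-line default is.  */
ZERO_REGS ("used-gpr")
unsigned int
checksum_key (const unsigned char *key, size_t len)
{
  unsigned int sum = 0;

  assert (key != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    sum = sum * 31u + key[i];
  return sum;
}
```

The return value is unaffected by the zeroing, since the register that carries it is live at function exit. Compiling with -S and looking at the epilogue is the easiest way to see which registers were actually cleared.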
> 
> The feature is implemented in the middle-end, but it is currently only supported on x86.
> 
> Tested on x86-64 and aarch64 by bootstrapping GCC trunk, with
> -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr,
> -fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
> by default on x86-64.
> 
> Please take a look and let me know if you have any further comments.
> 
> thanks.
> 
> Qing
> 
> 
> ====================================
> 
> gcc/ChangeLog:
> 
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> 
> 	* common.opt: Add new option -fzero-call-used-regs.
> 	* config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> 	(ix86_zero_call_used_regno_mode): Likewise.
> 	(ix86_zero_all_vector_registers): Likewise.
> 	(ix86_expand_prologue): Replace gen_prologue_use with
> 	gen_pro_epilogue_use.
> 	(TARGET_ZERO_CALL_USED_REGNO_P): Define.
> 	(TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> 	(TARGET_PRO_EPILOGUE_USE): Define.
> 	(TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> 	* config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> 	with UNSPECV_PRO_EPILOGUE_USE.
> 	* coretypes.h (enum zero_call_used_regs): New type.
> 	* doc/extend.texi: Document the new zero_call_used_regs attribute.
> 	* doc/invoke.texi: Document the new -fzero-call-used-regs option.
> 	* doc/tm.texi: Regenerate.
> 	* doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> 	(TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> 	(TARGET_PRO_EPILOGUE_USE): Likewise.
> 	(TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> 	* function.c (is_live_reg_at_exit): New function.
> 	(gen_call_used_regs_seq): Likewise.
> 	(make_epilogue_seq): Call gen_call_used_regs_seq.
> 	* function.h (is_live_reg_at_exit): Declare.
> 	* target.def (zero_call_used_regno_p): New hook.
> 	(zero_call_used_regno_mode): Likewise.
> 	(pro_epilogue_use): Likewise.
> 	(zero_all_vector_registers): Likewise.
> 	* targhooks.c (default_zero_call_used_regno_p): New function.
> 	(default_zero_call_used_regno_mode): Likewise.
> 	* targhooks.h (default_zero_call_used_regno_p): Declare.
> 	(default_zero_call_used_regno_mode): Declare.
> 	* toplev.c (process_options): Issue errors when -fzero-call-used-regs
> 	is used on targets that do not support it.
> 	* tree-core.h (struct tree_decl_with_vis): New field 
> 	zero_call_used_regs_type.
> 	* tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> 
> gcc/c-family/ChangeLog:
> 
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> 
> 	* c-attribs.c (c_common_attribute_table): Add new attribute
> 	zero_call_used_regs.
> 	(handle_zero_call_used_regs_attribute): New function.
> 
> gcc/c/ChangeLog:
> 
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> 
> 	* c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> 
> 	* c-c++-common/zero-scratch-regs-1.c: New test.
> 	* c-c++-common/zero-scratch-regs-2.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> 
> ---
> gcc/c-family/c-attribs.c                           |  68 ++++++++++
> gcc/c/c-decl.c                                     |   4 +
> gcc/common.opt                                     |  23 ++++
> gcc/config/i386/i386.c                             |  58 ++++++++-
> gcc/config/i386/i386.md                            |   6 +-
> gcc/coretypes.h                                    |  10 ++
> gcc/doc/extend.texi                                |  11 ++
> gcc/doc/invoke.texi                                |  13 +-
> gcc/doc/tm.texi                                    |  27 ++++
> gcc/doc/tm.texi.in                                 |   8 ++
> gcc/function.c                                     | 145 +++++++++++++++++++++
> gcc/function.h                                     |   2 +
> gcc/target.def                                     |  33 +++++
> gcc/targhooks.c                                    |  17 +++
> gcc/targhooks.h                                    |   3 +
> gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> gcc/toplev.c                                       |   9 ++
> gcc/tree-core.h                                    |   6 +-
> gcc/tree.h                                         |   5 +
> 43 files changed, 866 insertions(+), 7 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> 
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 3721483..cc93d6f 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> static tree ignore_attribute (tree *, tree, tree, int, bool *);
> static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> +						 bool *);
> static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
> 			      ignore_attribute, NULL },
>  { "no_split_stack",	      0, 0, true,  false, false, false,
> 			      handle_no_split_stack_attribute, NULL },
> +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> +			      handle_zero_call_used_regs_attribute, NULL },
> +
>  /* For internal use (marking of builtins and runtime functions) only.
>     The name contains space to prevent its usage in source code.  */
>  { "fn spec",		      1, 1, false, true, true, false,
> @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
>  return NULL_TREE;
> }
> 
> +/* Handle a "zero_call_used_regs" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> +				      int ARG_UNUSED (flags),
> +				      bool *no_add_attris)
> +{
> +  tree decl = *node;
> +  tree id = TREE_VALUE (args);
> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> +
> +  if (TREE_CODE (decl) != FUNCTION_DECL)
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +		"%qE attribute applies only to functions", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +  else if (DECL_INITIAL (decl))
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +		"cannot set %qE attribute after definition", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (TREE_CODE (id) != STRING_CST)
> +    {
> +      error ("attribute %qE arguments not a string", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (!targetm.calls.pro_epilogue_use)
> +    {
> +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> +      return NULL_TREE;
> +    }
> +
> +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_skip;
> +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_used;
> +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_all;
> +  else
> +    {
> +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> + 	     name, "skip", "used-gpr", "all-gpr", "used", "all");
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> +
> +  return NULL_TREE;
> +}
> +
> /* Handle a "returns_nonnull" attribute; arguments as in
>   struct attribute_spec.handler.  */
> 
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index 81bd2ee..ded1880 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> 	  DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> 	}
> 
> +      /* Merge the zero_call_used_regs_type information.  */
> +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> +	DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> +
>      /* Merge the storage class information.  */
>      merge_weak (newdecl, olddecl);
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index df8af36..19900f9 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> Common Report Var(flag_zero_initialized_in_bss) Init(1)
> Put zero initialized data in the bss section.
> 
> +fzero-call-used-regs=
> +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> +Clear call-used registers upon function return.
> +
> +Enum
> +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> +
> g
> Common Driver RejectNegative JoinedOrMissing
> Generate debug information in default format.
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 5c373c0..fd1aa9c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
>  return false;
> }
> 
> +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> +
> +static bool
> +ix86_zero_call_used_regno_p (const unsigned int regno,
> +			     bool gpr_only)
> +{
> +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> +}
> +
> +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> +
> +static machine_mode
> +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> +{
> +  /* NB: We only need to zero the lower 32 bits for integer registers
> +     and the lower 128 bits for vector registers since destinations are
> +     zero-extended to the full register width.  */
> +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> +}
> +
> +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> +
> +static rtx
> +ix86_zero_all_vector_registers (bool used_only)
> +{
> +  if (!TARGET_AVX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> +	 || (TARGET_64BIT
> +	     && (REX_SSE_REGNO_P (regno)
> +		 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> +	&& (!this_target_hard_regs->x_call_used_regs[regno]
> +	    || fixed_regs[regno]
> +	    || is_live_reg_at_exit (regno)
> +	    || (used_only && !df_regs_ever_live_p (regno))))
> +      return NULL;
> +
> +  return gen_avx_vzeroall ();
> +}
> +
> /* Define how to find the value returned by a function.
>   VALTYPE is the data type of the value (as a tree).
>   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
>      insn = emit_insn (gen_set_got (pic));
>      RTX_FRAME_RELATED_P (insn) = 1;
>      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> -      emit_insn (gen_prologue_use (pic));
> +      emit_insn (gen_pro_epilogue_use (pic));
>      /* Deleting already emmitted SET_GOT if exist and allocated to
> 	 REAL_PIC_OFFSET_TABLE_REGNUM.  */
>      ix86_elim_entry_set_got (pic);
> @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
>     Further, prevent alloca modifications to the stack pointer from being
>     combined with prologue modifications.  */
>  if (TARGET_SEH)
> -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> }
> 
> /* Emit code to restore REG using a POP insn.  */
> @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> #undef TARGET_FUNCTION_VALUE_REGNO_P
> #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> 
> +#undef TARGET_ZERO_CALL_USED_REGNO_P
> +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> +
> +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> +
> +#undef TARGET_PRO_EPILOGUE_USE
> +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> +
> +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> +
> #undef TARGET_PROMOTE_FUNCTION_MODE
> #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index d0ecd9e..e7df59f 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -194,7 +194,7 @@
>  UNSPECV_STACK_PROBE
>  UNSPECV_PROBE_STACK_RANGE
>  UNSPECV_ALIGN
> -  UNSPECV_PROLOGUE_USE
> +  UNSPECV_PRO_EPILOGUE_USE
>  UNSPECV_SPLIT_STACK_RETURN
>  UNSPECV_CLD
>  UNSPECV_NOPS
> @@ -13525,8 +13525,8 @@
> 
> ;; As USE insns aren't meaningful after reload, this is used instead
> ;; to prevent deleting instructions setting registers for PIC code
> -(define_insn "prologue_use"
> -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> +(define_insn "pro_epilogue_use"
> +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
>  ""
>  ""
>  [(set_attr "length" "0")])
> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> index 6b6cfcd..e56d6ec 100644
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -418,6 +418,16 @@ enum symbol_visibility
>  VISIBILITY_INTERNAL
> };
> 
> +/* Zero call-used registers type.  */
> +enum zero_call_used_regs {
> +  zero_call_used_regs_unset = 0,
> +  zero_call_used_regs_skip,
> +  zero_call_used_regs_used_gpr,
> +  zero_call_used_regs_all_gpr,
> +  zero_call_used_regs_used,
> +  zero_call_used_regs_all
> +};
> +
> /* enums used by the targetm.excess_precision hook.  */
> 
> enum flt_eval_method
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c800b74..b32c55f 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> A declaration to which @code{weakref} is attached and that is associated
> with a named @code{target} must be @code{static}.
> 
> +@item zero_call_used_regs ("@var{choice}")
> +@cindex @code{zero_call_used_regs} function attribute
> +The @code{zero_call_used_regs} attribute causes the compiler to zero
> +call-used registers at function return according to @var{choice}.
> +@samp{skip} doesn't zero call-used registers.  @samp{used-gpr} zeros
> +call-used general purpose registers which are used in function.
> +@samp{all-gpr} zeros all call-used general purpose registers.
> +@samp{used} zeros call-used registers which are used in function.
> +@samp{all} zeros all call-used registers.  The default for the
> +attribute is controlled by @option{-fzero-call-used-regs}.
> +
> @end table
> 
> @c This is the end of the target-independent attribute table
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 09bcc5b..da02686 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> -funsafe-math-optimizations  -funswitch-loops @gol
> -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin  -fzero-call-used-regs @gol
> --param @var{name}=@var{value}
> -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> 
> @@ -12273,6 +12273,17 @@ int foo (void)
> 
> Not all targets support this option.
> 
> +@item -fzero-call-used-regs=@var{choice}
> +@opindex fzero-call-used-regs
> +Zero call-used registers at function return according to
> +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> +registers which are used in function.  @samp{all-gpr} zeros all
> +call-used general purpose registers.  @samp{used} zeros call-used
> +registers which are used in function.  @samp{all} zeros all call-used
> +registers.  You can control this behavior for a specific function by
> +using the function attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> +
> @item --param @var{name}=@var{value}
> @opindex param
> In some places, GCC uses various constants to control the amount of
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 6e7d9dc..43dddd3 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> @end deftypefn
> 
> +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> +A target hook that returns @code{true} if @var{regno} is the number of a
> +call used register.  If @var{general_reg_only_p} is @code{true},
> +@var{regno} must be the number of a hard general register.
> +
> +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> +A target hook that returns a mode suitable for zeroing the
> +call-used register @var{regno} in @var{mode}.
> +
> +If this hook is not defined, then default_zero_call_used_regno_mode will be
> +used.
> +@end deftypefn
> +
> @defmac APPLY_RESULT_SIZE
> Define this macro if @samp{untyped_call} and @samp{untyped_return}
> need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> is needed.
> @end deftypefn
> 
> +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> +This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
> +prevent deleting register setting instructions in prologue and epilogue.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> +This hook should return an rtx to zero all vector registers at function
> +exit.  If @var{used_only} is @code{true}, only used vector registers should
> +be zeroed.  Return @code{NULL} if this is not possible.
> +@end deftypefn
> +
> @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> When optimization is disabled, this hook indicates whether or not
> arguments should be allocated to stack slots.  Normally, GCC allocates
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 3be984b..bee917a 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -3430,6 +3430,10 @@ for a new target instead.
> 
> @hook TARGET_FUNCTION_VALUE_REGNO_P
> 
> +@hook TARGET_ZERO_CALL_USED_REGNO_P
> +
> +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> +
> @defmac APPLY_RESULT_SIZE
> Define this macro if @samp{untyped_call} and @samp{untyped_return}
> need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
> 
> @hook TARGET_GET_DRAP_RTX
> 
> +@hook TARGET_PRO_EPILOGUE_USE
> +
> +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> +
> @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> 
> @hook TARGET_CONST_ANCHOR
> diff --git a/gcc/function.c b/gcc/function.c
> index 9eee9b5..9908530 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "emit-rtl.h"
> #include "recog.h"
> #include "rtl-error.h"
> +#include "hard-reg-set.h"
> #include "alias.h"
> #include "fold-const.h"
> #include "stor-layout.h"
> @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
>  return seq;
> }
> 
> +/* Check whether the hard register REGNO is live at the exit block
> +   of the current routine.  */
> +bool
> +is_live_reg_at_exit (unsigned int regno)
> +{
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +    {
> +      bitmap live_out = df_get_live_out (e->src);
> +      if (REGNO_REG_SET_P (live_out, regno))
> +	return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Emit a sequence of insns to zero the call-used registers for the current
> +   function.  */
> +
> +static void
> +gen_call_used_regs_seq (void)
> +{
> +  if (!targetm.calls.pro_epilogue_use)
> +    return;
> +
> +  bool gpr_only = true;
> +  bool used_only = true;
> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> +
> +  if (flag_zero_call_used_regs)
> +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> +	== zero_call_used_regs_unset)
> +      zero_call_used_regs_type = flag_zero_call_used_regs;
> +    else
> +      zero_call_used_regs_type
> +	= DECL_ZERO_CALL_USED_REGS (current_function_decl);
> +  else
> +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> +
> +  /* No need to zero call-used-regs when no user request is present.  */
> +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> +    return;
> +
> +  /* No need to zero call-used-regs in main ().  */
> +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> +    return;
> +
> +  /* No need to zero call-used-regs if __builtin_eh_return is called
> +     since it isn't a normal function return.  */
> +  if (crtl->calls_eh_return)
> +    return;
> +
> +  /* If gpr_only is true, only zero call-used-registers that are
> +     general-purpose registers; if used_only is true, only zero
> +     call-used-registers that are used in the current function.  */
> +  switch (zero_call_used_regs_type)
> +    {
> +      case zero_call_used_regs_all_gpr:
> +	used_only = false;
> +	break;
> +      case zero_call_used_regs_used:
> +	gpr_only = false;
> +	break;
> +      case zero_call_used_regs_all:
> +	gpr_only = false;
> +	used_only = false;
> +	break;
> +      default:
> +	break;
> +    }
> +
> +  /* An optimization: use a single hard insn to zero all vector registers on
> +     targets that provide such an insn.  */
> +  if (!gpr_only
> +      && targetm.calls.zero_all_vector_registers)
> +    {
> +      rtx zero_all_vec_insn
> +	= targetm.calls.zero_all_vector_registers (used_only);
> +      if (zero_all_vec_insn)
> +	{
> +	  emit_insn (zero_all_vec_insn);
> +	  gpr_only = true;
> +	}
> +    }
> +
> +  /* For each hard register, zero it only if:
> +     1. it is a call-used register;
> + and 2. it is not a fixed register;
> + and 3. it is not live at the end of the routine;
> + and 4. it is a general-purpose register if gpr_only is true;
> + and 5. it is used in the routine if used_only is true.
> +   */
> +
> +  /* This array holds the zero rtx with the corresponding machine mode.  */
> +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> +    zero_rtx[i] = NULL_RTX;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> +	continue;
> +      if (fixed_regs[regno])
> +	continue;
> +      if (is_live_reg_at_exit (regno))
> +	continue;
> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> +	continue;
> +      if (used_only && !df_regs_ever_live_p (regno))
> +	continue;
> +
> +      /* Now we can emit insn to zero this register.  */
> +      rtx reg, tmp;
> +
> +      machine_mode mode
> +	= targetm.calls.zero_call_used_regno_mode (regno,
> +						   reg_raw_mode[regno]);
> +      if (mode == VOIDmode)
> +	continue;
> +      if (!have_regs_of_mode[mode])
> +	continue;
> +
> +      reg = gen_rtx_REG (mode, regno);
> +      if (zero_rtx[(int)mode] == NULL_RTX)
> +	{
> +	  zero_rtx[(int)mode] = reg;
> +	  tmp = gen_rtx_SET (reg, const0_rtx);
> +	  emit_insn (tmp);
> +	}
> +      else
> +	emit_move_insn (reg, zero_rtx[(int)mode]);
> +
> +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> +    }
> +
> +  return;
> +}
> +
> +
> /* Return a sequence to be used as the epilogue for the current function,
>   or NULL.  */
> 
> @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> 
>  start_sequence ();
>  emit_note (NOTE_INSN_EPILOGUE_BEG);
> +
> +  gen_call_used_regs_seq ();
> +
>  rtx_insn *seq = targetm.gen_epilogue ();
>  if (seq)
>    emit_jump_insn (seq);
> diff --git a/gcc/function.h b/gcc/function.h
> index d55cbdd..fc36c3e 100644
> --- a/gcc/function.h
> +++ b/gcc/function.h
> @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> 
> extern void used_types_insert (tree);
> 
> +extern bool is_live_reg_at_exit (unsigned int);
> +
> #endif  /* GCC_FUNCTION_H */
> diff --git a/gcc/target.def b/gcc/target.def
> index 07059a8..8aab63e 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
> default_function_value_regno_p)
> 
> DEFHOOK
> +(zero_call_used_regno_p,
> + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> +@var{regno} must be the number of a hard general register.\n\
> +\n\
> +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> + bool, (const unsigned int regno, bool general_reg_only_p),
> + default_zero_call_used_regno_p)
> +
> +DEFHOOK
> +(zero_call_used_regno_mode,
> + "A target hook that returns a mode suitable for zeroing the\n\
> +call-used register @var{regno} in @var{mode}.\n\
> +\n\
> +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> +used.",
> + machine_mode, (const unsigned int regno, machine_mode mode),
> + default_zero_call_used_regno_mode)
> +
> +DEFHOOK
> (fntype_abi,
> "Return the ABI used by a function with type @var{type}; see the\n\
> definition of @code{predefined_function_abi} for details of the ABI\n\
> @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> is needed.",
> rtx, (void), NULL)
> 
> +DEFHOOK
> +(pro_epilogue_use,
> + "This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to\n\
> +prevent deleting register setting instructions in prologue and epilogue.",
> + rtx, (rtx reg), NULL)
> +
> +DEFHOOK
> +(zero_all_vector_registers,
> + "This hook should return an rtx to zero all vector registers at function\n\
> +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> +be zeroed.  Return @code{NULL} if this is not possible.",
> + rtx, (bool used_only), NULL)
> +
> /* Return true if all function parameters should be spilled to the
>   stack.  */
> DEFHOOK
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 0113c7b..ed02173 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> #endif
> }
> 
> +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> +
> +bool
> +default_zero_call_used_regno_p (const unsigned int,
> +				bool)
> +{
> +  return false;
> +}
> +
> +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> +
> +machine_mode
> +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> +{
> +  return mode;
> +}
> +
> rtx
> default_internal_arg_pointer (void)
> {
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index b572a36..370df19 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> extern rtx default_function_value (const_tree, const_tree, bool);
> extern rtx default_libcall_value (machine_mode, const_rtx);
> extern bool default_function_value_regno_p (const unsigned int);
> +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> +						       machine_mode);
> extern rtx default_internal_arg_pointer (void);
> extern rtx default_static_chain (const_tree, bool);
> extern void default_trampoline_init (rtx, tree, rtx);
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..3c2ac72
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> @@ -0,0 +1,3 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..acf48c4
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..9f61dc4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> new file mode 100644
> index 0000000..09048e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> new file mode 100644
> index 0000000..4862688
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> new file mode 100644
> index 0000000..500251b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> new file mode 100644
> index 0000000..8b058e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> new file mode 100644
> index 0000000..d4eaaf7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> new file mode 100644
> index 0000000..dd3bb90
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> new file mode 100644
> index 0000000..e2274f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> new file mode 100644
> index 0000000..7f5d153
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> new file mode 100644
> index 0000000..fe13d2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> new file mode 100644
> index 0000000..205a532
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..e046684
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> new file mode 100644
> index 0000000..4be8ff6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> new file mode 100644
> index 0000000..0eb34e0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> +
> +__attribute__ ((zero_call_used_regs("used")))
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> new file mode 100644
> index 0000000..cbb63a4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> new file mode 100644
> index 0000000..7573197
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> new file mode 100644
> index 0000000..de71223
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> new file mode 100644
> index 0000000..ccfa441
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> new file mode 100644
> index 0000000..6b46ca3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +__attribute__ ((zero_call_used_regs("all-gpr")))
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> new file mode 100644
> index 0000000..0680f38
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> new file mode 100644
> index 0000000..534defa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> new file mode 100644
> index 0000000..477bb19
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> new file mode 100644
> index 0000000..a305a60
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
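For readers following the tests, the attribute-versus-option cases above (zero-scratch-regs-4, -5, -6 and -9) can be sketched in user code roughly as follows. This is only an illustration: the `ZERO_REGS` macro and the function names are hypothetical, and the `__has_attribute` guard is my addition so that the sketch still builds on compilers that do not know the attribute.

```c
#include <assert.h>
#include <limits.h>

/* Expand to nothing when the compiler does not know the attribute, so
   the sketch still builds on older toolchains.  ZERO_REGS is a
   hypothetical helper macro, not part of the patch.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_REGS(choice) __attribute__((zero_call_used_regs(choice)))
# endif
#endif
#ifndef ZERO_REGS
# define ZERO_REGS(choice)
#endif

/* Attribute on the declaration, in the style of zero-scratch-regs-9.c.  */
extern int identity(int) ZERO_REGS("used-gpr");

int
identity(int x)
{
  return x;
}

/* Attribute on the definition, in the style of zero-scratch-regs-5.c;
   "skip" opts this one function out of any command-line default.  */
ZERO_REGS("skip")
int
add_one(int x)
{
  assert(x < INT_MAX);  /* precondition: avoid signed overflow */
  return x + 1;
}
```

The attribute only changes what the epilogue does to scratch registers, so the values returned by the functions are unaffected.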
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 95eea63..01a1f24 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1464,6 +1464,15 @@ process_options (void)
> 	}
>    }
> 
> +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> +      && !targetm.calls.pro_epilogue_use)
> +    {
> +      error_at (UNKNOWN_LOCATION,
> +		"%<-fzero-call-used-regs=%> is not supported for this "
> +		"target");
> +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> +    }
> +
>  /* One region RA really helps to decrease the code size.  */
>  if (flag_ira_region == IRA_REGION_AUTODETECT)
>    flag_ira_region
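The option check above boils down to "any request other than skip is rejected, and reset to skip, when the target did not provide the hook". A minimal stand-alone sketch of that rule, with hypothetical names (`struct target_hooks`, `validate_zero_mode`, `dummy_use`) standing in for `targetm.calls` and `process_options`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum zero_mode {
  ZERO_UNSET = 0, ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL
};

/* Hypothetical stand-in for the target hook table; a NULL member means
   the target did not implement the hook.  */
struct target_hooks {
  void (*pro_epilogue_use)(int regno);
};

/* Return the mode to use after validation.  A non-skip request on a
   target without the hook is reported and downgraded to skip.  */
enum zero_mode
validate_zero_mode(const struct target_hooks *target,
                   enum zero_mode requested, int *error_count)
{
  assert(target != NULL && error_count != NULL);
  if (requested != ZERO_SKIP && target->pro_epilogue_use == NULL)
    {
      fprintf(stderr,
              "-fzero-call-used-regs= is not supported for this target\n");
      ++*error_count;
      return ZERO_SKIP;
    }
  return requested;
}

/* Placeholder hook for a target that does support the feature.  */
void
dummy_use(int regno)
{
  (void) regno;
}
```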
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index 8c5a2e3..71badbd 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> unsigned final : 1;
> /* Belong to FUNCTION_DECL exclusively.  */
> unsigned regdecl_flag : 1;
> - /* 14 unused bits. */
> +
> + /* How to clear call-used registers upon function return.  */
> + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> +
> + /* 11 unused bits.  */
> };
> 
> struct GTY(()) tree_var_decl {
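On the bit accounting in the hunk above: the new enum has six enumerators (0 through 5), so three bits are enough, and 14 - 3 leaves the 11 unused bits stated in the comment. A small sketch of the same idea, using a hypothetical `struct decl_flags` and a plain `unsigned int` bit-field rather than GCC's `ENUM_BITFIELD` macro:

```c
#include <assert.h>

enum zero_mode {
  ZERO_UNSET = 0, ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL
};

/* The largest enumerator must fit in a 3-bit unsigned field (0..7).  */
_Static_assert(ZERO_ALL < (1 << 3), "zero_mode needs more than 3 bits");

/* Hypothetical miniature of a flag word: 3 bits for the new field,
   11 bits left over from the previous 14.  */
struct decl_flags {
  unsigned int zero_mode : 3;   /* holds an enum zero_mode value */
  unsigned int unused : 11;
};

/* Store a mode in the bit-field and read it back.  */
int
store_and_load(enum zero_mode m)
{
  struct decl_flags f = { 0, 0 };
  assert((unsigned int) m <= ZERO_ALL);
  f.zero_mode = (unsigned int) m;
  return (int) f.zero_mode;
}
```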
> diff --git a/gcc/tree.h b/gcc/tree.h
> index cf546ed..d378a88 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> #define DECL_VISIBILITY(NODE) \
>  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> 
> +/* Value of the function decl's type of zeroing the call used
> +   registers upon return from function.  */
> +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> +
> /* Nonzero means that the decl (or an enclosing scope) had its
>   visibility specified rather than being inferred.  */
> #define DECL_VISIBILITY_SPECIFIED(NODE) \
> -- 
> 1.9.1
Uros Bizjak July 31, 2020, 5:57 p.m. UTC | #3
On Tue, Jul 28, 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
> Richard and Uros,
>
> Could you please review the change that H.J and I rewrote based on your comments in the previous round of discussion?
>
> This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.
>
> Thanks a lot for your time.

I'll be away from the keyboard for the next week, but the patch needs
middle-end approval first.

That said, the x86 parts look OK.

Uros.
> Qing
>
> > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi, Gcc team,
> >
> > This patch is a follow-up on the previous patch and corresponding discussion:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> >
> > From the previous round of discussion, the major issues raised were:
> >
> > A. should be rewritten by using regsets infrastructure.
> > B. Put the patch into middle-end instead of x86 backend.
> >
> > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> >
> > 1. Change the names of the option and attribute from
> > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > to:
> > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and zero_call_used_regs("skip|used-gpr|all-gpr|used|all")
> > Add the new option and new attribute in general.
> > 2. The main code generation part is moved from i386 backend to middle-end;
> > 3. Add 4 target-hooks;
> > 4. Implement these 4 target-hooks on i386 backend.
> > 5. On a target that does not implement the target hook, issue an error for the new option and a warning for the new attribute.
> >
> > The patch is as following:
> >
> > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > command-line option and
> > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> >
> >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> >
> >  Don't zero call-used registers upon function return.
> >
> >  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> >
> >  Zero used call-used general purpose registers upon function return.
> >
> >  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> >
> >  Zero all call-used general purpose registers upon function return.
> >
> >  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> >
> >  Zero used call-used registers upon function return.
> >
> >  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> >
> >  Zero all call-used registers upon function return.
> >
> > The feature is implemented in the middle-end, but is currently only valid on x86.
> >
> > Tested on x86-64 and aarch64 with bootstrapping GCC trunk, making
> > -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr
> > -fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
> > by default on x86-64.
> >
> > Please take a look and let me know if you have any more comments.
> >
> > thanks.
> >
> > Qing
> >
> >
> > ====================================
> >
> > gcc/ChangeLog:
> >
> > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> >
> >       * common.opt: Add new option -fzero-call-used-regs.
> >       * config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> >       (ix86_zero_call_used_regno_mode): Likewise.
> >       (ix86_zero_all_vector_registers): Likewise.
> >       (ix86_expand_prologue): Replace gen_prologue_use with
> >       gen_pro_epilogue_use.
> >       (TARGET_ZERO_CALL_USED_REGNO_P): Define.
> >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> >       (TARGET_PRO_EPILOGUE_USE): Define.
> >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> >       * config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> >       with UNSPECV_PRO_EPILOGUE_USE.
> >       * coretypes.h (enum zero_call_used_regs): New type.
> >       * doc/extend.texi: Document the new zero_call_used_regs attribute.
> >       * doc/invoke.texi: Document the new -fzero-call-used-regs option.
> >       * doc/tm.texi: Regenerate.
> >       * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> >       (TARGET_PRO_EPILOGUE_USE): Likewise.
> >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> >       * function.c (is_live_reg_at_exit): New function.
> >       (gen_call_used_regs_seq): Likewise.
> >       (make_epilogue_seq): Call gen_call_used_regs_seq.
> >       * function.h (is_live_reg_at_exit): Declare.
> >       * target.def (zero_call_used_regno_p): New hook.
> >       (zero_call_used_regno_mode): Likewise.
> >       (pro_epilogue_use): Likewise.
> >       (zero_all_vector_registers): Likewise.
> >       * targhooks.c (default_zero_call_used_regno_p): New function.
> >       (default_zero_call_used_regno_mode): Likewise.
> >       * targhooks.h (default_zero_call_used_regno_p): Declare.
> >       (default_zero_call_used_regno_mode): Declare.
> >       * toplev.c (process_options): Issue errors when -fzero-call-used-regs
> >       is used on targets that do not support it.
> >       * tree-core.h (struct tree_decl_with_vis): New field
> >       zero_call_used_regs_type.
> >       * tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> >
> > gcc/c-family/ChangeLog:
> >
> > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> >
> >       * c-attribs.c (c_common_attribute_table): Add new attribute
> >       zero_call_used_regs.
> >       (handle_zero_call_used_regs_attribute): New function.
> >
> > gcc/c/ChangeLog:
> >
> > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> >
> >       * c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> >
> >       * c-c++-common/zero-scratch-regs-1.c: New test.
> >       * c-c++-common/zero-scratch-regs-2.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> >
> > ---
> > gcc/c-family/c-attribs.c                           |  68 ++++++++++
> > gcc/c/c-decl.c                                     |   4 +
> > gcc/common.opt                                     |  23 ++++
> > gcc/config/i386/i386.c                             |  58 ++++++++-
> > gcc/config/i386/i386.md                            |   6 +-
> > gcc/coretypes.h                                    |  10 ++
> > gcc/doc/extend.texi                                |  11 ++
> > gcc/doc/invoke.texi                                |  13 +-
> > gcc/doc/tm.texi                                    |  27 ++++
> > gcc/doc/tm.texi.in                                 |   8 ++
> > gcc/function.c                                     | 145 +++++++++++++++++++++
> > gcc/function.h                                     |   2 +
> > gcc/target.def                                     |  33 +++++
> > gcc/targhooks.c                                    |  17 +++
> > gcc/targhooks.h                                    |   3 +
> > gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> > gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> > .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> > .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> > .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> > .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> > .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> > .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> > .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> > .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> > .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> > .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> > .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> > .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> > .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> > .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> > .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> > .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> > .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> > .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> > .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> > .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> > .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> > .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> > .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> > gcc/toplev.c                                       |   9 ++
> > gcc/tree-core.h                                    |   6 +-
> > gcc/tree.h                                         |   5 +
> > 43 files changed, 866 insertions(+), 7 deletions(-)
> > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> >
> > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > index 3721483..cc93d6f 100644
> > --- a/gcc/c-family/c-attribs.c
> > +++ b/gcc/c-family/c-attribs.c
> > @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *,
tree, tree, int, bool *);
> > static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> > static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > static tree handle_no_split_stack_attribute (tree *, tree, tree, int,
bool *);
> > +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> > +                                              bool *);
> > static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> > static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool
*);
> > static tree handle_returns_nonnull_attribute (tree *, tree, tree, int,
bool *);
> > @@ -434,6 +436,9 @@ const struct attribute_spec
c_common_attribute_table[] =
> >                             ignore_attribute, NULL },
> >  { "no_split_stack",        0, 0, true,  false, false, false,
> >                             handle_no_split_stack_attribute, NULL },
> > +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> > +                           handle_zero_call_used_regs_attribute, NULL },
> > +
> >  /* For internal use (marking of builtins and runtime functions) only.
> >     The name contains space to prevent its usage in source code.  */
> >  { "fn spec",               1, 1, false, true, true, false,
> > @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node,
tree name,
> >  return NULL_TREE;
> > }
> >
> > +/* Handle a "zero_call_used_regs" attribute; arguments as in
> > +   struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> > +                                   int ARG_UNUSED (flags),
> > +                                   bool *no_add_attris)
> > +{
> > +  tree decl = *node;
> > +  tree id = TREE_VALUE (args);
> > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > +
> > +  if (TREE_CODE (decl) != FUNCTION_DECL)
> > +    {
> > +      error_at (DECL_SOURCE_LOCATION (decl),
> > +             "%qE attribute applies only to functions", name);
> > +      *no_add_attris = true;
> > +      return NULL_TREE;
> > +    }
> > +  else if (DECL_INITIAL (decl))
> > +    {
> > +      error_at (DECL_SOURCE_LOCATION (decl),
> > +             "cannot set %qE attribute after definition", name);
> > +      *no_add_attris = true;
> > +      return NULL_TREE;
> > +    }
> > +
> > +  if (TREE_CODE (id) != STRING_CST)
> > +    {
> > +      error ("attribute %qE arguments not a string", name);
> > +      *no_add_attris = true;
> > +      return NULL_TREE;
> > +    }
> > +
> > +  if (!targetm.calls.pro_epilogue_use)
> > +    {
> > +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> > +      return NULL_TREE;
> > +    }
> > +
> > +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> > +    zero_call_used_regs_type = zero_call_used_regs_skip;
> > +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> > +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> > +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> > +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> > +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> > +    zero_call_used_regs_type = zero_call_used_regs_used;
> > +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> > +    zero_call_used_regs_type = zero_call_used_regs_all;
> > +  else
> > +    {
> > +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> > +          name, "skip", "used-gpr", "all-gpr", "used", "all");
> > +      *no_add_attris = true;
> > +      return NULL_TREE;
> > +    }
> > +
> > +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> > +
> > +  return NULL_TREE;
> > +}
> > +
> > /* Handle a "returns_nonnull" attribute; arguments as in
> >   struct attribute_spec.handler.  */
> >
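The string matching in the handler above is a plain chain of strcmp calls. For illustration, the same mapping can be written table-driven; this is a stand-alone sketch, not a proposed change, and `parse_zero_mode` and the `ZERO_*` names are hypothetical. An unknown string yields the unset value here, where the handler would instead issue the "must be one of" error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum zero_mode {
  ZERO_UNSET = 0, ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL
};

/* Map an attribute argument to its mode; ZERO_UNSET means "not one of
   the five accepted spellings".  */
enum zero_mode
parse_zero_mode(const char *text)
{
  static const struct {
    const char *name;
    enum zero_mode mode;
  } table[] = {
    { "skip",     ZERO_SKIP },
    { "used-gpr", ZERO_USED_GPR },
    { "all-gpr",  ZERO_ALL_GPR },
    { "used",     ZERO_USED },
    { "all",      ZERO_ALL },
  };

  assert(text != NULL);
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp(text, table[i].name) == 0)
      return table[i].mode;
  return ZERO_UNSET;
}
```

Matching is exact, so a spelling such as "used_gpr" (underscore instead of hyphen) is rejected, as it would be by the strcmp chain.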
> > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > index 81bd2ee..ded1880 100644
> > --- a/gcc/c/c-decl.c
> > +++ b/gcc/c/c-decl.c
> > @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree
newtype, tree oldtype)
> >         DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> >       }
> >
> > +      /* Merge the zero_call_used_regs_type information.  */
> > +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> > +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> > +
> >      /* Merge the storage class information.  */
> >      merge_weak (newdecl, olddecl);
> >
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index df8af36..19900f9 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> > Common Report Var(flag_zero_initialized_in_bss) Init(1)
> > Put zero initialized data in the bss section.
> >
> > +fzero-call-used-regs=
> > +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> > +Clear call-used registers upon function return.
> > +
> > +Enum
> > +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> > +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> > +
> > +EnumValue
> > +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> > +
> > +EnumValue
> > +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> > +
> > +EnumValue
> > +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> > +
> > +EnumValue
> > +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> > +
> > +EnumValue
> > +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> > +
> > g
> > Common Driver RejectNegative JoinedOrMissing
> > Generate debug information in default format.
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 5c373c0..fd1aa9c 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int
regno)
> >  return false;
> > }
> >
> > +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> > +
> > +static bool
> > +ix86_zero_call_used_regno_p (const unsigned int regno,
> > +                          bool gpr_only)
> > +{
> > +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> > +}
> > +
> > +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > +
> > +static machine_mode
> > +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> > +{
> > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > +     and the lower 128 bits for vector registers since destination are
> > +     zero-extended to the full register width.  */
> > +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> > +}
> > +
> > +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> > +
> > +static rtx
> > +ix86_zero_all_vector_registers (bool used_only)
> > +{
> > +  if (!TARGET_AVX)
> > +    return NULL;
> > +
> > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > +      || (TARGET_64BIT
> > +          && (REX_SSE_REGNO_P (regno)
> > +              || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > +     && (!this_target_hard_regs->x_call_used_regs[regno]
> > +         || fixed_regs[regno]
> > +         || is_live_reg_at_exit (regno)
> > +         || (used_only && !df_regs_ever_live_p (regno))))
> > +      return NULL;
> > +
> > +  return gen_avx_vzeroall ();
> > +}
> > +
> > /* Define how to find the value returned by a function.
> >   VALTYPE is the data type of the value (as a tree).
> >   If the precise function being called is known, FUNC is its
FUNCTION_DECL;
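The loop in ix86_zero_all_vector_registers above is an all-or-nothing test: a single vzeroall is returned only when every vector register in range may be cleared, and otherwise the hook returns NULL. A stand-alone sketch of just that predicate follows. `struct vreg_info` and `can_zero_all_vector_regs` are hypothetical; in the patch the four facts come from call_used_regs, fixed_regs, is_live_reg_at_exit and df_regs_ever_live_p, and the TARGET_AVX precondition is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-register facts for one vector register.  */
struct vreg_info {
  bool call_used;     /* clobbered by calls, so a candidate for zeroing */
  bool fixed;         /* reserved, must never be touched */
  bool live_at_exit;  /* carries a value out of the function */
  bool ever_live;     /* used somewhere in the function body */
};

/* True only if every register may be zeroed; with USED_ONLY set, a
   register that was never used also blocks the blanket clear.  */
bool
can_zero_all_vector_regs(const struct vreg_info *regs, size_t n,
                         bool used_only)
{
  assert(regs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (!regs[i].call_used
        || regs[i].fixed
        || regs[i].live_at_exit
        || (used_only && !regs[i].ever_live))
      return false;
  return true;
}
```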
> > @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
> >      insn = emit_insn (gen_set_got (pic));
> >      RTX_FRAME_RELATED_P (insn) = 1;
> >      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> > -      emit_insn (gen_prologue_use (pic));
> > +      emit_insn (gen_pro_epilogue_use (pic));
> >      /* Deleting already emmitted SET_GOT if exist and allocated to
> >        REAL_PIC_OFFSET_TABLE_REGNUM.  */
> >      ix86_elim_entry_set_got (pic);
> > @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
> >     Further, prevent alloca modifications to the stack pointer from
being
> >     combined with prologue modifications.  */
> >  if (TARGET_SEH)
> > -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> > +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> > }
> >
> > /* Emit code to restore REG using a POP insn.  */
> > @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> > #undef TARGET_FUNCTION_VALUE_REGNO_P
> > #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> >
> > +#undef TARGET_ZERO_CALL_USED_REGNO_P
> > +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> > +
> > +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> > +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> > +
> > +#undef TARGET_PRO_EPILOGUE_USE
> > +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> > +
> > +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> > +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> > +
> > #undef TARGET_PROMOTE_FUNCTION_MODE
> > #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index d0ecd9e..e7df59f 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -194,7 +194,7 @@
> >  UNSPECV_STACK_PROBE
> >  UNSPECV_PROBE_STACK_RANGE
> >  UNSPECV_ALIGN
> > -  UNSPECV_PROLOGUE_USE
> > +  UNSPECV_PRO_EPILOGUE_USE
> >  UNSPECV_SPLIT_STACK_RETURN
> >  UNSPECV_CLD
> >  UNSPECV_NOPS
> > @@ -13525,8 +13525,8 @@
> >
> > ;; As USE insns aren't meaningful after reload, this is used instead
> > ;; to prevent deleting instructions setting registers for PIC code
> > -(define_insn "prologue_use"
> > -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> > +(define_insn "pro_epilogue_use"
> > +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
> >  ""
> >  ""
> >  [(set_attr "length" "0")])
> > diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> > index 6b6cfcd..e56d6ec 100644
> > --- a/gcc/coretypes.h
> > +++ b/gcc/coretypes.h
> > @@ -418,6 +418,16 @@ enum symbol_visibility
> >  VISIBILITY_INTERNAL
> > };
> >
> > +/* Zero call-used registers type.  */
> > +enum zero_call_used_regs {
> > +  zero_call_used_regs_unset = 0,
> > +  zero_call_used_regs_skip,
> > +  zero_call_used_regs_used_gpr,
> > +  zero_call_used_regs_all_gpr,
> > +  zero_call_used_regs_used,
> > +  zero_call_used_regs_all
> > +};
> > +
> > /* enums used by the targetm.excess_precision hook.  */
> >
> > enum flt_eval_method
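The enum above reserves 0 for "unset", which suggests how the per-function value and the command-line value combine: the attribute wins when present, otherwise the option applies. That matches the tests (for example, "skip" on the command line with "all-gpr" on the function still zeroes), but the code that makes the decision lives in function.c and is not shown in this part of the thread. The sketch below is therefore an assumption, with hypothetical names throughout.

```c
#include <assert.h>

enum zero_mode {
  ZERO_UNSET = 0, ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL
};

/* Assumed resolution rule: a mode recorded on the declaration overrides
   the command-line default; an unset declaration inherits the default.  */
enum zero_mode
effective_zero_mode(enum zero_mode decl_mode, enum zero_mode flag_mode)
{
  assert(flag_mode != ZERO_UNSET);  /* the option defaults to skip */
  return decl_mode != ZERO_UNSET ? decl_mode : flag_mode;
}
```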
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index c800b74..b32c55f 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@:
@code{ld -r}) on them.
> > A declaration to which @code{weakref} is attached and that is associated
> > with a named @code{target} must be @code{static}.
> >
> > +@item zero_call_used_regs ("@var{choice}")
> > +@cindex @code{zero_call_used_regs} function attribute
> > +The @code{zero_call_used_regs} attribute causes the compiler to zero
> > +call-used registers at function return according to @var{choice}.
> > +@samp{skip} doesn't zero call-used registers. @samp{used-gpr} zeros
> > +call-used general purpose registers which are used in function.
> > +@samp{all-gpr} zeros all call-used general purpose registers.
> > +@samp{used} zeros call-used registers which are used in function.
> > +@samp{all} zeros all call-used registers.  The default for the
> > +attribute is controlled by @option{-fzero-call-used-regs}.
> > +
> > @end table
> >
> > @c This is the end of the target-independent attribute table
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 09bcc5b..da02686 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> > -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> > -funsafe-math-optimizations  -funswitch-loops @gol
> > -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> > --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> > +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin  -fzero-call-used-regs @gol
> > --param @var{name}=@var{value}
> > -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> >
> > @@ -12273,6 +12273,17 @@ int foo (void)
> >
> > Not all targets support this option.
> >
> > +@item -fzero-call-used-regs=@var{choice}
> > +@opindex fzero-call-used-regs
> > +Zero call-used registers at function return according to
> > +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> > +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> > +registers which are used in function.  @samp{all-gpr} zeros all
> > +call-used general purpose registers.  @samp{used} zeros call-used
> > +registers which are used in function.  @samp{all} zeros all
> > +call-used registers.  You
> > +can control this behavior for a specific function by using the function
> > +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> > +
> > @item --param @var{name}=@var{value}
> > @opindex param
> > In some places, GCC uses various constants to control the amount of
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index 6e7d9dc..43dddd3 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -4571,6 +4571,22 @@ should recognize only the caller's register
numbers.
> > If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> > @end deftypefn
> >
> > +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> > +A target hook that returns @code{true} if @var{regno} is the number of a
> > +call used register.  If @var{general_reg_only_p} is @code{true},
> > +@var{regno} must be the number of a hard general register.
> > +
> > +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> > +@end deftypefn
> > +
> > +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> > +A target hook that returns a mode suitable for zeroing the call-used
> > +register @var{regno} in @var{mode}.
> > +
> > +If this hook is not defined, then default_zero_call_used_regno_mode will be
> > +used.
> > +@end deftypefn
> > +
> > @defmac APPLY_RESULT_SIZE
> > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > @@ -12043,6 +12059,17 @@ argument list due to stack realignment.
Return @code{NULL} if no DRAP
> > is needed.
> > @end deftypefn
> >
> > +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> > +This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to
> > +prevent deleting register setting instructions in prologue and epilogue.
> > +@end deftypefn
> > +
> > +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> > +This hook should return an rtx to zero all vector registers at function
> > +exit.  If @var{used_only} is @code{true}, only used vector registers should
> > +be zeroed.  Return @code{NULL} if this is not possible.
> > +@end deftypefn
> > +
> > @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
(void)
> > When optimization is disabled, this hook indicates whether or not
> > arguments should be allocated to stack slots.  Normally, GCC allocates
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index 3be984b..bee917a 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -3430,6 +3430,10 @@ for a new target instead.
> >
> > @hook TARGET_FUNCTION_VALUE_REGNO_P
> >
> > +@hook TARGET_ZERO_CALL_USED_REGNO_P
> > +
> > +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> > +
> > @defmac APPLY_RESULT_SIZE
> > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > @@ -8109,6 +8113,10 @@ and the associated definitions of those
functions.
> >
> > @hook TARGET_GET_DRAP_RTX
> >
> > +@hook TARGET_PRO_EPILOGUE_USE
> > +
> > +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> > +
> > @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> >
> > @hook TARGET_CONST_ANCHOR
> > diff --git a/gcc/function.c b/gcc/function.c
> > index 9eee9b5..9908530 100644
> > --- a/gcc/function.c
> > +++ b/gcc/function.c
> > @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> > #include "emit-rtl.h"
> > #include "recog.h"
> > #include "rtl-error.h"
> > +#include "hard-reg-set.h"
> > #include "alias.h"
> > #include "fold-const.h"
> > #include "stor-layout.h"
> > @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
> >  return seq;
> > }
> >
> > +/* Check whether the hard register REGNO is live at the exit block
> > + * of the current routine.  */
> > +bool
> > +is_live_reg_at_exit (unsigned int regno)
> > +{
> > +  edge e;
> > +  edge_iterator ei;
> > +
> > +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> > +    {
> > +      bitmap live_out = df_get_live_out (e->src);
> > +      if (REGNO_REG_SET_P (live_out, regno))
> > +     return true;
> > +    }
> > +
> > +  return false;
> > +}
> > +
> > +/* Emit a sequence of insns to zero the call-used-registers for the
> > + * current function.  */
> > +
> > +static void
> > +gen_call_used_regs_seq (void)
> > +{
> > +  if (!targetm.calls.pro_epilogue_use)
> > +    return;
> > +
> > +  bool gpr_only = true;
> > +  bool used_only = true;
> > +  enum zero_call_used_regs zero_call_used_regs_type
> > +    = zero_call_used_regs_unset;
> > +
> > +  if (flag_zero_call_used_regs)
> > +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> > +     == zero_call_used_regs_unset)
> > +      zero_call_used_regs_type = flag_zero_call_used_regs;
> > +    else
> > +      zero_call_used_regs_type
> > +     = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > +  else
> > +    zero_call_used_regs_type
> > +      = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > +
> > +  /* No need to zero call-used-regs when no user request is present.  */
> > +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> > +    return;
> > +
> > +  /* No need to zero call-used-regs in main ().  */
> > +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> > +    return;
> > +
> > +  /* No need to zero call-used-regs if __builtin_eh_return is called
> > +     since it isn't a normal function return.  */
> > +  if (crtl->calls_eh_return)
> > +    return;
> > +
> > +  /* If gpr_only is true, only zero call-used-registers that are
> > +     general-purpose registers; if used_only is true, only zero
> > +     call-used-registers that are used in the current function.  */
> > +  switch (zero_call_used_regs_type)
> > +    {
> > +      case zero_call_used_regs_all_gpr:
> > +     used_only = false;
> > +     break;
> > +      case zero_call_used_regs_used:
> > +     gpr_only = false;
> > +     break;
> > +      case zero_call_used_regs_all:
> > +     gpr_only = false;
> > +     used_only = false;
> > +     break;
> > +      default:
> > +     break;
> > +    }
> > +
> > +  /* An optimization to use a single hard insn to zero all vector
> > +     registers on the target that provides such insn.  */
> > +  if (!gpr_only
> > +      && targetm.calls.zero_all_vector_registers)
> > +    {
> > +      rtx zero_all_vec_insn
> > +     = targetm.calls.zero_all_vector_registers (used_only);
> > +      if (zero_all_vec_insn)
> > +     {
> > +       emit_insn (zero_all_vec_insn);
> > +       gpr_only = true;
> > +     }
> > +    }
> > +
> > +  /* For each of the hard registers, zero it if:
> > +     1. it is a call-used register;
> > + and 2. it is not a fixed register;
> > + and 3. it is not live at the end of the routine;
> > + and 4. it is a general purpose register if gpr_only is true;
> > + and 5. it is used in the routine if used_only is true;
> > +   */
> > +
> > +  /* This array holds the zero rtx with the corresponding machine mode.  */
> > +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> > +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> > +    zero_rtx[i] = NULL_RTX;
> > +
> > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > +    {
> > +      if (!this_target_hard_regs->x_call_used_regs[regno])
> > +     continue;
> > +      if (fixed_regs[regno])
> > +     continue;
> > +      if (is_live_reg_at_exit (regno))
> > +     continue;
> > +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > +     continue;
> > +      if (used_only && !df_regs_ever_live_p (regno))
> > +     continue;
> > +
> > +      /* Now we can emit insn to zero this register.  */
> > +      rtx reg, tmp;
> > +
> > +      machine_mode mode
> > +     = targetm.calls.zero_call_used_regno_mode (regno,
> > +                                                reg_raw_mode[regno]);
> > +      if (mode == VOIDmode)
> > +     continue;
> > +      if (!have_regs_of_mode[mode])
> > +     continue;
> > +
> > +      reg = gen_rtx_REG (mode, regno);
> > +      if (zero_rtx[(int)mode] == NULL_RTX)
> > +     {
> > +       zero_rtx[(int)mode] = reg;
> > +       tmp = gen_rtx_SET (reg, const0_rtx);
> > +       emit_insn (tmp);
> > +     }
> > +      else
> > +     emit_move_insn (reg, zero_rtx[(int)mode]);
> > +
> > +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> > +    }
> > +
> > +  return;
> > +}
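
To make the selection logic in gen_call_used_regs_seq easier to follow, here is
a minimal standalone sketch of just the decision part: how the command-line
value and the attribute value combine, and how the resulting choice maps to the
gpr_only / used_only flags. The enum is copied from the coretypes.h hunk; the
struct zero_plan type and the resolve_choice / plan_for helpers are hypothetical
names introduced for illustration and do not exist in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the enum the quoted patch adds to coretypes.h.  */
enum zero_call_used_regs {
  zero_call_used_regs_unset = 0,
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* Hypothetical summary of the decisions made before the register loop.  */
struct zero_plan
{
  bool enabled;    /* false: no zeroing insns are emitted at all */
  bool gpr_only;   /* true: restrict to general purpose registers */
  bool used_only;  /* true: restrict to registers used in the function */
};

/* The attribute value wins unless it is unset, in which case the
   command-line value applies.  This matches the nested if/else in the
   patch for every combination of the two inputs.  */
static enum zero_call_used_regs
resolve_choice (enum zero_call_used_regs flag, enum zero_call_used_regs attr)
{
  return attr != zero_call_used_regs_unset ? attr : flag;
}

/* Map the resolved choice to the two restriction flags, starting from
   the most restrictive setting as the patch does.  */
static struct zero_plan
plan_for (enum zero_call_used_regs choice)
{
  assert (choice <= zero_call_used_regs_all);

  struct zero_plan plan = { choice > zero_call_used_regs_skip, true, true };
  switch (choice)
    {
    case zero_call_used_regs_all_gpr:
      plan.used_only = false;
      break;
    case zero_call_used_regs_used:
      plan.gpr_only = false;
      break;
    case zero_call_used_regs_all:
      plan.gpr_only = false;
      plan.used_only = false;
      break;
    default:
      break;
    }
  return plan;
}
```

Under this reading, -fzero-call-used-regs=skip combined with
zero_call_used_regs("all-gpr") resolves to all-gpr, and
-fzero-call-used-regs=all combined with zero_call_used_regs("skip") disables
zeroing, which is what zero-scratch-regs-10.c and zero-scratch-regs-16.c expect.
The sketch leaves out the main (), __builtin_eh_return, and target-hook checks.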
> > +
> > +
> > /* Return a sequence to be used as the epilogue for the current function,
> >   or NULL.  */
> >
> > @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> >
> >  start_sequence ();
> >  emit_note (NOTE_INSN_EPILOGUE_BEG);
> > +
> > +  gen_call_used_regs_seq ();
> > +
> >  rtx_insn *seq = targetm.gen_epilogue ();
> >  if (seq)
> >    emit_jump_insn (seq);
> > diff --git a/gcc/function.h b/gcc/function.h
> > index d55cbdd..fc36c3e 100644
> > --- a/gcc/function.h
> > +++ b/gcc/function.h
> > @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> >
> > extern void used_types_insert (tree);
> >
> > +extern bool is_live_reg_at_exit (unsigned int);
> > +
> > #endif  /* GCC_FUNCTION_H */
> > diff --git a/gcc/target.def b/gcc/target.def
> > index 07059a8..8aab63e 100644
> > --- a/gcc/target.def
> > +++ b/gcc/target.def
> > @@ -5022,6 +5022,26 @@ If this hook is not defined, then
FUNCTION_VALUE_REGNO_P will be used.",
> > default_function_value_regno_p)
> >
> > DEFHOOK
> > +(zero_call_used_regno_p,
> > + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> > +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> > +@var{regno} must be the number of a hard general register.\n\
> > +\n\
> > +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> > + bool, (const unsigned int regno, bool general_reg_only_p),
> > + default_zero_call_used_regno_p)
> > +
> > +DEFHOOK
> > +(zero_call_used_regno_mode,
> > + "A target hook that returns a mode suitable for zeroing the call-used\n\
> > +register @var{regno} in @var{mode}.\n\
> > +\n\
> > +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> > +used.",
> > + machine_mode, (const unsigned int regno, machine_mode mode),
> > + default_zero_call_used_regno_mode)
> > +
> > +DEFHOOK
> > (fntype_abi,
> > "Return the ABI used by a function with type @var{type}; see the\n\
> > definition of @code{predefined_function_abi} for details of the ABI\n\
> > @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return
@code{NULL} if no DRAP\n\
> > is needed.",
> > rtx, (void), NULL)
> >
> > +DEFHOOK
> > +(pro_epilogue_use,
> > + "This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to\n\
> > +prevent deleting register setting instructions in prologue and epilogue.",
> > + rtx, (rtx reg), NULL)
> > +
> > +DEFHOOK
> > +(zero_all_vector_registers,
> > + "This hook should return an rtx to zero all vector registers at function\n\
> > +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> > +be zeroed.  Return @code{NULL} if this is not possible.",
> > + rtx, (bool used_only), NULL)
> > +
> > /* Return true if all function parameters should be spilled to the
> >   stack.  */
> > DEFHOOK
> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > index 0113c7b..ed02173 100644
> > --- a/gcc/targhooks.c
> > +++ b/gcc/targhooks.c
> > @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int
regno ATTRIBUTE_UNUSED)
> > #endif
> > }
> >
> > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> > +
> > +bool
> > +default_zero_call_used_regno_p (const unsigned int,
> > +                             bool)
> > +{
> > +  return false;
> > +}
> > +
> > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > +
> > +machine_mode
> > +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> > +{
> > +  return mode;
> > +}
> > +
> > rtx
> > default_internal_arg_pointer (void)
> > {
> > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > index b572a36..370df19 100644
> > --- a/gcc/targhooks.h
> > +++ b/gcc/targhooks.h
> > @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p
(const_rtx, int);
> > extern rtx default_function_value (const_tree, const_tree, bool);
> > extern rtx default_libcall_value (machine_mode, const_rtx);
> > extern bool default_function_value_regno_p (const unsigned int);
> > +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> > +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> > +                                                    machine_mode);
> > extern rtx default_internal_arg_pointer (void);
> > extern rtx default_static_chain (const_tree, bool);
> > extern void default_trampoline_init (rtx, tree, rtx);
> > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > new file mode 100644
> > index 0000000..3c2ac72
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > @@ -0,0 +1,3 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > +/* { dg-error "'-fzero-call-used-regs=' is not supported for this
target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > new file mode 100644
> > index 0000000..acf48c4
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > @@ -0,0 +1,4 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2" } */
> > +
> > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
/* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-*
x86_64-*-*" } } 0 } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > new file mode 100644
> > index 0000000..9f61dc4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > new file mode 100644
> > index 0000000..09048e5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> > +
> > +int
> > +foo (int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { !
ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > new file mode 100644
> > index 0000000..4862688
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > @@ -0,0 +1,39 @@
> > +/* { dg-do run { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > +
> > +struct S { int i; };
> > +__attribute__((const, noinline, noclone))
> > +struct S foo (int x)
> > +{
> > +  struct S s;
> > +  s.i = x;
> > +  return s;
> > +}
> > +
> > +int a[2048], b[2048], c[2048], d[2048];
> > +struct S e[2048];
> > +
> > +__attribute__((noinline, noclone)) void
> > +bar (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      e[i] = foo (i);
> > +      a[i+2] = a[i] + a[i+1];
> > +      b[10] = b[10] + i;
> > +      c[i] = c[2047 - i];
> > +      d[i] = d[i + 1];
> > +    }
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  int i;
> > +  bar ();
> > +  for (i = 0; i < 1024; i++)
> > +    if (e[i].i != i)
> > +      __builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > new file mode 100644
> > index 0000000..500251b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > @@ -0,0 +1,39 @@
> > +/* { dg-do run { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > +
> > +struct S { int i; };
> > +__attribute__((const, noinline, noclone))
> > +struct S foo (int x)
> > +{
> > +  struct S s;
> > +  s.i = x;
> > +  return s;
> > +}
> > +
> > +int a[2048], b[2048], c[2048], d[2048];
> > +struct S e[2048];
> > +
> > +__attribute__((noinline, noclone)) void
> > +bar (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      e[i] = foo (i);
> > +      a[i+2] = a[i] + a[i+1];
> > +      b[10] = b[10] + i;
> > +      c[i] = c[2047 - i];
> > +      d[i] = d[i + 1];
> > +    }
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  int i;
> > +  bar ();
> > +  for (i = 0; i < 1024; i++)
> > +    if (e[i].i != i)
> > +      __builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > new file mode 100644
> > index 0000000..8b058e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0,
%xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0,
%xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { !
ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > new file mode 100644
> > index 0000000..d4eaaf7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" }
*/
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { !
ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > new file mode 100644
> > index 0000000..dd3bb90
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > new file mode 100644
> > index 0000000..e2274f6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> > +
> > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > new file mode 100644
> > index 0000000..7f5d153
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > +
> > +int
> > +foo (int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } }
*/
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { !
ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > new file mode 100644
> > index 0000000..fe13d2b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > +
> > +float
> > +foo (float z, float y, float x)
> > +{
> > +  return x + y;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target {
! ia32 } } } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > new file mode 100644
> > index 0000000..205a532
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > +
> > +float
> > +foo (float z, float y, float x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > new file mode 100644
> > index 0000000..e046684
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { !
ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > new file mode 100644
> > index 0000000..4be8ff6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > +
> > +float
> > +foo (float z, float y, float x)
> > +{
> > +  return x + y;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target {
ia32 } } } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0,
%xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1,
%xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { !
ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > new file mode 100644
> > index 0000000..0eb34e0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> > +
> > +__attribute__ ((zero_call_used_regs("used")))
> > +float
> > +foo (float z, float y, float x)
> > +{
> > +  return x + y;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target {
! ia32 } } } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > new file mode 100644
> > index 0000000..cbb63a4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" }
*/
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { !
ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > new file mode 100644
> > index 0000000..7573197
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7
-mavx512f" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > new file mode 100644
> > index 0000000..de71223
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > new file mode 100644
> > index 0000000..ccfa441
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > new file mode 100644
> > index 0000000..6b46ca3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +__attribute__ ((zero_call_used_regs("all-gpr")))
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > new file mode 100644
> > index 0000000..0680f38
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > +
> > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > new file mode 100644
> > index 0000000..534defa
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > +
> > +int
> > +foo (int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > new file mode 100644
> > index 0000000..477bb19
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > +
> > +int
> > +foo (int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > new file mode 100644
> > index 0000000..a305a60
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> > +
> > +int
> > +foo (int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > diff --git a/gcc/toplev.c b/gcc/toplev.c
> > index 95eea63..01a1f24 100644
> > --- a/gcc/toplev.c
> > +++ b/gcc/toplev.c
> > @@ -1464,6 +1464,15 @@ process_options (void)
> >       }
> >    }
> >
> > +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> > +      && !targetm.calls.pro_epilogue_use)
> > +    {
> > +      error_at (UNKNOWN_LOCATION,
> > +             "%<-fzero-call-used-regs=%> is not supported for this "
> > +             "target");
> > +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> > +    }
> > +
> >  /* One region RA really helps to decrease the code size.  */
> >  if (flag_ira_region == IRA_REGION_AUTODETECT)
> >    flag_ira_region
> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > index 8c5a2e3..71badbd 100644
> > --- a/gcc/tree-core.h
> > +++ b/gcc/tree-core.h
> > @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> > unsigned final : 1;
> > /* Belong to FUNCTION_DECL exclusively.  */
> > unsigned regdecl_flag : 1;
> > - /* 14 unused bits. */
> > +
> > + /* How to clear call-used registers upon function return.  */
> > + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> > +
> > + /* 11 unused bits.  */
> > };
> >
> > struct GTY(()) tree_var_decl {
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index cf546ed..d378a88 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> > #define DECL_VISIBILITY(NODE) \
> >  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> >
> > +/* Value of the function decl's type of zeroing the call used
> > +   registers upon return from function.  */
> > +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> > +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> > +
> > /* Nonzero means that the decl (or an enclosing scope) had its
> >   visibility specified rather than being inferred.  */
> > #define DECL_VISIBILITY_SPECIFIED(NODE) \
> > --
> > 1.9.1
>
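
For readers skimming the testcases quoted above: zero-scratch-regs-4.c through zero-scratch-regs-6.c exercise the case where the per-function attribute overrides the command-line option. The following is only a minimal sketch of how that might look in user code. The `ZERO_REGS` macro and the function names are hypothetical and not part of the patch; the `__has_attribute` guard is there so the file still builds with compilers that lack the attribute.

```c
/* Sketch only: per-function overrides of -fzero-call-used-regs=.
   Build with, e.g.:  gcc -O2 -fzero-call-used-regs=skip demo.c
   ZERO_REGS, add_secret, hot_path and demo_check are hypothetical
   names used for illustration; they do not appear in the patch.  */
#include <assert.h>

#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_REGS(mode) __attribute__((zero_call_used_regs(mode)))
# endif
#endif
#ifndef ZERO_REGS
# define ZERO_REGS(mode) /* attribute unavailable: expands to nothing */
#endif

/* Asks for all call-used GPRs to be zeroed on return, even when the
   file is compiled with -fzero-call-used-regs=skip
   (compare zero-scratch-regs-5.c).  */
ZERO_REGS("all-gpr") int add_secret(int key, int value)
{
    return key + value;
}

/* Opts out of zeroing, even when the file is compiled with
   -fzero-call-used-regs=all-gpr (compare zero-scratch-regs-6.c).  */
ZERO_REGS("skip") int hot_path(int x)
{
    return x * 2;
}

/* The attribute only changes the epilogue; return values are unaffected.  */
int demo_check(void)
{
    assert(add_secret(40, 2) == 42);
    assert(hot_path(21) == 42);
    return 0;
}
```

Comparing the output of `gcc -O2 -S` with and without the attribute should show the extra register-clearing instructions before `ret` in `add_secret` only; the exact sequence depends on the target and compiler version.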
Qing Zhao Aug. 3, 2020, 3:42 p.m. UTC | #4
Hi, Uros,

Thanks a lot for your review on X86 parts.

Hi, Richard,

Could you please take a look at the middle-end part to see whether the rewrite addresses your previous concerns?

Thanks a lot.

Qing
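
As a reading aid for the middle-end part: the patch parses the attribute string with a `strcmp` chain (the c-attribs.c hunk) and stores the result in a 3-bit field of the function decl (the tree-core.h hunk). The standalone sketch below shows the same idea in a table-driven form. It is not the patch's code: `parse_zero_call_used_regs` and `struct decl_bits` are hypothetical, and the enumerator order after `zero_call_used_regs_unset` is an assumption.

```c
/* Sketch only: mapping the mode strings to an enum that fits in 3 bits.  */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed enumerator order; only zero_call_used_regs_unset = 0 is
   visible in the quoted coretypes.h hunk.  */
enum zero_call_used_regs {
    zero_call_used_regs_unset = 0,
    zero_call_used_regs_skip,
    zero_call_used_regs_used_gpr,
    zero_call_used_regs_all_gpr,
    zero_call_used_regs_used,
    zero_call_used_regs_all
};

/* Six values need 3 bits, matching the width of the new bit-field.  */
_Static_assert(zero_call_used_regs_all < (1 << 3),
               "mode must fit in a 3-bit field");

/* Hypothetical helper: returns zero_call_used_regs_unset for an
   unrecognised string instead of issuing a diagnostic.  */
enum zero_call_used_regs parse_zero_call_used_regs(const char *s)
{
    static const struct {
        const char *name;
        enum zero_call_used_regs value;
    } table[] = {
        { "skip",     zero_call_used_regs_skip },
        { "used-gpr", zero_call_used_regs_used_gpr },
        { "all-gpr",  zero_call_used_regs_all_gpr },
        { "used",     zero_call_used_regs_used },
        { "all",      zero_call_used_regs_all },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(s, table[i].name) == 0)
            return table[i].value;
    return zero_call_used_regs_unset;
}

/* Hypothetical stand-in for the bit-field added to tree_decl_with_vis.  */
struct decl_bits {
    unsigned zero_call_used_regs_type : 3;
};
```

Keeping `unset` as value 0 means a zero-initialized decl is distinguishable from one that explicitly asked for `skip`, which is presumably what lets the command-line default apply only when no attribute was given.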


> On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> 
> On Tue, Jul 28, 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
> >
> >
> > Richard and Uros,
> >
> > Could you please review the change that H.J. and I rewrote based on your comments in the previous round of discussion?
> >
> > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.  
> >
> > Thanks a lot for your time.
> 
> I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> 
> That said, the x86 parts look OK.
> 
> 

> Uros.
> > Qing
> >
> > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > 
> > > Hi, Gcc team,
> > > 
> > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > 
> > > From the previous round of discussion, the major issues raised were:
> > > 
> > > A. The patch should be rewritten to use the regsets infrastructure.
> > > B. The patch should go into the middle end instead of the x86 backend.
> > > 
> > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > 
> > > 1. Change the names of the option and attribute from
> > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > > to
> > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all"),
> > > and make the new option and the new attribute target-independent.
> > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > 3. Add 4 target-hooks;
> > > 4. Implement these 4 target-hooks on i386 backend. 
> > > 5. On a target that does not implement the target hook, issue an error for the new option and a warning for the new attribute.
> > > 
> > > The patch is as follows:
> > > 
> > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > command-line option and
> > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> > > 
> > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > 
> > >  Don't zero call-used registers upon function return.
> > > 
> > >  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> > > 
> > >  Zero used call-used general purpose registers upon function return.
> > > 
> > >  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> > > 
> > >  Zero all call-used general purpose registers upon function return.
> > > 
> > >  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> > > 
> > >  Zero used call-used registers upon function return.
> > > 
> > >  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> > > 
> > >  Zero all call-used registers upon function return.
> > > 
> > > The feature is implemented in the middle end, but is currently only supported on x86.
> > > 
> > > Tested on x86-64 and aarch64 by bootstrapping GCC trunk, with each of
> > > -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr,
> > > -fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
> > > by default on x86-64.
> > > 
> > > Please take a look and let me know if you have any more comments.
> > > 
> > > thanks.
> > > 
> > > Qing
> > > 
> > > 
> > > ====================================
> > > 
> > > gcc/ChangeLog:
> > > 
> > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> > > 
> > >       * common.opt: Add new option -fzero-call-used-regs.
> > >       * config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> > >       (ix86_zero_call_used_regno_mode): Likewise.
> > >       (ix86_zero_all_vector_registers): Likewise.
> > >       (ix86_expand_prologue): Replace gen_prologue_use with
> > >       gen_pro_epilogue_use.
> > >       (TARGET_ZERO_CALL_USED_REGNO_P): Define.
> > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> > >       (TARGET_PRO_EPILOGUE_USE): Define.
> > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> > >       * config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> > >       with UNSPECV_PRO_EPILOGUE_USE.
> > >       * coretypes.h (enum zero_call_used_regs): New type.
> > >       * doc/extend.texi: Document the new zero_call_used_regs attribute.
> > >       * doc/invoke.texi: Document the new -fzero-call-used-regs option.
> > >       * doc/tm.texi: Regenerate.
> > >       * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> > >       (TARGET_PRO_EPILOGUE_USE): Likewise.
> > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> > >       * function.c (is_live_reg_at_exit): New function.
> > >       (gen_call_used_regs_seq): Likewise.
> > >       (make_epilogue_seq): Call gen_call_used_regs_seq.
> > >       * function.h (is_live_reg_at_exit): Declare.
> > >       * target.def (zero_call_used_regno_p): New hook.
> > >       (zero_call_used_regno_mode): Likewise.
> > >       (pro_epilogue_use): Likewise.
> > >       (zero_all_vector_registers): Likewise.
> > >       * targhooks.c (default_zero_call_used_regno_p): New function.
> > >       (default_zero_call_used_regno_mode): Likewise.
> > >       * targhooks.h (default_zero_call_used_regno_p): Declare.
> > >       (default_zero_call_used_regno_mode): Declare.
> > >       * toplev.c (process_options): Issue errors when -fzero-call-used-regs
> > >       is used on targets that do not support it.
> > >       * tree-core.h (struct tree_decl_with_vis): New field 
> > >       zero_call_used_regs_type.
> > >       * tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> > > 
> > > gcc/c-family/ChangeLog:
> > > 
> > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> > > 
> > >       * c-attribs.c (c_common_attribute_table): Add new attribute
> > >       zero_call_used_regs.
> > >       (handle_zero_call_used_regs_attribute): New function.
> > > 
> > > gcc/c/ChangeLog:
> > > 
> > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> > > 
> > >       * c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> > > 
> > >       * c-c++-common/zero-scratch-regs-1.c: New test.
> > >       * c-c++-common/zero-scratch-regs-2.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> > > 
> > > ---
> > > gcc/c-family/c-attribs.c                           |  68 ++++++++++
> > > gcc/c/c-decl.c                                     |   4 +
> > > gcc/common.opt                                     |  23 ++++
> > > gcc/config/i386/i386.c                             |  58 ++++++++-
> > > gcc/config/i386/i386.md                            |   6 +-
> > > gcc/coretypes.h                                    |  10 ++
> > > gcc/doc/extend.texi                                |  11 ++
> > > gcc/doc/invoke.texi                                |  13 +-
> > > gcc/doc/tm.texi                                    |  27 ++++
> > > gcc/doc/tm.texi.in                                 |   8 ++
> > > gcc/function.c                                     | 145 +++++++++++++++++++++
> > > gcc/function.h                                     |   2 +
> > > gcc/target.def                                     |  33 +++++
> > > gcc/targhooks.c                                    |  17 +++
> > > gcc/targhooks.h                                    |   3 +
> > > gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> > > gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> > > .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> > > .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> > > .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> > > .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> > > .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> > > .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> > > .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> > > .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> > > .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> > > .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> > > .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> > > .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> > > .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> > > .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> > > .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> > > .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> > > .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> > > .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> > > .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> > > .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> > > .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> > > .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> > > .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> > > gcc/toplev.c                                       |   9 ++
> > > gcc/tree-core.h                                    |   6 +-
> > > gcc/tree.h                                         |   5 +
> > > 43 files changed, 866 insertions(+), 7 deletions(-)
> > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > 
> > > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > > index 3721483..cc93d6f 100644
> > > --- a/gcc/c-family/c-attribs.c
> > > +++ b/gcc/c-family/c-attribs.c
> > > @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> > > static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> > > static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > > static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> > > +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> > > +                                              bool *);
> > > static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> > > static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> > > static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> > > @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
> > >                             ignore_attribute, NULL },
> > >  { "no_split_stack",        0, 0, true,  false, false, false,
> > >                             handle_no_split_stack_attribute, NULL },
> > > +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> > > +                           handle_zero_call_used_regs_attribute, NULL },
> > > +
> > >  /* For internal use (marking of builtins and runtime functions) only.
> > >     The name contains space to prevent its usage in source code.  */
> > >  { "fn spec",               1, 1, false, true, true, false,
> > > @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
> > >  return NULL_TREE;
> > > }
> > > 
> > > +/* Handle a "zero_call_used_regs" attribute; arguments as in
> > > +   struct attribute_spec.handler.  */
> > > +
> > > +static tree
> > > +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> > > +                                   int ARG_UNUSED (flags),
> > > +                                   bool *no_add_attris)
> > > +{
> > > +  tree decl = *node;
> > > +  tree id = TREE_VALUE (args);
> > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > +
> > > +  if (TREE_CODE (decl) != FUNCTION_DECL)
> > > +    {
> > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > +             "%qE attribute applies only to functions", name);
> > > +      *no_add_attris = true;
> > > +      return NULL_TREE;
> > > +    }
> > > +  else if (DECL_INITIAL (decl))
> > > +    {
> > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > +             "cannot set %qE attribute after definition", name);
> > > +      *no_add_attris = true;
> > > +      return NULL_TREE;
> > > +    }
> > > +
> > > +  if (TREE_CODE (id) != STRING_CST)
> > > +    {
> > > +      error ("attribute %qE arguments not a string", name);
> > > +      *no_add_attris = true;
> > > +      return NULL_TREE;
> > > +    }
> > > +
> > > +  if (!targetm.calls.pro_epilogue_use)
> > > +    {
> > > +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> > > +      return NULL_TREE;
> > > +    }
> > > +
> > > +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> > > +    zero_call_used_regs_type = zero_call_used_regs_skip;
> > > +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> > > +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> > > +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> > > +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> > > +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> > > +    zero_call_used_regs_type = zero_call_used_regs_used;
> > > +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> > > +    zero_call_used_regs_type = zero_call_used_regs_all;
> > > +  else
> > > +    {
> > > +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> > > +          name, "skip", "used-gpr", "all-gpr", "used", "all");
> > > +      *no_add_attris = true;
> > > +      return NULL_TREE;
> > > +    }
> > > +
> > > +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> > > +
> > > +  return NULL_TREE;
> > > +}
> > > +
> > > /* Handle a "returns_nonnull" attribute; arguments as in
> > >   struct attribute_spec.handler.  */
> > > 
> > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > > index 81bd2ee..ded1880 100644
> > > --- a/gcc/c/c-decl.c
> > > +++ b/gcc/c/c-decl.c
> > > @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> > >         DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> > >       }
> > > 
> > > +      /* Merge the zero_call_used_regs_type information.  */
> > > +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> > > +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> > > +
> > >      /* Merge the storage class information.  */
> > >      merge_weak (newdecl, olddecl);
> > > 
> > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > index df8af36..19900f9 100644
> > > --- a/gcc/common.opt
> > > +++ b/gcc/common.opt
> > > @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> > > Common Report Var(flag_zero_initialized_in_bss) Init(1)
> > > Put zero initialized data in the bss section.
> > > 
> > > +fzero-call-used-regs=
> > > +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> > > +Clear call-used registers upon function return.
> > > +
> > > +Enum
> > > +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> > > +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> > > +
> > > +EnumValue
> > > +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> > > +
> > > +EnumValue
> > > +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> > > +
> > > +EnumValue
> > > +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> > > +
> > > +EnumValue
> > > +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> > > +
> > > +EnumValue
> > > +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> > > +
> > > g
> > > Common Driver RejectNegative JoinedOrMissing
> > > Generate debug information in default format.
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index 5c373c0..fd1aa9c 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
> > >  return false;
> > > }
> > > 
> > > +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > +
> > > +static bool
> > > +ix86_zero_call_used_regno_p (const unsigned int regno,
> > > +                          bool gpr_only)
> > > +{
> > > +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> > > +}
> > > +
> > > +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > +
> > > +static machine_mode
> > > +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> > > +{
> > > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > > +     and the lower 128 bits for vector registers since destination are
> > > +     zero-extended to the full register width.  */
> > > +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> > > +}
> > > +
> > > +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> > > +
> > > +static rtx
> > > +ix86_zero_all_vector_registers (bool used_only)
> > > +{
> > > +  if (!TARGET_AVX)
> > > +    return NULL;
> > > +
> > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > > +      || (TARGET_64BIT
> > > +          && (REX_SSE_REGNO_P (regno)
> > > +              || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > > +     && (!this_target_hard_regs->x_call_used_regs[regno]
> > > +         || fixed_regs[regno]
> > > +         || is_live_reg_at_exit (regno)
> > > +         || (used_only && !df_regs_ever_live_p (regno))))
> > > +      return NULL;
> > > +
> > > +  return gen_avx_vzeroall ();
> > > +}
> > > +
> > > /* Define how to find the value returned by a function.
> > >   VALTYPE is the data type of the value (as a tree).
> > >   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> > > @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
> > >      insn = emit_insn (gen_set_got (pic));
> > >      RTX_FRAME_RELATED_P (insn) = 1;
> > >      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> > > -      emit_insn (gen_prologue_use (pic));
> > > +      emit_insn (gen_pro_epilogue_use (pic));
> > >      /* Deleting already emmitted SET_GOT if exist and allocated to
> > >        REAL_PIC_OFFSET_TABLE_REGNUM.  */
> > >      ix86_elim_entry_set_got (pic);
> > > @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
> > >     Further, prevent alloca modifications to the stack pointer from being
> > >     combined with prologue modifications.  */
> > >  if (TARGET_SEH)
> > > -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> > > +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> > > }
> > > 
> > > /* Emit code to restore REG using a POP insn.  */
> > > @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> > > #undef TARGET_FUNCTION_VALUE_REGNO_P
> > > #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> > > 
> > > +#undef TARGET_ZERO_CALL_USED_REGNO_P
> > > +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> > > +
> > > +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> > > +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> > > +
> > > +#undef TARGET_PRO_EPILOGUE_USE
> > > +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> > > +
> > > +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> > > +
> > > #undef TARGET_PROMOTE_FUNCTION_MODE
> > > #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> > > 
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index d0ecd9e..e7df59f 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -194,7 +194,7 @@
> > >  UNSPECV_STACK_PROBE
> > >  UNSPECV_PROBE_STACK_RANGE
> > >  UNSPECV_ALIGN
> > > -  UNSPECV_PROLOGUE_USE
> > > +  UNSPECV_PRO_EPILOGUE_USE
> > >  UNSPECV_SPLIT_STACK_RETURN
> > >  UNSPECV_CLD
> > >  UNSPECV_NOPS
> > > @@ -13525,8 +13525,8 @@
> > > 
> > > ;; As USE insns aren't meaningful after reload, this is used instead
> > > ;; to prevent deleting instructions setting registers for PIC code
> > > -(define_insn "prologue_use"
> > > -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> > > +(define_insn "pro_epilogue_use"
> > > +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
> > >  ""
> > >  ""
> > >  [(set_attr "length" "0")])
> > > diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> > > index 6b6cfcd..e56d6ec 100644
> > > --- a/gcc/coretypes.h
> > > +++ b/gcc/coretypes.h
> > > @@ -418,6 +418,16 @@ enum symbol_visibility
> > >  VISIBILITY_INTERNAL
> > > };
> > > 
> > > +/* Zero call-used registers type.  */
> > > +enum zero_call_used_regs {
> > > +  zero_call_used_regs_unset = 0,
> > > +  zero_call_used_regs_skip,
> > > +  zero_call_used_regs_used_gpr,
> > > +  zero_call_used_regs_all_gpr,
> > > +  zero_call_used_regs_used,
> > > +  zero_call_used_regs_all
> > > +};
> > > +
> > > /* enums used by the targetm.excess_precision hook.  */
> > > 
> > > enum flt_eval_method
> > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > > index c800b74..b32c55f 100644
> > > --- a/gcc/doc/extend.texi
> > > +++ b/gcc/doc/extend.texi
> > > @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> > > A declaration to which @code{weakref} is attached and that is associated
> > > with a named @code{target} must be @code{static}.
> > > 
> > > +@item zero_call_used_regs ("@var{choice}")
> > > +@cindex @code{zero_call_used_regs} function attribute
> > > +The @code{zero_call_used_regs} attribute causes the compiler to zero
> > > +call-used registers at function return according to @var{choice}.
> > > +@samp{skip} doesn't zero call-used registers.  @samp{used-gpr} zeros
> > > +call-used general purpose registers which are used in the function.
> > > +@samp{all-gpr} zeros all call-used general purpose registers.
> > > +@samp{used} zeros call-used registers which are used in the function.
> > > +@samp{all} zeros all call-used registers.  The default for the
> > > +attribute is controlled by @option{-fzero-call-used-regs}.
> > > +
> > > @end table
> > > 
> > > @c This is the end of the target-independent attribute table
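For readers of the attribute documentation above, a minimal usage sketch may help. It assumes a compiler that implements the attribute as proposed in this patch; the `ZERO_USED_GPR` macro and the `check_pin` function are hypothetical names used only for illustration, and the `__has_attribute` guard is there so the file still builds with compilers that do not know the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper macro: expands to the proposed attribute when the
   compiler reports support for it, and to nothing otherwise.  */
#if defined (__has_attribute)
# if __has_attribute (zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__ ((zero_call_used_regs ("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR
#endif

/* Hypothetical function that handles a secret.  With the attribute in
   effect, the call-used general purpose registers it used are expected
   to be zeroed on return, except those that carry the return value.  */
ZERO_USED_GPR int
check_pin (const char *pin)
{
  assert (pin != NULL);
  return strcmp (pin, "1234") == 0;
}
```

The attribute only changes the epilogue, so callers see the same return value with or without it.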
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index 09bcc5b..da02686 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> > > -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> > > -funsafe-math-optimizations  -funswitch-loops @gol
> > > -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> > > --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> > > +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> > > --param @var{name}=@var{value}
> > > -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> > > 
> > > @@ -12273,6 +12273,17 @@ int foo (void)
> > > 
> > > Not all targets support this option.
> > > 
> > > +@item -fzero-call-used-regs=@var{choice}
> > > +@opindex fzero-call-used-regs
> > > +Zero call-used registers at function return according to
> > > +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> > > +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> > > +registers which are used in the function.  @samp{all-gpr} zeros all call-used
> > > +general purpose registers.  @samp{used} zeros call-used registers which
> > > +are used in the function.  @samp{all} zeros all call-used registers.  You
> > > +can control this behavior for a specific function by using the function
> > > +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> > > +
> > > @item --param @var{name}=@var{value}
> > > @opindex param
> > > In some places, GCC uses various constants to control the amount of
> > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > index 6e7d9dc..43dddd3 100644
> > > --- a/gcc/doc/tm.texi
> > > +++ b/gcc/doc/tm.texi
> > > @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> > > If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> > > @end deftypefn
> > > 
> > > +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> > > +A target hook that returns @code{true} if @var{regno} is the number of a
> > > +call used register.  If @var{general_reg_only_p} is @code{true},
> > > +@var{regno} must be the number of a hard general register.
> > > +
> > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> > > +@end deftypefn
> > > +
> > > +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> > > +A target hook that returns a mode suitable for zeroing the call used
> > > +register @var{regno} in @var{mode}.
> > > +
> > > +If this hook is not defined, then default_zero_call_used_regno_mode will be
> > > +used.
> > > +@end deftypefn
> > > +
> > > @defmac APPLY_RESULT_SIZE
> > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> > > is needed.
> > > @end deftypefn
> > > 
> > > +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> > > +This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to
> > > +prevent deleting register setting instructions in prologue and epilogue.
> > > +@end deftypefn
> > > +
> > > +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> > > +This hook should return an rtx to zero all vector registers at function
> > > +exit.  If @var{used_only} is @code{true}, only used vector registers should
> > > +be zeroed.  Return @code{NULL} if this is not possible.
> > > +@end deftypefn
> > > +
> > > @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> > > When optimization is disabled, this hook indicates whether or not
> > > arguments should be allocated to stack slots.  Normally, GCC allocates
> > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > index 3be984b..bee917a 100644
> > > --- a/gcc/doc/tm.texi.in
> > > +++ b/gcc/doc/tm.texi.in
> > > @@ -3430,6 +3430,10 @@ for a new target instead.
> > > 
> > > @hook TARGET_FUNCTION_VALUE_REGNO_P
> > > 
> > > +@hook TARGET_ZERO_CALL_USED_REGNO_P
> > > +
> > > +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> > > +
> > > @defmac APPLY_RESULT_SIZE
> > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
> > > 
> > > @hook TARGET_GET_DRAP_RTX
> > > 
> > > +@hook TARGET_PRO_EPILOGUE_USE
> > > +
> > > +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > +
> > > @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> > > 
> > > @hook TARGET_CONST_ANCHOR
> > > diff --git a/gcc/function.c b/gcc/function.c
> > > index 9eee9b5..9908530 100644
> > > --- a/gcc/function.c
> > > +++ b/gcc/function.c
> > > @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> > > #include "emit-rtl.h"
> > > #include "recog.h"
> > > #include "rtl-error.h"
> > > +#include "hard-reg-set.h"
> > > #include "alias.h"
> > > #include "fold-const.h"
> > > #include "stor-layout.h"
> > > @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
> > >  return seq;
> > > }
> > > 
> > > +/* Check whether the hard register REGNO is live at the exit block
> > > +   of the current routine.  */
> > > +bool
> > > +is_live_reg_at_exit (unsigned int regno)
> > > +{
> > > +  edge e;
> > > +  edge_iterator ei;
> > > +
> > > +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> > > +    {
> > > +      bitmap live_out = df_get_live_out (e->src);
> > > +      if (REGNO_REG_SET_P (live_out, regno))
> > > +     return true;
> > > +    }
> > > +
> > > +  return false;
> > > +}
> > > +
> > > +/* Emit a sequence of insns to zero the call-used-registers for the current
> > > +   function.  */
> > > +
> > > +static void
> > > +gen_call_used_regs_seq (void)
> > > +{
> > > +  if (!targetm.calls.pro_epilogue_use)
> > > +    return;
> > > +
> > > +  bool gpr_only = true;
> > > +  bool used_only = true;
> > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > +
> > > +  if (flag_zero_call_used_regs)
> > > +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> > > +     == zero_call_used_regs_unset)
> > > +      zero_call_used_regs_type = flag_zero_call_used_regs;
> > > +    else
> > > +      zero_call_used_regs_type
> > > +     = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > +  else
> > > +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > +
> > > +  /* No need to zero call-used-regs when no user request is present.  */
> > > +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> > > +    return;
> > > +
> > > +  /* No need to zero call-used-regs in main ().  */
> > > +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> > > +    return;
> > > +
> > > +  /* No need to zero call-used-regs if __builtin_eh_return is called
> > > +     since it isn't a normal function return.  */
> > > +  if (crtl->calls_eh_return)
> > > +    return;
> > > +
> > > +  /* If gpr_only is true, only zero call-used-registers that are
> > > +     general-purpose registers; if used_only is true, only zero
> > > +     call-used-registers that are used in the current function.  */
> > > +  switch (zero_call_used_regs_type)
> > > +    {
> > > +      case zero_call_used_regs_all_gpr:
> > > +     used_only = false;
> > > +     break;
> > > +      case zero_call_used_regs_used:
> > > +     gpr_only = false;
> > > +     break;
> > > +      case zero_call_used_regs_all:
> > > +     gpr_only = false;
> > > +     used_only = false;
> > > +     break;
> > > +      default:
> > > +     break;
> > > +    }
> > > +
> > > +  /* An optimization to use a single hard insn to zero all vector registers on
> > > +     the target that provides such insn.  */
> > > +  if (!gpr_only
> > > +      && targetm.calls.zero_all_vector_registers)
> > > +    {
> > > +      rtx zero_all_vec_insn
> > > +     = targetm.calls.zero_all_vector_registers (used_only);
> > > +      if (zero_all_vec_insn)
> > > +     {
> > > +       emit_insn (zero_all_vec_insn);
> > > +       gpr_only = true;
> > > +     }
> > > +    }
> > > +
> > > +  /* For each of the hard registers, we should zero it if:
> > > +     1. it is a call-used register;
> > > + and 2. it is not a fixed register;
> > > + and 3. it is not live at the end of the routine;
> > > + and 4. it is a general purpose register if gpr_only is true;
> > > + and 5. it is used in the routine if used_only is true.
> > > +   */
> > > +
> > > +  /* This array holds the zero rtx with the corresponding machine mode.  */
> > > +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> > > +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> > > +    zero_rtx[i] = NULL_RTX;
> > > +
> > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > +    {
> > > +      if (!this_target_hard_regs->x_call_used_regs[regno])
> > > +     continue;
> > > +      if (fixed_regs[regno])
> > > +     continue;
> > > +      if (is_live_reg_at_exit (regno))
> > > +     continue;
> > > +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > > +     continue;
> > > +      if (used_only && !df_regs_ever_live_p (regno))
> > > +     continue;
> > > +
> > > +      /* Now we can emit insn to zero this register.  */
> > > +      rtx reg, tmp;
> > > +
> > > +      machine_mode mode
> > > +     = targetm.calls.zero_call_used_regno_mode (regno,
> > > +                                                reg_raw_mode[regno]);
> > > +      if (mode == VOIDmode)
> > > +     continue;
> > > +      if (!have_regs_of_mode[mode])
> > > +     continue;
> > > +
> > > +      reg = gen_rtx_REG (mode, regno);
> > > +      if (zero_rtx[(int)mode] == NULL_RTX)
> > > +     {
> > > +       zero_rtx[(int)mode] = reg;
> > > +       tmp = gen_rtx_SET (reg, const0_rtx);
> > > +       emit_insn (tmp);
> > > +     }
> > > +      else
> > > +     emit_move_insn (reg, zero_rtx[(int)mode]);
> > > +
> > > +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> > > +    }
> > > +
> > > +  return;
> > > +}
> > > +
> > > +
> > > /* Return a sequence to be used as the epilogue for the current function,
> > >   or NULL.  */
> > > 
> > > @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> > > 
> > >  start_sequence ();
> > >  emit_note (NOTE_INSN_EPILOGUE_BEG);
> > > +
> > > +  gen_call_used_regs_seq ();
> > > +
> > >  rtx_insn *seq = targetm.gen_epilogue ();
> > >  if (seq)
> > >    emit_jump_insn (seq);
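The five conditions that gen_call_used_regs_seq applies to each hard register can be modelled outside the compiler. The following is only a sketch: `struct reg_info`, `should_zero`, `zero_mask` and the `demo_regs` table are hypothetical stand-ins for call_used_regs, fixed_regs, is_live_reg_at_exit, the target's GPR test and df_regs_ever_live_p, and they are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-register facts; in the patch these come from
   call_used_regs, fixed_regs, is_live_reg_at_exit, the target hook
   and df_regs_ever_live_p.  */
struct reg_info
{
  bool call_used;
  bool fixed;
  bool live_at_exit;
  bool is_gpr;
  bool ever_live;
};

/* Return true if a register with facts R should be zeroed.  */
static bool
should_zero (const struct reg_info *r, bool gpr_only, bool used_only)
{
  if (!r->call_used)
    return false;
  if (r->fixed)
    return false;
  if (r->live_at_exit)
    return false;
  if (gpr_only && !r->is_gpr)
    return false;
  if (used_only && !r->ever_live)
    return false;
  return true;
}

/* Return a bit mask with bit I set when REGS[I] should be zeroed.  */
unsigned int
zero_mask (const struct reg_info regs[], int n, bool gpr_only, bool used_only)
{
  assert (n >= 0 && n <= 32);
  unsigned int mask = 0;
  for (int i = 0; i < n; i++)
    if (should_zero (&regs[i], gpr_only, used_only))
      mask |= 1u << i;
  return mask;
}

/* Made-up register file used only for illustration.  */
const struct reg_info demo_regs[6] = {
  /* call_used, fixed, live_at_exit, is_gpr, ever_live */
  { true,  false, true,  true,  true  },  /* 0: holds the return value */
  { true,  false, false, true,  true  },  /* 1: used scratch GPR */
  { true,  false, false, true,  false },  /* 2: unused scratch GPR */
  { true,  false, false, false, true  },  /* 3: used vector register */
  { false, false, false, true,  true  },  /* 4: callee-saved GPR */
  { true,  true,  false, true,  true  },  /* 5: fixed register */
};
```

In this model, used-gpr corresponds to (gpr_only, used_only) = (true, true), all-gpr to (true, false), used to (false, true) and all to (false, false), which mirrors the switch in the function above.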
> > > diff --git a/gcc/function.h b/gcc/function.h
> > > index d55cbdd..fc36c3e 100644
> > > --- a/gcc/function.h
> > > +++ b/gcc/function.h
> > > @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> > > 
> > > extern void used_types_insert (tree);
> > > 
> > > +extern bool is_live_reg_at_exit (unsigned int);
> > > +
> > > #endif  /* GCC_FUNCTION_H */
> > > diff --git a/gcc/target.def b/gcc/target.def
> > > index 07059a8..8aab63e 100644
> > > --- a/gcc/target.def
> > > +++ b/gcc/target.def
> > > @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
> > > default_function_value_regno_p)
> > > 
> > > DEFHOOK
> > > +(zero_call_used_regno_p,
> > > + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> > > +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> > > +@var{regno} must be the number of a hard general register.\n\
> > > +\n\
> > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> > > + bool, (const unsigned int regno, bool general_reg_only_p),
> > > + default_zero_call_used_regno_p)
> > > +
> > > +DEFHOOK
> > > +(zero_call_used_regno_mode,
> > > + "A target hook that returns a mode suitable for zeroing the call used\n\
> > > +register @var{regno} in @var{mode}.\n\
> > > +\n\
> > > +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> > > +used.",
> > > + machine_mode, (const unsigned int regno, machine_mode mode),
> > > + default_zero_call_used_regno_mode)
> > > +
> > > +DEFHOOK
> > > (fntype_abi,
> > > "Return the ABI used by a function with type @var{type}; see the\n\
> > > definition of @code{predefined_function_abi} for details of the ABI\n\
> > > @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> > > is needed.",
> > > rtx, (void), NULL)
> > > 
> > > +DEFHOOK
> > > +(pro_epilogue_use,
> > > + "This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to\n\
> > > +prevent deleting register setting instructions in prologue and epilogue.",
> > > + rtx, (rtx reg), NULL)
> > > +
> > > +DEFHOOK
> > > +(zero_all_vector_registers,
> > > + "This hook should return an rtx to zero all vector registers at function\n\
> > > +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> > > +be zeroed.  Return @code{NULL} if this is not possible.",
> > > + rtx, (bool used_only), NULL)
> > > +
> > > /* Return true if all function parameters should be spilled to the
> > >   stack.  */
> > > DEFHOOK
> > > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > > index 0113c7b..ed02173 100644
> > > --- a/gcc/targhooks.c
> > > +++ b/gcc/targhooks.c
> > > @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> > > #endif
> > > }
> > > 
> > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > +
> > > +bool
> > > +default_zero_call_used_regno_p (const unsigned int,
> > > +                             bool)
> > > +{
> > > +  return false;
> > > +}
> > > +
> > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > +
> > > +machine_mode
> > > +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> > > +{
> > > +  return mode;
> > > +}
> > > +
> > > rtx
> > > default_internal_arg_pointer (void)
> > > {
> > > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > > index b572a36..370df19 100644
> > > --- a/gcc/targhooks.h
> > > +++ b/gcc/targhooks.h
> > > @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> > > extern rtx default_function_value (const_tree, const_tree, bool);
> > > extern rtx default_libcall_value (machine_mode, const_rtx);
> > > extern bool default_function_value_regno_p (const unsigned int);
> > > +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> > > +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> > > +                                                    machine_mode);
> > > extern rtx default_internal_arg_pointer (void);
> > > extern rtx default_static_chain (const_tree, bool);
> > > extern void default_trampoline_init (rtx, tree, rtx);
> > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > new file mode 100644
> > > index 0000000..3c2ac72
> > > --- /dev/null
> > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > @@ -0,0 +1,3 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > new file mode 100644
> > > index 0000000..acf48c4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > @@ -0,0 +1,4 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2" } */
> > > +
> > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > new file mode 100644
> > > index 0000000..9f61dc4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > new file mode 100644
> > > index 0000000..09048e5
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > @@ -0,0 +1,21 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> > > +
> > > +int
> > > +foo (int x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > new file mode 100644
> > > index 0000000..4862688
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > @@ -0,0 +1,39 @@
> > > +/* { dg-do run { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > +
> > > +struct S { int i; };
> > > +__attribute__((const, noinline, noclone))
> > > +struct S foo (int x)
> > > +{
> > > +  struct S s;
> > > +  s.i = x;
> > > +  return s;
> > > +}
> > > +
> > > +int a[2048], b[2048], c[2048], d[2048];
> > > +struct S e[2048];
> > > +
> > > +__attribute__((noinline, noclone)) void
> > > +bar (void)
> > > +{
> > > +  int i;
> > > +  for (i = 0; i < 1024; i++)
> > > +    {
> > > +      e[i] = foo (i);
> > > +      a[i+2] = a[i] + a[i+1];
> > > +      b[10] = b[10] + i;
> > > +      c[i] = c[2047 - i];
> > > +      d[i] = d[i + 1];
> > > +    }
> > > +}
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  int i;
> > > +  bar ();
> > > +  for (i = 0; i < 1024; i++)
> > > +    if (e[i].i != i)
> > > +      __builtin_abort ();
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > new file mode 100644
> > > index 0000000..500251b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > @@ -0,0 +1,39 @@
> > > +/* { dg-do run { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > +
> > > +struct S { int i; };
> > > +__attribute__((const, noinline, noclone))
> > > +struct S foo (int x)
> > > +{
> > > +  struct S s;
> > > +  s.i = x;
> > > +  return s;
> > > +}
> > > +
> > > +int a[2048], b[2048], c[2048], d[2048];
> > > +struct S e[2048];
> > > +
> > > +__attribute__((noinline, noclone)) void
> > > +bar (void)
> > > +{
> > > +  int i;
> > > +  for (i = 0; i < 1024; i++)
> > > +    {
> > > +      e[i] = foo (i);
> > > +      a[i+2] = a[i] + a[i+1];
> > > +      b[10] = b[10] + i;
> > > +      c[i] = c[2047 - i];
> > > +      d[i] = d[i + 1];
> > > +    }
> > > +}
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  int i;
> > > +  bar ();
> > > +  for (i = 0; i < 1024; i++)
> > > +    if (e[i].i != i)
> > > +      __builtin_abort ();
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > new file mode 100644
> > > index 0000000..8b058e3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > @@ -0,0 +1,21 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > new file mode 100644
> > > index 0000000..d4eaaf7
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > new file mode 100644
> > > index 0000000..dd3bb90
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > new file mode 100644
> > > index 0000000..e2274f6
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> > > +
> > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
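Tests 10, 15 and 16 exercise how the attribute interacts with the command-line option: as gen_call_used_regs_seq is written, a function attribute, when present, takes precedence over -fzero-call-used-regs. A small sketch of that resolution follows; the `CHOICE_*` enumerators and both function names are hypothetical and only mirror the order of enum zero_call_used_regs in coretypes.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of enum zero_call_used_regs from the patch.  */
enum choice
{
  CHOICE_UNSET = 0,
  CHOICE_SKIP,
  CHOICE_USED_GPR,
  CHOICE_ALL_GPR,
  CHOICE_USED,
  CHOICE_ALL
};

/* Pick the attribute value when one was given, else the option value.  */
enum choice
effective_choice (enum choice option, enum choice attr)
{
  assert (option >= CHOICE_UNSET && option <= CHOICE_ALL);
  assert (attr >= CHOICE_UNSET && attr <= CHOICE_ALL);
  return attr != CHOICE_UNSET ? attr : option;
}

/* Zeroing is requested only for values above CHOICE_SKIP.  */
bool
zeroing_requested (enum choice option, enum choice attr)
{
  return effective_choice (option, attr) > CHOICE_SKIP;
}
```

With this model, test 10 (option skip, attribute all-gpr) requests zeroing, while test 16 (option all, attribute skip) does not.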
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > new file mode 100644
> > > index 0000000..7f5d153
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > @@ -0,0 +1,13 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > +
> > > +int
> > > +foo (int x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > new file mode 100644
> > > index 0000000..fe13d2b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > @@ -0,0 +1,13 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > +
> > > +float
> > > +foo (float z, float y, float x)
> > > +{
> > > +  return x + y;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > new file mode 100644
> > > index 0000000..205a532
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > +
> > > +float
> > > +foo (float z, float y, float x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > new file mode 100644
> > > index 0000000..e046684
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > new file mode 100644
> > > index 0000000..4be8ff6
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > +
> > > +float
> > > +foo (float z, float y, float x)
> > > +{
> > > +  return x + y;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > new file mode 100644
> > > index 0000000..0eb34e0
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> > > +
> > > +__attribute__ ((zero_call_used_regs("used")))
> > > +float
> > > +foo (float z, float y, float x)
> > > +{
> > > +  return x + y;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > new file mode 100644
> > > index 0000000..cbb63a4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > new file mode 100644
> > > index 0000000..7573197
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > new file mode 100644
> > > index 0000000..de71223
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > new file mode 100644
> > > index 0000000..ccfa441
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > new file mode 100644
> > > index 0000000..6b46ca3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +__attribute__ ((zero_call_used_regs("all-gpr")))
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > new file mode 100644
> > > index 0000000..0680f38
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > +
> > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > new file mode 100644
> > > index 0000000..534defa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > @@ -0,0 +1,13 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > +
> > > +int
> > > +foo (int x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > new file mode 100644
> > > index 0000000..477bb19
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > +
> > > +int
> > > +foo (int x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > new file mode 100644
> > > index 0000000..a305a60
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > @@ -0,0 +1,15 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > +
> > > +int
> > > +foo (int x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > diff --git a/gcc/toplev.c b/gcc/toplev.c
> > > index 95eea63..01a1f24 100644
> > > --- a/gcc/toplev.c
> > > +++ b/gcc/toplev.c
> > > @@ -1464,6 +1464,15 @@ process_options (void)
> > >       }
> > >    }
> > > 
> > > +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> > > +      && !targetm.calls.pro_epilogue_use)
> > > +    {
> > > +      error_at (UNKNOWN_LOCATION,
> > > +             "%<-fzero-call-used-regs=%> is not supported for this "
> > > +             "target");
> > > +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> > > +    }
> > > +
> > >  /* One region RA really helps to decrease the code size.  */
> > >  if (flag_ira_region == IRA_REGION_AUTODETECT)
> > >    flag_ira_region
> > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > > index 8c5a2e3..71badbd 100644
> > > --- a/gcc/tree-core.h
> > > +++ b/gcc/tree-core.h
> > > @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> > > unsigned final : 1;
> > > /* Belong to FUNCTION_DECL exclusively.  */
> > > unsigned regdecl_flag : 1;
> > > - /* 14 unused bits. */
> > > +
> > > + /* How to clear call-used registers upon function return.  */
> > > + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> > > +
> > > + /* 11 unused bits.  */
> > > };
> > > 
> > > struct GTY(()) tree_var_decl {
> > > diff --git a/gcc/tree.h b/gcc/tree.h
> > > index cf546ed..d378a88 100644
> > > --- a/gcc/tree.h
> > > +++ b/gcc/tree.h
> > > @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> > > #define DECL_VISIBILITY(NODE) \
> > >  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> > > 
> > > +/* Value of the function decl's type of zeroing the call used
> > > +   registers upon return from function.  */
> > > +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> > > +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> > > +
> > > /* Nonzero means that the decl (or an enclosing scope) had its
> > >   visibility specified rather than being inferred.  */
> > > #define DECL_VISIBILITY_SPECIFIED(NODE) \
> > > -- 
> > > 1.9.1
> >
>
Richard Biener Aug. 4, 2020, 7:35 a.m. UTC | #5
On Mon, 3 Aug 2020, Qing Zhao wrote:

> Hi, Uros,
> 
> Thanks a lot for your review on X86 parts.
> 
> Hi, Richard,
> 
> Could you please take a look at the middle-end part to see whether the 
> rewrite addresses your previous concerns?

I have a few comments below - I'm not sure I'm qualified to fully
review the rest though.

> Thanks a lot.
> 
> Qing
> 
> 
> > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > 
> > 
> > On Tue, Jul 28, 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
> > >
> > >
> > > Richard and Uros,
> > >
> > > Could you please review the change that H.J and I rewrote based on your comments in the previous round of discussion?
> > >
> > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.  
> > >
> > > Thanks a lot for your time.
> > 
> > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > 
> > That said, the x86 parts look OK.
> > 
> > 
> 
> > Uros.
> > > Qing
> > >
> > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > > 
> > > > Hi, Gcc team,
> > > > 
> > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > > 
> > > > From the previous round of discussion, the major issues raised were:
> > > > 
> > > > A. should be rewritten by using regsets infrastructure.  
> > > > B. Put the patch into middle-end instead of x86 backend. 
> > > > 
> > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > 
> > > > 1. Change the names of the option and attribute from 
> > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > > > to:
> > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and  zero_call_used_regs("skip|used-gpr|all-gpr|used|all") 
> > > > Add the new option and new attribute as general (target-independent) features. 
> > > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > > 3. Add 4 target-hooks;
> > > > 4. Implement these 4 target-hooks on i386 backend. 
> > > > 5. On a target that does not implement the target hook, issue error for the new option, issue warning for the new attribute.
> > > > 
> > > > The patch is as follows:
> > > > 
> > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > command-line option and
> > > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> > > > 
> > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > 
> > > >  Don't zero call-used registers upon function return.

Does a return via EH unwinding also constitute a function return?  I
think you may want to have a finally handler or support in the unwinder
for this?  Then there's abnormal return via longjmp & friends, I guess
there's nothing that can be done there besides patching glibc?
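
To make the longjmp case concrete, here is a minimal standalone sketch
(not part of the patch; `env`, `leaf`, `run` and `reached_return` are
made-up names for illustration).  `leaf` leaves via `longjmp`, so control
never reaches its return path, which is where a zeroing sequence emitted
with the epilogue would sit.

```c
#include <setjmp.h>

jmp_buf env;                    /* hypothetical jump target */
volatile int reached_return;    /* set only if leaf() gets past longjmp */

void leaf(void)
{
    /* Imagine sensitive values living in call-used registers here. */
    longjmp(env, 1);            /* abnormal exit: the normal return path is skipped */
    reached_return = 1;         /* never executed */
}

/* Returns 1 when control came back through longjmp rather than a return. */
int run(void)
{
    if (setjmp(env) == 0) {
        leaf();
        return 0;               /* not reached, leaf() does not return */
    }
    return 1;
}
```

Calling `run()` yields 1 and leaves `reached_return` at 0, i.e. nothing
placed on the ordinary return path of `leaf` had a chance to run.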

In general I am missing reasoning as to why to use -fzero-call-used-regs=
in the documentation, that is, what is the threat model and what are
the guarantees?  Is there any point zeroing registers when spill slots
are left populated with stale register contents?  How do I (and why
would I want to?) ensure that there's no information leak from the
implementation of 'foo' to their callers?  Do I need to compile all
of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
or is it enough to annotate API boundaries I want to protect with
zero_call_used_regs("...")?

Again - what's the intended use (and how does it fulfill anything useful
for that case)?
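
For reference, annotating a single API boundary could look like the
sketch below.  This is an assumption-laden illustration, not code from
the patch: `check_token` and the `ZERO_USED_GPR` macro are invented
names, and the `__has_attribute` guard is only there so the file still
builds with compilers that lack the attribute.  It says nothing about
callees of the function or about stale values left in stack slots, which
is exactly the gap the questions above are about.

```c
#include <stddef.h>
#include <stdint.h>

/* Use the attribute when the compiler knows it, otherwise expand to nothing. */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR
#endif

/* Hypothetical API-boundary function that touches sensitive data.
   Returns 1 if the two buffers are equal, 0 otherwise.  */
ZERO_USED_GPR
int check_token(const uint8_t *token, const uint8_t *expected, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t)(token[i] ^ expected[i]);
    return diff == 0;
}
```

Only `check_token` itself is affected here; anything it calls would need
its own annotation or the command-line option.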

> > > >  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> > > > 
> > > >  Zero used call-used general purpose registers upon function return.
> > > > 
> > > >  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> > > > 
> > > >  Zero all call-used general purpose registers upon function return.
> > > > 
> > > >  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> > > > 
> > > >  Zero used call-used registers upon function return.
> > > > 
> > > >  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> > > > 
> > > >  Zero all call-used registers upon function return.
> > > > 
> > > > The feature is implemented in the middle-end, but is currently only supported on x86.
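
The five modes above vary along two axes: whether only registers used by
the function are cleared or all call-used ones, and whether only general
purpose registers are covered or every register class.  A small
standalone sketch of that mapping follows; the enum and function names
are hypothetical stand-ins (the patch has its own `enum
zero_call_used_regs`), and the enumerator order is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the patch's mode enum; names and order are assumptions. */
enum zero_mode {
    ZERO_MODE_UNSET,
    ZERO_MODE_SKIP,
    ZERO_MODE_USED_GPR,
    ZERO_MODE_ALL_GPR,
    ZERO_MODE_USED,
    ZERO_MODE_ALL
};

/* Map an option/attribute string to a mode; unknown strings give UNSET. */
enum zero_mode parse_zero_mode(const char *s)
{
    static const struct { const char *name; enum zero_mode mode; } table[] = {
        { "skip",     ZERO_MODE_SKIP },
        { "used-gpr", ZERO_MODE_USED_GPR },
        { "all-gpr",  ZERO_MODE_ALL_GPR },
        { "used",     ZERO_MODE_USED },
        { "all",      ZERO_MODE_ALL },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(s, table[i].name) == 0)
            return table[i].mode;
    return ZERO_MODE_UNSET;
}

/* Nonzero when only general purpose registers are to be cleared. */
int zero_mode_gpr_only(enum zero_mode m)
{
    return m == ZERO_MODE_USED_GPR || m == ZERO_MODE_ALL_GPR;
}

/* Nonzero when only registers actually used by the function are cleared. */
int zero_mode_used_only(enum zero_mode m)
{
    return m == ZERO_MODE_USED_GPR || m == ZERO_MODE_USED;
}
```

For instance, `parse_zero_mode("used-gpr")` selects the mode that is both
"used only" and "GPR only", while `"all"` is neither.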
> > > > 
> > > > Tested on x86-64 and aarch64 with bootstrapping GCC trunk, making
> > > > -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr
> > > > -fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
> > > > by default on x86-64.
> > > > 
> > > > Please take a look and let me know if you have any more comments.
> > > > 
> > > > thanks.
> > > > 
> > > > Qing
> > > > 
> > > > 
> > > > ====================================
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > 
> > > >       * common.opt: Add new option -fzero-call-used-regs.
> > > >       * config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> > > >       (ix86_zero_call_used_regno_mode): Likewise.
> > > >       (ix86_zero_all_vector_registers): Likewise.
> > > >       (ix86_expand_prologue): Replace gen_prologue_use with
> > > >       gen_pro_epilogue_use.
> > > >       (TARGET_ZERO_CALL_USED_REGNO_P): Define.
> > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> > > >       (TARGET_PRO_EPILOGUE_USE): Define.
> > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> > > >       * config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> > > >       with UNSPECV_PRO_EPILOGUE_USE.
> > > >       * coretypes.h (enum zero_call_used_regs): New type.
> > > >       * doc/extend.texi: Document the new zero_call_used_regs attribute.
> > > >       * doc/invoke.texi: Document the new -fzero-call-used-regs option.
> > > >       * doc/tm.texi: Regenerate.
> > > >       * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> > > >       (TARGET_PRO_EPILOGUE_USE): Likewise.
> > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> > > >       * function.c (is_live_reg_at_exit): New function.
> > > >       (gen_call_used_regs_seq): Likewise.
> > > >       (make_epilogue_seq): Call gen_call_used_regs_seq.
> > > >       * function.h (is_live_reg_at_exit): Declare.
> > > >       * target.def (zero_call_used_regno_p): New hook.
> > > >       (zero_call_used_regno_mode): Likewise.
> > > >       (pro_epilogue_use): Likewise.
> > > >       (zero_all_vector_registers): Likewise.
> > > >       * targhooks.c (default_zero_call_used_regno_p): New function.
> > > >       (default_zero_call_used_regno_mode): Likewise.
> > > >       * targhooks.h (default_zero_call_used_regno_p): Declare.
> > > >       (default_zero_call_used_regno_mode): Declare.
> > > >       * toplev.c (process_options): Issue errors when -fzero-call-used-regs
> > > >       is used on targets that do not support it.
> > > >       * tree-core.h (struct tree_decl_with_vis): New field 
> > > >       zero_call_used_regs_type.
> > > >       * tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> > > > 
> > > > gcc/c-family/ChangeLog:
> > > > 
> > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > 
> > > >       * c-attribs.c (c_common_attribute_table): Add new attribute
> > > >       zero_call_used_regs.
> > > >       (handle_zero_call_used_regs_attribute): New function.
> > > > 
> > > > gcc/c/ChangeLog:
> > > > 
> > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > 
> > > >       * c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > 
> > > >       * c-c++-common/zero-scratch-regs-1.c: New test.
> > > >       * c-c++-common/zero-scratch-regs-2.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> > > > 
> > > > ---
> > > > gcc/c-family/c-attribs.c                           |  68 ++++++++++
> > > > gcc/c/c-decl.c                                     |   4 +
> > > > gcc/common.opt                                     |  23 ++++
> > > > gcc/config/i386/i386.c                             |  58 ++++++++-
> > > > gcc/config/i386/i386.md                            |   6 +-
> > > > gcc/coretypes.h                                    |  10 ++
> > > > gcc/doc/extend.texi                                |  11 ++
> > > > gcc/doc/invoke.texi                                |  13 +-
> > > > gcc/doc/tm.texi                                    |  27 ++++
> > > > gcc/doc/tm.texi.in                                 |   8 ++
> > > > gcc/function.c                                     | 145 +++++++++++++++++++++
> > > > gcc/function.h                                     |   2 +
> > > > gcc/target.def                                     |  33 +++++
> > > > gcc/targhooks.c                                    |  17 +++
> > > > gcc/targhooks.h                                    |   3 +
> > > > gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> > > > gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> > > > .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> > > > .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> > > > .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> > > > .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> > > > .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> > > > .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> > > > .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> > > > .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> > > > .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> > > > .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> > > > .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> > > > .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> > > > .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> > > > .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> > > > .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> > > > .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> > > > .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> > > > .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> > > > .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> > > > .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> > > > .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> > > > .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> > > > .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> > > > gcc/toplev.c                                       |   9 ++
> > > > gcc/tree-core.h                                    |   6 +-
> > > > gcc/tree.h                                         |   5 +
> > > > 43 files changed, 866 insertions(+), 7 deletions(-)
> > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > 
> > > > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > > > index 3721483..cc93d6f 100644
> > > > --- a/gcc/c-family/c-attribs.c
> > > > +++ b/gcc/c-family/c-attribs.c
> > > > @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> > > > static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> > > > static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > > > static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> > > > +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> > > > +                                              bool *);
> > > > static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> > > > static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> > > > static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> > > > @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
> > > >                             ignore_attribute, NULL },
> > > >  { "no_split_stack",        0, 0, true,  false, false, false,
> > > >                             handle_no_split_stack_attribute, NULL },
> > > > +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> > > > +                           handle_zero_call_used_regs_attribute, NULL },
> > > > +
> > > >  /* For internal use (marking of builtins and runtime functions) only.
> > > >     The name contains space to prevent its usage in source code.  */
> > > >  { "fn spec",               1, 1, false, true, true, false,
> > > > @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
> > > >  return NULL_TREE;
> > > > }
> > > > 
> > > > +/* Handle a "zero_call_used_regs" attribute; arguments as in
> > > > +   struct attribute_spec.handler.  */
> > > > +
> > > > +static tree
> > > > +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> > > > +                                   int ARG_UNUSED (flags),
> > > > +                                   bool *no_add_attris)
> > > > +{
> > > > +  tree decl = *node;
> > > > +  tree id = TREE_VALUE (args);
> > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > +
> > > > +  if (TREE_CODE (decl) != FUNCTION_DECL)
> > > > +    {
> > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > +             "%qE attribute applies only to functions", name);
> > > > +      *no_add_attris = true;
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +  else if (DECL_INITIAL (decl))
> > > > +    {
> > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > +             "cannot set %qE attribute after definition", name);

Why's that?

> > > > +      *no_add_attris = true;
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +
> > > > +  if (TREE_CODE (id) != STRING_CST)
> > > > +    {
> > > > +      error ("attribute %qE arguments not a string", name);
> > > > +      *no_add_attris = true;
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +
> > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > +    {
> > > > +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +
> > > > +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> > > > +    zero_call_used_regs_type = zero_call_used_regs_skip;
> > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> > > > +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> > > > +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> > > > +    zero_call_used_regs_type = zero_call_used_regs_used;
> > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> > > > +    zero_call_used_regs_type = zero_call_used_regs_all;
> > > > +  else
> > > > +    {
> > > > +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> > > > +          name, "skip", "used-gpr", "all-gpr", "used", "all");
> > > > +      *no_add_attris = true;
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +
> > > > +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> > > > +
> > > > +  return NULL_TREE;
> > > > +}
> > > > +
> > > > /* Handle a "returns_nonnull" attribute; arguments as in
> > > >   struct attribute_spec.handler.  */
> > > > 
> > > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > > > index 81bd2ee..ded1880 100644
> > > > --- a/gcc/c/c-decl.c
> > > > +++ b/gcc/c/c-decl.c
> > > > @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> > > >         DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> > > >       }
> > > > 
> > > > +      /* Merge the zero_call_used_regs_type information.  */
> > > > +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> > > > +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> > > > +

If you need this (see below), then cp/* likely needs a similar adjustment,
and so do other places in the middle-end (function cloning, etc.).

> > > >      /* Merge the storage class information.  */
> > > >      merge_weak (newdecl, olddecl);
> > > > 
> > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > index df8af36..19900f9 100644
> > > > --- a/gcc/common.opt
> > > > +++ b/gcc/common.opt
> > > > @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> > > > Common Report Var(flag_zero_initialized_in_bss) Init(1)
> > > > Put zero initialized data in the bss section.
> > > > 
> > > > +fzero-call-used-regs=
> > > > +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> > > > +Clear call-used registers upon function return.
> > > > +
> > > > +Enum
> > > > +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> > > > +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> > > > +
> > > > +EnumValue
> > > > +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> > > > +
> > > > +EnumValue
> > > > +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> > > > +
> > > > +EnumValue
> > > > +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> > > > +
> > > > +EnumValue
> > > > +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> > > > +
> > > > +EnumValue
> > > > +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> > > > +
> > > > g
> > > > Common Driver RejectNegative JoinedOrMissing
> > > > Generate debug information in default format.
> > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > index 5c373c0..fd1aa9c 100644
> > > > --- a/gcc/config/i386/i386.c
> > > > +++ b/gcc/config/i386/i386.c
> > > > @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
> > > >  return false;
> > > > }
> > > > 
> > > > +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > +
> > > > +static bool
> > > > +ix86_zero_call_used_regno_p (const unsigned int regno,
> > > > +                          bool gpr_only)
> > > > +{
> > > > +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> > > > +}
> > > > +
> > > > +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > +
> > > > +static machine_mode
> > > > +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> > > > +{
> > > > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > > > +     and the lower 128 bits for vector registers since destination are
> > > > +     zero-extended to the full register width.  */
> > > > +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> > > > +}
> > > > +
> > > > +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> > > > +
> > > > +static rtx
> > > > +ix86_zero_all_vector_registers (bool used_only)
> > > > +{
> > > > +  if (!TARGET_AVX)
> > > > +    return NULL;
> > > > +
> > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > > > +      || (TARGET_64BIT
> > > > +          && (REX_SSE_REGNO_P (regno)
> > > > +              || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > > > +     && (!this_target_hard_regs->x_call_used_regs[regno]
> > > > +         || fixed_regs[regno]
> > > > +         || is_live_reg_at_exit (regno)
> > > > +         || (used_only && !df_regs_ever_live_p (regno))))
> > > > +      return NULL;
> > > > +
> > > > +  return gen_avx_vzeroall ();
> > > > +}
> > > > +
> > > > /* Define how to find the value returned by a function.
> > > >   VALTYPE is the data type of the value (as a tree).
> > > >   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> > > > @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
> > > >      insn = emit_insn (gen_set_got (pic));
> > > >      RTX_FRAME_RELATED_P (insn) = 1;
> > > >      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> > > > -      emit_insn (gen_prologue_use (pic));
> > > > +      emit_insn (gen_pro_epilogue_use (pic));
> > > >      /* Deleting already emmitted SET_GOT if exist and allocated to
> > > >        REAL_PIC_OFFSET_TABLE_REGNUM.  */
> > > >      ix86_elim_entry_set_got (pic);
> > > > @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
> > > >     Further, prevent alloca modifications to the stack pointer from being
> > > >     combined with prologue modifications.  */
> > > >  if (TARGET_SEH)
> > > > -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> > > > +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> > > > }
> > > > 
> > > > /* Emit code to restore REG using a POP insn.  */
> > > > @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> > > > #undef TARGET_FUNCTION_VALUE_REGNO_P
> > > > #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> > > > 
> > > > +#undef TARGET_ZERO_CALL_USED_REGNO_P
> > > > +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> > > > +
> > > > +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> > > > +
> > > > +#undef TARGET_PRO_EPILOGUE_USE
> > > > +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> > > > +
> > > > +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> > > > +
> > > > #undef TARGET_PROMOTE_FUNCTION_MODE
> > > > #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> > > > 
> > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > > index d0ecd9e..e7df59f 100644
> > > > --- a/gcc/config/i386/i386.md
> > > > +++ b/gcc/config/i386/i386.md
> > > > @@ -194,7 +194,7 @@
> > > >  UNSPECV_STACK_PROBE
> > > >  UNSPECV_PROBE_STACK_RANGE
> > > >  UNSPECV_ALIGN
> > > > -  UNSPECV_PROLOGUE_USE
> > > > +  UNSPECV_PRO_EPILOGUE_USE
> > > >  UNSPECV_SPLIT_STACK_RETURN
> > > >  UNSPECV_CLD
> > > >  UNSPECV_NOPS
> > > > @@ -13525,8 +13525,8 @@
> > > > 
> > > > ;; As USE insns aren't meaningful after reload, this is used instead
> > > > ;; to prevent deleting instructions setting registers for PIC code
> > > > -(define_insn "prologue_use"
> > > > -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> > > > +(define_insn "pro_epilogue_use"
> > > > +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
> > > >  ""
> > > >  ""
> > > >  [(set_attr "length" "0")])
> > > > diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> > > > index 6b6cfcd..e56d6ec 100644
> > > > --- a/gcc/coretypes.h
> > > > +++ b/gcc/coretypes.h
> > > > @@ -418,6 +418,16 @@ enum symbol_visibility
> > > >  VISIBILITY_INTERNAL
> > > > };
> > > > 
> > > > +/* Zero call-used registers type.  */
> > > > +enum zero_call_used_regs {
> > > > +  zero_call_used_regs_unset = 0,
> > > > +  zero_call_used_regs_skip,
> > > > +  zero_call_used_regs_used_gpr,
> > > > +  zero_call_used_regs_all_gpr,
> > > > +  zero_call_used_regs_used,
> > > > +  zero_call_used_regs_all
> > > > +};
> > > > +
> > > > /* enums used by the targetm.excess_precision hook.  */
> > > > 
> > > > enum flt_eval_method
> > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > > > index c800b74..b32c55f 100644
> > > > --- a/gcc/doc/extend.texi
> > > > +++ b/gcc/doc/extend.texi
> > > > @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> > > > A declaration to which @code{weakref} is attached and that is associated
> > > > with a named @code{target} must be @code{static}.
> > > > 
> > > > +@item zero_call_used_regs ("@var{choice}")
> > > > +@cindex @code{zero_call_used_regs} function attribute
> > > > +The @code{zero_call_used_regs} attribute causes the compiler to zero
> > > > +call-used registers at function return according to @var{choice}.
> > > > +@samp{skip} doesn't zero call-used registers. @samp{used-gpr} zeros
> > > > +call-used general purpose registers which are used in function.
> > > > +@samp{all-gpr} zeros all call-used general purpose registers.
> > > > +@samp{used} zeros call-used registers which are used in function.
> > > > +@samp{all} zeros all call-used registers.  The default for the
> > > > +attribute is controlled by @option{-fzero-call-used-regs}.
> > > > +
> > > > @end table
> > > > 
> > > > @c This is the end of the target-independent attribute table
> > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > index 09bcc5b..da02686 100644
> > > > --- a/gcc/doc/invoke.texi
> > > > +++ b/gcc/doc/invoke.texi
> > > > @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> > > > -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> > > > -funsafe-math-optimizations  -funswitch-loops @gol
> > > > -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> > > > --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> > > > +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> > > > --param @var{name}=@var{value}
> > > > -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> > > > 
> > > > @@ -12273,6 +12273,17 @@ int foo (void)
> > > > 
> > > > Not all targets support this option.
> > > > 
> > > > +@item -fzero-call-used-regs=@var{choice}
> > > > +@opindex fzero-call-used-regs
> > > > +Zero call-used registers at function return according to
> > > > +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> > > > +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> > > > +registers which are used in function.  @samp{all-gpr} zeros all
> > > > +call-used general purpose registers.  @samp{used} zeros call-used registers which
> > > > +are used in function.  @samp{all} zeros all call-used registers.  You
> > > > +can control this behavior for a specific function by using the function
> > > > +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> > > > +
> > > > @item --param @var{name}=@var{value}
> > > > @opindex param
> > > > In some places, GCC uses various constants to control the amount of
> > > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > > index 6e7d9dc..43dddd3 100644
> > > > --- a/gcc/doc/tm.texi
> > > > +++ b/gcc/doc/tm.texi
> > > > @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> > > > If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> > > > @end deftypefn
> > > > 
> > > > +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> > > > +A target hook that returns @code{true} if @var{regno} is the number of a
> > > > +call used register.  If @var{general_reg_only_p} is @code{true},
> > > > +@var{regno} must be the number of a hard general register.
> > > > +
> > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> > > > +@end deftypefn
> > > > +
> > > > +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> > > > +A target hook that returns a mode of suitable to zero the register for the
> > > > +call used register @var{regno} in @var{mode}.
> > > > +
> > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be
> > > > +used.
> > > > +@end deftypefn
> > > > +
> > > > @defmac APPLY_RESULT_SIZE
> > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> > > > is needed.
> > > > @end deftypefn
> > > > 
> > > > +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> > > > +This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to
> > > > +prevent deleting register setting instructions in prologue and epilogue.
> > > > +@end deftypefn
> > > > +
> > > > +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> > > > +This hook should return an rtx to zero all vector registers at function
> > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should
> > > > +be zeroed.  Return @code{NULL} if possible
> > > > +@end deftypefn
> > > > +
> > > > @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> > > > When optimization is disabled, this hook indicates whether or not
> > > > arguments should be allocated to stack slots.  Normally, GCC allocates
> > > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > > index 3be984b..bee917a 100644
> > > > --- a/gcc/doc/tm.texi.in
> > > > +++ b/gcc/doc/tm.texi.in
> > > > @@ -3430,6 +3430,10 @@ for a new target instead.
> > > > 
> > > > @hook TARGET_FUNCTION_VALUE_REGNO_P
> > > > 
> > > > +@hook TARGET_ZERO_CALL_USED_REGNO_P
> > > > +
> > > > +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > +
> > > > @defmac APPLY_RESULT_SIZE
> > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
> > > > 
> > > > @hook TARGET_GET_DRAP_RTX
> > > > 
> > > > +@hook TARGET_PRO_EPILOGUE_USE
> > > > +
> > > > +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > +
> > > > @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> > > > 
> > > > @hook TARGET_CONST_ANCHOR
> > > > diff --git a/gcc/function.c b/gcc/function.c
> > > > index 9eee9b5..9908530 100644
> > > > --- a/gcc/function.c
> > > > +++ b/gcc/function.c
> > > > @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> > > > #include "emit-rtl.h"
> > > > #include "recog.h"
> > > > #include "rtl-error.h"
> > > > +#include "hard-reg-set.h"
> > > > #include "alias.h"
> > > > #include "fold-const.h"
> > > > #include "stor-layout.h"
> > > > @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
> > > >  return seq;
> > > > }
> > > > 
> > > > +/* Check whether the hard register REGNO is live at the exit block
> > > > + * of the current routine.  */
> > > > +bool
> > > > +is_live_reg_at_exit (unsigned int regno)
> > > > +{
> > > > +  edge e;
> > > > +  edge_iterator ei;
> > > > +
> > > > +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> > > > +    {
> > > > +      bitmap live_out = df_get_live_out (e->src);
> > > > +      if (REGNO_REG_SET_P (live_out, regno))
> > > > +     return true;
> > > > +    }
> > > > +
> > > > +  return false;
> > > > +}
> > > > +
> > > > +/* Emit a sequence of insns to zero the call-used-registers for the current
> > > > + * function.  */

No '*' on the continuation line

> > > > +
> > > > +static void
> > > > +gen_call_used_regs_seq (void)
> > > > +{
> > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > +    return;
> > > > +
> > > > +  bool gpr_only = true;
> > > > +  bool used_only = true;
> > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > +
> > > > +  if (flag_zero_call_used_regs)
> > > > +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> > > > +     == zero_call_used_regs_unset)
> > > > +      zero_call_used_regs_type = flag_zero_call_used_regs;
> > > > +    else
> > > > +      zero_call_used_regs_type
> > > > +     = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > +  else
> > > > +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > +
> > > > +  /* No need to zero call-used-regs when no user request is present.  */
> > > > +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> > > > +    return;
> > > > +
> > > > +  /* No need to zero call-used-regs in main ().  */
> > > > +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> > > > +    return;
> > > > +
> > > > +  /* No need to zero call-used-regs if __builtin_eh_return is called
> > > > +     since it isn't a normal function return.  */
> > > > +  if (crtl->calls_eh_return)
> > > > +    return;
> > > > +
> > > > +  /* If gpr_only is true, only zero call-used-registers that are
> > > > +     general-purpose registers; if used_only is true, only zero
> > > > +     call-used-registers that are used in the current function.  */
> > > > +  switch (zero_call_used_regs_type)
> > > > +    {
> > > > +      case zero_call_used_regs_all_gpr:
> > > > +     used_only = false;
> > > > +     break;
> > > > +      case zero_call_used_regs_used:
> > > > +     gpr_only = false;
> > > > +     break;
> > > > +      case zero_call_used_regs_all:
> > > > +     gpr_only = false;
> > > > +     used_only = false;
> > > > +     break;
> > > > +      default:
> > > > +     break;
> > > > +    }
> > > > +
> > > > +  /* An optimization to use a single hard insn to zero all vector registers on
> > > > +     the target that provides such insn.  */
> > > > +  if (!gpr_only
> > > > +      && targetm.calls.zero_all_vector_registers)
> > > > +    {
> > > > +      rtx zero_all_vec_insn
> > > > +     = targetm.calls.zero_all_vector_registers (used_only);
> > > > +      if (zero_all_vec_insn)
> > > > +     {
> > > > +       emit_insn (zero_all_vec_insn);
> > > > +       gpr_only = true;
> > > > +     }
> > > > +    }
> > > > +
> > > > +  /* For each of the hard registers, check to see whether we should zero it if:
> > > > +     1. it is a call-used-registers;
> > > > + and 2. it is not a fixed-registers;
> > > > + and 3. it is not live at the end of the routine;
> > > > + and 4. it is general purpose register if gpr_only is true;
> > > > + and 5. it is used in the routine if used_only is true;
> > > > +   */
> > > > +
> > > > +  /* This array holds the zero rtx with the corresponding machine mode.  */
> > > > +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> > > > +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> > > > +    zero_rtx[i] = NULL_RTX;
> > > > +
> > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > +    {
> > > > +      if (!this_target_hard_regs->x_call_used_regs[regno])

Use if (!call_used_regs[regno])

> > > > +     continue;
> > > > +      if (fixed_regs[regno])
> > > > +     continue;
> > > > +      if (is_live_reg_at_exit (regno))
> > > > +     continue;

How can a call-used reg be live at exit?

> > > > +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > > > +     continue;

Why does the target need some extra say here?

> > > > +      if (used_only && !df_regs_ever_live_p (regno))

So I suppose this does not include uses by callees of this function?

> > > > +     continue;
> > > > +
> > > > +      /* Now we can emit insn to zero this register.  */
> > > > +      rtx reg, tmp;
> > > > +
> > > > +      machine_mode mode
> > > > +     = targetm.calls.zero_call_used_regno_mode (regno,
> > > > +                                                reg_raw_mode[regno]);

In what case does the target ever need to adjust this (we're dealing
with hard-regs only?)?

> > > > +      if (mode == VOIDmode)
> > > > +     continue;
> > > > +      if (!have_regs_of_mode[mode])
> > > > +     continue;

When does this happen?

> > > > +
> > > > +      reg = gen_rtx_REG (mode, regno);
> > > > +      if (zero_rtx[(int)mode] == NULL_RTX)
> > > > +     {
> > > > +       zero_rtx[(int)mode] = reg;
> > > > +       tmp = gen_rtx_SET (reg, const0_rtx);
> > > > +       emit_insn (tmp);
> > > > +     }
> > > > +      else
> > > > +     emit_move_insn (reg, zero_rtx[(int)mode]);

Not sure, but I think the canonical zero to use is CONST0_RTX (mode),
though I may be wrong.  I'd rather have the target be able to specify
some special instruction for zeroing here.  Some may have
multi-reg set instructions, for example.  That said, can't we
defer the actual zeroing to the target in full, compute only
a hard-reg-set of to-be-zeroed registers here, and pass that
to a target hook?
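
To make the split suggested above a bit more concrete, here is a rough,
standalone sketch.  It is a toy model and not GCC code: toy_reg_set,
struct toy_reg_info, toy_sample_regs, toy_compute_zero_set and
toy_target_zero_regs are all made-up names.  They only stand in for
HARD_REG_SET, the call_used_regs/fixed_regs arrays, the df liveness
queries and a hypothetical new target hook.  The generic code only
decides which registers to zero; how to zero them is left entirely to
the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in GCC.  */
#define TOY_NUM_REGS 8

typedef uint32_t toy_reg_set;	/* Stand-in for HARD_REG_SET.  */

struct toy_reg_info
{
  bool call_used;	/* cf. call_used_regs[regno].  */
  bool fixed;		/* cf. fixed_regs[regno].  */
  bool live_at_exit;	/* E.g. the register holds the return value.  */
  bool ever_live;	/* cf. df_regs_ever_live_p (regno).  */
  bool general;		/* General purpose register.  */
};

/* A made-up register file used to exercise the functions below.  */
const struct toy_reg_info toy_sample_regs[TOY_NUM_REGS] = {
  { true,  false, true,  true,  true  },  /* 0: return value, never zeroed.  */
  { true,  false, false, true,  true  },  /* 1: used scratch GPR.  */
  { true,  false, false, false, true  },  /* 2: unused scratch GPR.  */
  { false, false, false, true,  true  },  /* 3: callee-saved GPR.  */
  { true,  true,  false, false, true  },  /* 4: fixed register.  */
  { true,  false, false, true,  false },  /* 5: used vector register.  */
  { true,  false, false, false, false },  /* 6: unused vector register.  */
  { false, false, false, false, false }   /* 7: callee-saved vector reg.  */
};

/* Middle-end part: compute the set of registers that need zeroing.  */
toy_reg_set
toy_compute_zero_set (const struct toy_reg_info *regs, unsigned int nregs,
		      bool gpr_only, bool used_only)
{
  toy_reg_set set = 0;

  assert (nregs <= 32);
  for (unsigned int regno = 0; regno < nregs; regno++)
    {
      const struct toy_reg_info *r = &regs[regno];

      if (!r->call_used || r->fixed || r->live_at_exit)
	continue;
      if (gpr_only && !r->general)
	continue;
      if (used_only && !r->ever_live)
	continue;
      set |= (toy_reg_set) 1 << regno;
    }
  return set;
}

/* Target part (hypothetical hook): zero the registers in SET in whatever
   way suits the target.  This toy version only counts them.  */
unsigned int
toy_target_zero_regs (toy_reg_set set)
{
  unsigned int count = 0;

  for (; set != 0; set &= set - 1)
    count++;
  return count;
}
```

With the sample table, "used-gpr" selects only register 1, "all-gpr"
selects registers 1 and 2, and "all" selects registers 1, 2, 5 and 6.
A real hook would be free to cover the vector part of such a set with a
single instruction where the target has one.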

> > > > +
> > > > +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> > > > +    }
> > > > +
> > > > +  return;
> > > > +}
> > > > +
> > > > +
> > > > /* Return a sequence to be used as the epilogue for the current function,
> > > >   or NULL.  */
> > > > 
> > > > @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> > > > 
> > > >  start_sequence ();
> > > >  emit_note (NOTE_INSN_EPILOGUE_BEG);
> > > > +
> > > > +  gen_call_used_regs_seq ();
> > > > +

The caller eventually performs shrink-wrapping - are you sure that
doesn't mess up things?

> > > >  rtx_insn *seq = targetm.gen_epilogue ();
> > > >  if (seq)
> > > >    emit_jump_insn (seq);
> > > > diff --git a/gcc/function.h b/gcc/function.h
> > > > index d55cbdd..fc36c3e 100644
> > > > --- a/gcc/function.h
> > > > +++ b/gcc/function.h
> > > > @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> > > > 
> > > > extern void used_types_insert (tree);
> > > > 
> > > > +extern bool is_live_reg_at_exit (unsigned int);
> > > > +
> > > > #endif  /* GCC_FUNCTION_H */
> > > > diff --git a/gcc/target.def b/gcc/target.def
> > > > index 07059a8..8aab63e 100644
> > > > --- a/gcc/target.def
> > > > +++ b/gcc/target.def
> > > > @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
> > > > default_function_value_regno_p)
> > > > 
> > > > DEFHOOK
> > > > +(zero_call_used_regno_p,
> > > > + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> > > > +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> > > > +@var{regno} must be the number of a hard general register.\n\
> > > > +\n\
> > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> > > > + bool, (const unsigned int regno, bool general_reg_only_p),
> > > > + default_zero_call_used_regno_p)
> > > > +
> > > > +DEFHOOK
> > > > +(zero_call_used_regno_mode,
> > > > + "A target hook that returns a mode of suitable to zero the register for the\n\
> > > > +call used register @var{regno} in @var{mode}.\n\
> > > > +\n\
> > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> > > > +used.",
> > > > + machine_mode, (const unsigned int regno, machine_mode mode),
> > > > + default_zero_call_used_regno_mode)
> > > > +
> > > > +DEFHOOK
> > > > (fntype_abi,
> > > > "Return the ABI used by a function with type @var{type}; see the\n\
> > > > definition of @code{predefined_function_abi} for details of the ABI\n\
> > > > @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> > > > is needed.",
> > > > rtx, (void), NULL)
> > > > 
> > > > +DEFHOOK
> > > > +(pro_epilogue_use,
> > > > + "This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to\n\
> > > > +prevent deleting register setting instructions in prologue and epilogue.",
> > > > + rtx, (rtx reg), NULL)
> > > > +
> > > > +DEFHOOK
> > > > +(zero_all_vector_registers,
> > > > + "This hook should return an rtx to zero all vector registers at function\n\
> > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> > > > +be zeroed.  Return @code{NULL} if possible",
> > > > + rtx, (bool used_only), NULL)
> > > > +
> > > > /* Return true if all function parameters should be spilled to the
> > > >   stack.  */
> > > > DEFHOOK
> > > > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > > > index 0113c7b..ed02173 100644
> > > > --- a/gcc/targhooks.c
> > > > +++ b/gcc/targhooks.c
> > > > @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> > > > #endif
> > > > }
> > > > 
> > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > +
> > > > +bool
> > > > +default_zero_call_used_regno_p (const unsigned int,
> > > > +                             bool)
> > > > +{
> > > > +  return false;
> > > > +}
> > > > +
> > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > +
> > > > +machine_mode
> > > > +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> > > > +{
> > > > +  return mode;
> > > > +}
> > > > +
> > > > rtx
> > > > default_internal_arg_pointer (void)
> > > > {
> > > > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > > > index b572a36..370df19 100644
> > > > --- a/gcc/targhooks.h
> > > > +++ b/gcc/targhooks.h
> > > > @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> > > > extern rtx default_function_value (const_tree, const_tree, bool);
> > > > extern rtx default_libcall_value (machine_mode, const_rtx);
> > > > extern bool default_function_value_regno_p (const unsigned int);
> > > > +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> > > > +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> > > > +                                                    machine_mode);
> > > > extern rtx default_internal_arg_pointer (void);
> > > > extern rtx default_static_chain (const_tree, bool);
> > > > extern void default_trampoline_init (rtx, tree, rtx);
> > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > new file mode 100644
> > > > index 0000000..3c2ac72
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > @@ -0,0 +1,3 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > new file mode 100644
> > > > index 0000000..acf48c4
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > @@ -0,0 +1,4 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2" } */
> > > > +
> > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > new file mode 100644
> > > > index 0000000..9f61dc4
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > @@ -0,0 +1,12 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > new file mode 100644
> > > > index 0000000..09048e5
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > @@ -0,0 +1,21 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> > > > +
> > > > +int
> > > > +foo (int x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > new file mode 100644
> > > > index 0000000..4862688
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > @@ -0,0 +1,39 @@
> > > > +/* { dg-do run { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > +
> > > > +struct S { int i; };
> > > > +__attribute__((const, noinline, noclone))
> > > > +struct S foo (int x)
> > > > +{
> > > > +  struct S s;
> > > > +  s.i = x;
> > > > +  return s;
> > > > +}
> > > > +
> > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > +struct S e[2048];
> > > > +
> > > > +__attribute__((noinline, noclone)) void
> > > > +bar (void)
> > > > +{
> > > > +  int i;
> > > > +  for (i = 0; i < 1024; i++)
> > > > +    {
> > > > +      e[i] = foo (i);
> > > > +      a[i+2] = a[i] + a[i+1];
> > > > +      b[10] = b[10] + i;
> > > > +      c[i] = c[2047 - i];
> > > > +      d[i] = d[i + 1];
> > > > +    }
> > > > +}
> > > > +
> > > > +int
> > > > +main ()
> > > > +{
> > > > +  int i;
> > > > +  bar ();
> > > > +  for (i = 0; i < 1024; i++)
> > > > +    if (e[i].i != i)
> > > > +      __builtin_abort ();
> > > > +  return 0;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > new file mode 100644
> > > > index 0000000..500251b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > @@ -0,0 +1,39 @@
> > > > +/* { dg-do run { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > +
> > > > +struct S { int i; };
> > > > +__attribute__((const, noinline, noclone))
> > > > +struct S foo (int x)
> > > > +{
> > > > +  struct S s;
> > > > +  s.i = x;
> > > > +  return s;
> > > > +}
> > > > +
> > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > +struct S e[2048];
> > > > +
> > > > +__attribute__((noinline, noclone)) void
> > > > +bar (void)
> > > > +{
> > > > +  int i;
> > > > +  for (i = 0; i < 1024; i++)
> > > > +    {
> > > > +      e[i] = foo (i);
> > > > +      a[i+2] = a[i] + a[i+1];
> > > > +      b[10] = b[10] + i;
> > > > +      c[i] = c[2047 - i];
> > > > +      d[i] = d[i + 1];
> > > > +    }
> > > > +}
> > > > +
> > > > +int
> > > > +main ()
> > > > +{
> > > > +  int i;
> > > > +  bar ();
> > > > +  for (i = 0; i < 1024; i++)
> > > > +    if (e[i].i != i)
> > > > +      __builtin_abort ();
> > > > +  return 0;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > new file mode 100644
> > > > index 0000000..8b058e3
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > @@ -0,0 +1,21 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > new file mode 100644
> > > > index 0000000..d4eaaf7
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > new file mode 100644
> > > > index 0000000..dd3bb90
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > new file mode 100644
> > > > index 0000000..e2274f6
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> > > > +
> > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > new file mode 100644
> > > > index 0000000..7f5d153
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > +
> > > > +int
> > > > +foo (int x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > new file mode 100644
> > > > index 0000000..fe13d2b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > +
> > > > +float
> > > > +foo (float z, float y, float x)
> > > > +{
> > > > +  return x + y;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > new file mode 100644
> > > > index 0000000..205a532
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > @@ -0,0 +1,12 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > +
> > > > +float
> > > > +foo (float z, float y, float x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > new file mode 100644
> > > > index 0000000..e046684
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > new file mode 100644
> > > > index 0000000..4be8ff6
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > @@ -0,0 +1,23 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > +
> > > > +float
> > > > +foo (float z, float y, float x)
> > > > +{
> > > > +  return x + y;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > new file mode 100644
> > > > index 0000000..0eb34e0
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> > > > +
> > > > +__attribute__ ((zero_call_used_regs("used")))
> > > > +float
> > > > +foo (float z, float y, float x)
> > > > +{
> > > > +  return x + y;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > new file mode 100644
> > > > index 0000000..cbb63a4
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > new file mode 100644
> > > > index 0000000..7573197
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > new file mode 100644
> > > > index 0000000..de71223
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > @@ -0,0 +1,12 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > new file mode 100644
> > > > index 0000000..ccfa441
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > new file mode 100644
> > > > index 0000000..6b46ca3
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > @@ -0,0 +1,20 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +__attribute__ ((zero_call_used_regs("all-gpr")))
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > new file mode 100644
> > > > index 0000000..0680f38
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > +
> > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > new file mode 100644
> > > > index 0000000..534defa
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > +
> > > > +int
> > > > +foo (int x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > new file mode 100644
> > > > index 0000000..477bb19
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > +
> > > > +int
> > > > +foo (int x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > new file mode 100644
> > > > index 0000000..a305a60
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > @@ -0,0 +1,15 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > +
> > > > +int
> > > > +foo (int x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/toplev.c b/gcc/toplev.c
> > > > index 95eea63..01a1f24 100644
> > > > --- a/gcc/toplev.c
> > > > +++ b/gcc/toplev.c
> > > > @@ -1464,6 +1464,15 @@ process_options (void)
> > > >       }
> > > >    }
> > > > 
> > > > +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> > > > +      && !targetm.calls.pro_epilogue_use)
> > > > +    {
> > > > +      error_at (UNKNOWN_LOCATION,
> > > > +             "%<-fzero-call-used-regs=%> is not supported for this "
> > > > +             "target");
> > > > +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> > > > +    }
> > > > +
> > > >  /* One region RA really helps to decrease the code size.  */
> > > >  if (flag_ira_region == IRA_REGION_AUTODETECT)
> > > >    flag_ira_region
> > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > > > index 8c5a2e3..71badbd 100644
> > > > --- a/gcc/tree-core.h
> > > > +++ b/gcc/tree-core.h
> > > > @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> > > > unsigned final : 1;
> > > > /* Belong to FUNCTION_DECL exclusively.  */
> > > > unsigned regdecl_flag : 1;
> > > > - /* 14 unused bits. */
> > > > +
> > > > + /* How to clear call-used registers upon function return.  */
> > > > + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> > > > +
> > > > + /* 11 unused bits.  */

So instead of wasting "precious" bits please use lookup_attribute
in the single place you query this value (which is once per function).
There's no need to complicate matters by trying to maintain the above.
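
To make the suggestion concrete, here is a minimal standalone sketch, not GCC code. `attr_node`, `lookup_attr`, `parse_zero_mode` and `zero_mode_for` are hypothetical stand-ins (the real lookup_attribute walks a TREE_LIST chain), and the enumerator names are made up for illustration. The only point is that the value can be derived on demand at the one query site:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a decl's attribute chain.  */
struct attr_node
{
  const char *name;             /* attribute name */
  const char *arg;              /* its string argument, or NULL */
  const struct attr_node *next;
};

/* Illustrative names only; ZERO_UNSET means "no valid mode given".  */
enum zero_mode
{
  ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL, ZERO_UNSET
};

/* Simplified model of lookup_attribute: first node named NAME, or NULL.  */
static const struct attr_node *
lookup_attr (const char *name, const struct attr_node *list)
{
  for (; list; list = list->next)
    if (strcmp (list->name, name) == 0)
      return list;
  return NULL;
}

/* Map the attribute's string argument to a mode.  */
static enum zero_mode
parse_zero_mode (const char *arg)
{
  static const char *const names[] =
    { "skip", "used-gpr", "all-gpr", "used", "all" };
  if (!arg)
    return ZERO_UNSET;
  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    if (strcmp (arg, names[i]) == 0)
      return (enum zero_mode) i;
  return ZERO_UNSET;
}

/* The single query site: a valid attribute overrides the command-line
   default, so nothing has to be cached in the decl.  */
static enum zero_mode
zero_mode_for (const struct attr_node *attrs, enum zero_mode flag_default)
{
  const struct attr_node *a = lookup_attr ("zero_call_used_regs", attrs);
  enum zero_mode m = a ? parse_zero_mode (a->arg) : ZERO_UNSET;
  return m != ZERO_UNSET ? m : flag_default;
}
```

Since the query happens once per function, the linear walk over the attribute list should not matter for compile time.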

> > > > };
> > > > 
> > > > struct GTY(()) tree_var_decl {
> > > > diff --git a/gcc/tree.h b/gcc/tree.h
> > > > index cf546ed..d378a88 100644
> > > > --- a/gcc/tree.h
> > > > +++ b/gcc/tree.h
> > > > @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> > > > #define DECL_VISIBILITY(NODE) \
> > > >  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> > > > 
> > > > +/* Value of the function decl's type of zeroing the call used
> > > > +   registers upon return from function.  */
> > > > +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> > > > +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> > > > +
> > > > /* Nonzero means that the decl (or an enclosing scope) had its
> > > >   visibility specified rather than being inferred.  */
> > > > #define DECL_VISIBILITY_SPECIFIED(NODE) \
> > > > -- 
> > > > 1.9.1
> > >
> > 
> 
>
H.J. Lu Aug. 4, 2020, 6:23 p.m. UTC | #6
On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 3 Aug 2020, Qing Zhao wrote:
>
> > Hi, Uros,
> >
> > Thanks a lot for your review on X86 parts.
> >
> > Hi, Richard,
> >
> > > Could you please take a look at the middle-end part to see whether the
> > > rewrite addresses your previous concerns?
>
> I have a few comments below - I'm not sure I'm qualified to fully
> review the rest though.
>
> > Thanks a lot.
> >
> > Qing
> >
> >
> > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > >
> > > > On Tue, Jul 28, 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
> > > >
> > > >
> > > > Richard and Uros,
> > > >
> > > > > Could you please review the change that H.J. and I rewrote based on your comments in the previous round of discussion?
> > > >
> > > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.
> > > >
> > > > Thanks a lot for your time.
> > >
> > > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > >
> > > > That said, the x86 parts look OK.
> > >
> > >
> >
> > > Uros.
> > > > Qing
> > > >
> > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > Hi, Gcc team,
> > > > >
> > > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > > >
> > > > > From the previous round of discussion, the major issues raised were:
> > > > >
> > > > > A. should be rewritten by using regsets infrastructure.
> > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > >
> > > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > >
> > > > > 1. Change the names of the option and attribute from
> > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > > > > > to:
> > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and  zero_call_used_regs("skip|used-gpr|all-gpr|used|all")
> > > > > Add the new option and  new attribute in general.
> > > > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > > > 3. Add 4 target-hooks;
> > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > 5. On a target that does not implement the target hook, issue error for the new option, issue warning for the new attribute.
> > > > >
> > > > > The patch is as following:
> > > > >
> > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > command-line option and
> > > > > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> > > > >
> > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > >
> > > > >  Don't zero call-used registers upon function return.
>
> Does a return via EH unwinding also constitute a function return?  I
> think you may want to have a finally handler or support in the unwinder
> for this?  Then there's abnormal return via longjmp & friends, I guess
> there's nothing that can be done there besides patching glibc?

Abnormal returns, like EH unwinding and longjmp, aren't covered by this
patch. Only normal returns are covered.
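
A small illustration of the distinction, as a sketch only: it assumes a compiler with this patch applied, and the `ZERO_USED_GPR` macro and the function names are invented for the example. The macro expands to nothing if the attribute is unavailable.

```c
#include <setjmp.h>

/* Use the attribute only where the compiler advertises it.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__ ((zero_call_used_regs ("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR
#endif

static jmp_buf env;

/* Normal return: control reaches the return sequence, which is where
   the zeroing instructions are placed.  */
ZERO_USED_GPR int
normal_exit (int x)
{
  return x + 1;
}

/* Abnormal exit: longjmp leaves the function without reaching any
   return instruction, so no zeroing is done on this path.  */
ZERO_USED_GPR void
abnormal_exit (int x)
{
  longjmp (env, x);
}

/* Returns 1 when control came back through longjmp.  */
int
run_both (void)
{
  if (setjmp (env) == 0)
    abnormal_exit (normal_exit (41));
  else
    return 1;
  return 0;
}
```

Here only normal_exit gets its used call-used GPRs cleared; whatever abnormal_exit left in registers is still there when setjmp returns the second time.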

> In general I am missing reasoning as to why one would use
> -fzero-call-used-regs= in the documentation, that is, what is the threat
> model and what are the guarantees?  Is there any point zeroing registers
> when spill slots are left populated with stale register contents?  How do
> I (and why would I want to?) ensure that there's no information leak from
> the implementation of 'foo' to their callers?  Do I need to compile all
> of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> or is it enough to annotate API boundaries I want to protect with
> zero_call_used_regs("...")?
>
> Again - what's the intended use (and how does it fulfil anything useful
> for that case)?
>
> > > > >  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> > > > >
> > > > >  Zero used call-used general purpose registers upon function return.
> > > > >
> > > > >  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> > > > >
> > > > >  Zero all call-used general purpose registers upon function return.
> > > > >
> > > > >  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> > > > >
> > > > >  Zero used call-used registers upon function return.
> > > > >
> > > > >  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> > > > >
> > > > >  Zero all call-used registers upon function return.
> > > > >
> > > > > > The feature is implemented in the middle end, but is currently only valid on x86.
> > > > >
> > > > > Tested on x86-64 and aarch64 with bootstrapping GCC trunk, making
> > > > > -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr
> > > > > -fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
> > > > > by default on x86-64.
> > > > >
> > > > > > Please take a look and let me know if you have any more comments.
> > > > >
> > > > > thanks.
> > > > >
> > > > > Qing
> > > > >
> > > > >
> > > > > ====================================
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > >
> > > > >       * common.opt: Add new option -fzero-call-used-regs.
> > > > >       * config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> > > > >       (ix86_zero_call_used_regno_mode): Likewise.
> > > > >       (ix86_zero_all_vector_registers): Likewise.
> > > > >       (ix86_expand_prologue): Replace gen_prologue_use with
> > > > >       gen_pro_epilogue_use.
> > > > >       (TARGET_ZERO_CALL_USED_REGNO_P): Define.
> > > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> > > > >       (TARGET_PRO_EPILOGUE_USE): Define.
> > > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> > > > >       * config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> > > > >       with UNSPECV_PRO_EPILOGUE_USE.
> > > > >       * coretypes.h (enum zero_call_used_regs): New type.
> > > > >       * doc/extend.texi: Document the new zero_call_used_regs attribute.
> > > > >       * doc/invoke.texi: Document the new -fzero-call-used-regs option.
> > > > >       * doc/tm.texi: Regenerate.
> > > > > >       * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> > > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> > > > >       (TARGET_PRO_EPILOGUE_USE): Likewise.
> > > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> > > > >       * function.c (is_live_reg_at_exit): New function.
> > > > >       (gen_call_used_regs_seq): Likewise.
> > > > >       (make_epilogue_seq): Call gen_call_used_regs_seq.
> > > > >       * function.h (is_live_reg_at_exit): Declare.
> > > > >       * target.def (zero_call_used_regno_p): New hook.
> > > > >       (zero_call_used_regno_mode): Likewise.
> > > > >       (pro_epilogue_use): Likewise.
> > > > >       (zero_all_vector_registers): Likewise.
> > > > >       * targhooks.c (default_zero_call_used_regno_p): New function.
> > > > >       (default_zero_call_used_regno_mode): Likewise.
> > > > >       * targhooks.h (default_zero_call_used_regno_p): Declare.
> > > > >       (default_zero_call_used_regno_mode): Declare.
> > > > >       * toplev.c (process_options): Issue errors when -fzero-call-used-regs
> > > > >       is used on targets that do not support it.
> > > > >       * tree-core.h (struct tree_decl_with_vis): New field
> > > > >       zero_call_used_regs_type.
> > > > >       * tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> > > > >
> > > > > gcc/c-family/ChangeLog:
> > > > >
> > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> > > > >
> > > > >       * c-attribs.c (c_common_attribute_table): Add new attribute
> > > > >       zero_call_used_regs.
> > > > >       (handle_zero_call_used_regs_attribute): New function.
> > > > >
> > > > > gcc/c/ChangeLog:
> > > > >
> > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> > > > >
> > > > >       * c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> > > > >
> > > > >       * c-c++-common/zero-scratch-regs-1.c: New test.
> > > > >       * c-c++-common/zero-scratch-regs-2.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> > > > >
> > > > > ---
> > > > > gcc/c-family/c-attribs.c                           |  68 ++++++++++
> > > > > gcc/c/c-decl.c                                     |   4 +
> > > > > gcc/common.opt                                     |  23 ++++
> > > > > gcc/config/i386/i386.c                             |  58 ++++++++-
> > > > > gcc/config/i386/i386.md                            |   6 +-
> > > > > gcc/coretypes.h                                    |  10 ++
> > > > > gcc/doc/extend.texi                                |  11 ++
> > > > > gcc/doc/invoke.texi                                |  13 +-
> > > > > gcc/doc/tm.texi                                    |  27 ++++
> > > > > > gcc/doc/tm.texi.in                                 |   8 ++
> > > > > gcc/function.c                                     | 145 +++++++++++++++++++++
> > > > > gcc/function.h                                     |   2 +
> > > > > gcc/target.def                                     |  33 +++++
> > > > > gcc/targhooks.c                                    |  17 +++
> > > > > gcc/targhooks.h                                    |   3 +
> > > > > gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> > > > > gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> > > > > .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> > > > > .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> > > > > .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> > > > > .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> > > > > gcc/toplev.c                                       |   9 ++
> > > > > gcc/tree-core.h                                    |   6 +-
> > > > > gcc/tree.h                                         |   5 +
> > > > > 43 files changed, 866 insertions(+), 7 deletions(-)
> > > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > >
> > > > > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > > > > index 3721483..cc93d6f 100644
> > > > > --- a/gcc/c-family/c-attribs.c
> > > > > +++ b/gcc/c-family/c-attribs.c
> > > > > @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> > > > > static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> > > > > static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > > > > static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> > > > > +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> > > > > +                                              bool *);
> > > > > static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> > > > > static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> > > > > static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> > > > > @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
> > > > >                             ignore_attribute, NULL },
> > > > >  { "no_split_stack",        0, 0, true,  false, false, false,
> > > > >                             handle_no_split_stack_attribute, NULL },
> > > > > +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> > > > > +                           handle_zero_call_used_regs_attribute, NULL },
> > > > > +
> > > > >  /* For internal use (marking of builtins and runtime functions) only.
> > > > >     The name contains space to prevent its usage in source code.  */
> > > > >  { "fn spec",               1, 1, false, true, true, false,
> > > > > @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
> > > > >  return NULL_TREE;
> > > > > }
> > > > >
> > > > > +/* Handle a "zero_call_used_regs" attribute; arguments as in
> > > > > +   struct attribute_spec.handler.  */
> > > > > +
> > > > > +static tree
> > > > > +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> > > > > +                                   int ARG_UNUSED (flags),
> > > > > +                                   bool *no_add_attris)
> > > > > +{
> > > > > +  tree decl = *node;
> > > > > +  tree id = TREE_VALUE (args);
> > > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > > +
> > > > > +  if (TREE_CODE (decl) != FUNCTION_DECL)
> > > > > +    {
> > > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > > +             "%qE attribute applies only to functions", name);
> > > > > +      *no_add_attris = true;
> > > > > +      return NULL_TREE;
> > > > > +    }
> > > > > +  else if (DECL_INITIAL (decl))
> > > > > +    {
> > > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > > +             "cannot set %qE attribute after definition", name);
>
> Why's that?
>
> > > > > +      *no_add_attris = true;
> > > > > +      return NULL_TREE;
> > > > > +    }
> > > > > +
> > > > > +  if (TREE_CODE (id) != STRING_CST)
> > > > > +    {
> > > > > +      error ("attribute %qE arguments not a string", name);
> > > > > +      *no_add_attris = true;
> > > > > +      return NULL_TREE;
> > > > > +    }
> > > > > +
> > > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > > +    {
> > > > > +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> > > > > +      return NULL_TREE;
> > > > > +    }
> > > > > +
> > > > > +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> > > > > +    zero_call_used_regs_type = zero_call_used_regs_skip;
> > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> > > > > +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> > > > > +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> > > > > +    zero_call_used_regs_type = zero_call_used_regs_used;
> > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> > > > > +    zero_call_used_regs_type = zero_call_used_regs_all;
> > > > > +  else
> > > > > +    {
> > > > > +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> > > > > +          name, "skip", "used-gpr", "all-gpr", "used", "all");
> > > > > +      *no_add_attris = true;
> > > > > +      return NULL_TREE;
> > > > > +    }
> > > > > +
> > > > > +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> > > > > +
> > > > > +  return NULL_TREE;
> > > > > +}
> > > > > +
> > > > > /* Handle a "returns_nonnull" attribute; arguments as in
> > > > >   struct attribute_spec.handler.  */
> > > > >
> > > > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > > > > index 81bd2ee..ded1880 100644
> > > > > --- a/gcc/c/c-decl.c
> > > > > +++ b/gcc/c/c-decl.c
> > > > > @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> > > > >         DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> > > > >       }
> > > > >
> > > > > +      /* Merge the zero_call_used_regs_type information.  */
> > > > > +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> > > > > +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> > > > > +
>
> If you need this (see below) then likely cp/* needs similar adjustment
> so do other places in the middle-end (function cloning, etc)
>
> > > > >      /* Merge the storage class information.  */
> > > > >      merge_weak (newdecl, olddecl);
> > > > >
> > > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > > index df8af36..19900f9 100644
> > > > > --- a/gcc/common.opt
> > > > > +++ b/gcc/common.opt
> > > > > @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> > > > > Common Report Var(flag_zero_initialized_in_bss) Init(1)
> > > > > Put zero initialized data in the bss section.
> > > > >
> > > > > +fzero-call-used-regs=
> > > > > +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> > > > > +Clear call-used registers upon function return.
> > > > > +
> > > > > +Enum
> > > > > +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> > > > > +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> > > > > +
> > > > > +EnumValue
> > > > > +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> > > > > +
> > > > > +EnumValue
> > > > > +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> > > > > +
> > > > > +EnumValue
> > > > > +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> > > > > +
> > > > > +EnumValue
> > > > > +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> > > > > +
> > > > > +EnumValue
> > > > > +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> > > > > +
> > > > > g
> > > > > Common Driver RejectNegative JoinedOrMissing
> > > > > Generate debug information in default format.
> > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > index 5c373c0..fd1aa9c 100644
> > > > > --- a/gcc/config/i386/i386.c
> > > > > +++ b/gcc/config/i386/i386.c
> > > > > @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
> > > > >  return false;
> > > > > }
> > > > >
> > > > > +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > > +
> > > > > +static bool
> > > > > +ix86_zero_call_used_regno_p (const unsigned int regno,
> > > > > +                          bool gpr_only)
> > > > > +{
> > > > > +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> > > > > +}
> > > > > +
> > > > > +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > > +
> > > > > +static machine_mode
> > > > > +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> > > > > +{
> > > > > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > > > > +     and the lower 128 bits for vector registers since destination are
> > > > > +     zero-extended to the full register width.  */
> > > > > +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> > > > > +}
> > > > > +
> > > > > +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> > > > > +
> > > > > +static rtx
> > > > > +ix86_zero_all_vector_registers (bool used_only)
> > > > > +{
> > > > > +  if (!TARGET_AVX)
> > > > > +    return NULL;
> > > > > +
> > > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > > > > +      || (TARGET_64BIT
> > > > > +          && (REX_SSE_REGNO_P (regno)
> > > > > +              || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > > > > +     && (!this_target_hard_regs->x_call_used_regs[regno]
> > > > > +         || fixed_regs[regno]
> > > > > +         || is_live_reg_at_exit (regno)
> > > > > +         || (used_only && !df_regs_ever_live_p (regno))))
> > > > > +      return NULL;
> > > > > +
> > > > > +  return gen_avx_vzeroall ();
> > > > > +}
> > > > > +
> > > > > /* Define how to find the value returned by a function.
> > > > >   VALTYPE is the data type of the value (as a tree).
> > > > >   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> > > > > @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
> > > > >      insn = emit_insn (gen_set_got (pic));
> > > > >      RTX_FRAME_RELATED_P (insn) = 1;
> > > > >      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> > > > > -      emit_insn (gen_prologue_use (pic));
> > > > > +      emit_insn (gen_pro_epilogue_use (pic));
> > > > >      /* Deleting already emmitted SET_GOT if exist and allocated to
> > > > >        REAL_PIC_OFFSET_TABLE_REGNUM.  */
> > > > >      ix86_elim_entry_set_got (pic);
> > > > > @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
> > > > >     Further, prevent alloca modifications to the stack pointer from being
> > > > >     combined with prologue modifications.  */
> > > > >  if (TARGET_SEH)
> > > > > -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> > > > > +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> > > > > }
> > > > >
> > > > > /* Emit code to restore REG using a POP insn.  */
> > > > > @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> > > > > #undef TARGET_FUNCTION_VALUE_REGNO_P
> > > > > #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> > > > >
> > > > > +#undef TARGET_ZERO_CALL_USED_REGNO_P
> > > > > +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> > > > > +
> > > > > +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > > +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> > > > > +
> > > > > +#undef TARGET_PRO_EPILOGUE_USE
> > > > > +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> > > > > +
> > > > > +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > > +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> > > > > +
> > > > > #undef TARGET_PROMOTE_FUNCTION_MODE
> > > > > #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> > > > >
> > > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > > > index d0ecd9e..e7df59f 100644
> > > > > --- a/gcc/config/i386/i386.md
> > > > > +++ b/gcc/config/i386/i386.md
> > > > > @@ -194,7 +194,7 @@
> > > > >  UNSPECV_STACK_PROBE
> > > > >  UNSPECV_PROBE_STACK_RANGE
> > > > >  UNSPECV_ALIGN
> > > > > -  UNSPECV_PROLOGUE_USE
> > > > > +  UNSPECV_PRO_EPILOGUE_USE
> > > > >  UNSPECV_SPLIT_STACK_RETURN
> > > > >  UNSPECV_CLD
> > > > >  UNSPECV_NOPS
> > > > > @@ -13525,8 +13525,8 @@
> > > > >
> > > > > ;; As USE insns aren't meaningful after reload, this is used instead
> > > > > ;; to prevent deleting instructions setting registers for PIC code
> > > > > -(define_insn "prologue_use"
> > > > > -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> > > > > +(define_insn "pro_epilogue_use"
> > > > > +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
> > > > >  ""
> > > > >  ""
> > > > >  [(set_attr "length" "0")])
> > > > > diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> > > > > index 6b6cfcd..e56d6ec 100644
> > > > > --- a/gcc/coretypes.h
> > > > > +++ b/gcc/coretypes.h
> > > > > @@ -418,6 +418,16 @@ enum symbol_visibility
> > > > >  VISIBILITY_INTERNAL
> > > > > };
> > > > >
> > > > > +/* Zero call-used registers type.  */
> > > > > +enum zero_call_used_regs {
> > > > > +  zero_call_used_regs_unset = 0,
> > > > > +  zero_call_used_regs_skip,
> > > > > +  zero_call_used_regs_used_gpr,
> > > > > +  zero_call_used_regs_all_gpr,
> > > > > +  zero_call_used_regs_used,
> > > > > +  zero_call_used_regs_all
> > > > > +};
> > > > > +
> > > > > /* enums used by the targetm.excess_precision hook.  */
> > > > >
> > > > > enum flt_eval_method
> > > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > > > > index c800b74..b32c55f 100644
> > > > > --- a/gcc/doc/extend.texi
> > > > > +++ b/gcc/doc/extend.texi
> > > > > @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> > > > > A declaration to which @code{weakref} is attached and that is associated
> > > > > with a named @code{target} must be @code{static}.
> > > > >
> > > > > +@item zero_call_used_regs ("@var{choice}")
> > > > > +@cindex @code{zero_call_used_regs} function attribute
> > > > > +The @code{zero_call_used_regs} attribute causes the compiler to zero
> > > > > +call-used registers at function return according to @var{choice}.
> > > > > > +@samp{skip} doesn't zero call-used registers.  @samp{used-gpr} zeros
> > > > > > +call-used general purpose registers which are used in the function.
> > > > > > +@samp{all-gpr} zeros all call-used general purpose registers.
> > > > > > +@samp{used} zeros call-used registers which are used in the function.
> > > > > +@samp{all} zeros all call-used registers.  The default for the
> > > > > +attribute is controlled by @option{-fzero-call-used-regs}.
> > > > > +
> > > > > @end table
> > > > >
> > > > > @c This is the end of the target-independent attribute table
> > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > > index 09bcc5b..da02686 100644
> > > > > --- a/gcc/doc/invoke.texi
> > > > > +++ b/gcc/doc/invoke.texi
> > > > > @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> > > > > -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> > > > > -funsafe-math-optimizations  -funswitch-loops @gol
> > > > > -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> > > > > --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> > > > > +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> > > > > --param @var{name}=@var{value}
> > > > > -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> > > > >
> > > > > @@ -12273,6 +12273,17 @@ int foo (void)
> > > > >
> > > > > Not all targets support this option.
> > > > >
> > > > > +@item -fzero-call-used-regs=@var{choice}
> > > > > +@opindex fzero-call-used-regs
> > > > > +Zero call-used registers at function return according to
> > > > > +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> > > > > +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> > > > > > +registers which are used in the function.  @samp{all-gpr} zeros all
> > > > > > +call-used general purpose registers.  @samp{used} zeros call-used registers
> > > > > > +which are used in the function.  @samp{all} zeros all call-used registers.  You
> > > > > +can control this behavior for a specific function by using the function
> > > > > +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> > > > > +
> > > > > @item --param @var{name}=@var{value}
> > > > > @opindex param
> > > > > In some places, GCC uses various constants to control the amount of
> > > > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > > > index 6e7d9dc..43dddd3 100644
> > > > > --- a/gcc/doc/tm.texi
> > > > > +++ b/gcc/doc/tm.texi
> > > > > @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> > > > > If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> > > > > @end deftypefn
> > > > >
> > > > > +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> > > > > +A target hook that returns @code{true} if @var{regno} is the number of a
> > > > > +call used register.  If @var{general_reg_only_p} is @code{true},
> > > > > +@var{regno} must be the number of a hard general register.
> > > > > +
> > > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> > > > > +@end deftypefn
> > > > > +
> > > > > +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> > > > > > +A target hook that returns a mode suitable for zeroing the
> > > > > > +call used register @var{regno} in @var{mode}.
> > > > > +
> > > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be
> > > > > +used.
> > > > > +@end deftypefn
> > > > > +
> > > > > @defmac APPLY_RESULT_SIZE
> > > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > > @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> > > > > is needed.
> > > > > @end deftypefn
> > > > >
> > > > > +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> > > > > > +This hook should return an UNSPEC_VOLATILE rtx to mark a register as in use, to
> > > > > > +prevent deleting register setting instructions in the prologue and epilogue.
> > > > > +@end deftypefn
> > > > > +
> > > > > +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> > > > > +This hook should return an rtx to zero all vector registers at function
> > > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should
> > > > > > +be zeroed.  Return @code{NULL} if this is not possible.
> > > > > +@end deftypefn
> > > > > +
> > > > > @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> > > > > When optimization is disabled, this hook indicates whether or not
> > > > > arguments should be allocated to stack slots.  Normally, GCC allocates
> > > > > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > > > > index 3be984b..bee917a 100644
> > > > > > --- a/gcc/doc/tm.texi.in
> > > > > > +++ b/gcc/doc/tm.texi.in
> > > > > @@ -3430,6 +3430,10 @@ for a new target instead.
> > > > >
> > > > > @hook TARGET_FUNCTION_VALUE_REGNO_P
> > > > >
> > > > > +@hook TARGET_ZERO_CALL_USED_REGNO_P
> > > > > +
> > > > > +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > > +
> > > > > @defmac APPLY_RESULT_SIZE
> > > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > > @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
> > > > >
> > > > > @hook TARGET_GET_DRAP_RTX
> > > > >
> > > > > +@hook TARGET_PRO_EPILOGUE_USE
> > > > > +
> > > > > +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > > +
> > > > > @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> > > > >
> > > > > @hook TARGET_CONST_ANCHOR
> > > > > diff --git a/gcc/function.c b/gcc/function.c
> > > > > index 9eee9b5..9908530 100644
> > > > > --- a/gcc/function.c
> > > > > +++ b/gcc/function.c
> > > > > @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> > > > > #include "emit-rtl.h"
> > > > > #include "recog.h"
> > > > > #include "rtl-error.h"
> > > > > +#include "hard-reg-set.h"
> > > > > #include "alias.h"
> > > > > #include "fold-const.h"
> > > > > #include "stor-layout.h"
> > > > > @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
> > > > >  return seq;
> > > > > }
> > > > >
> > > > > +/* Check whether the hard register REGNO is live at the exit block
> > > > > + * of the current routine.  */
> > > > > +bool
> > > > > +is_live_reg_at_exit (unsigned int regno)
> > > > > +{
> > > > > +  edge e;
> > > > > +  edge_iterator ei;
> > > > > +
> > > > > +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> > > > > +    {
> > > > > +      bitmap live_out = df_get_live_out (e->src);
> > > > > +      if (REGNO_REG_SET_P (live_out, regno))
> > > > > +     return true;
> > > > > +    }
> > > > > +
> > > > > +  return false;
> > > > > +}
> > > > > +
> > > > > +/* Emit a sequence of insns to zero the call-used-registers for the current
> > > > > + * function.  */
>
> No '*' on the continuation line
>
> > > > > +
> > > > > +static void
> > > > > +gen_call_used_regs_seq (void)
> > > > > +{
> > > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > > +    return;
> > > > > +
> > > > > +  bool gpr_only = true;
> > > > > +  bool used_only = true;
> > > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > > +
> > > > > +  if (flag_zero_call_used_regs)
> > > > > +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> > > > > +     == zero_call_used_regs_unset)
> > > > > +      zero_call_used_regs_type = flag_zero_call_used_regs;
> > > > > +    else
> > > > > +      zero_call_used_regs_type
> > > > > +     = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > > +  else
> > > > > +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > > +
> > > > > +  /* No need to zero call-used-regs when no user request is present.  */
> > > > > +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> > > > > +    return;
> > > > > +
> > > > > +  /* No need to zero call-used-regs in main ().  */
> > > > > +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> > > > > +    return;
> > > > > +
> > > > > +  /* No need to zero call-used-regs if __builtin_eh_return is called
> > > > > +     since it isn't a normal function return.  */
> > > > > +  if (crtl->calls_eh_return)
> > > > > +    return;
> > > > > +
> > > > > +  /* If gpr_only is true, only zero call-used-registers that are
> > > > > +     general-purpose registers; if used_only is true, only zero
> > > > > +     call-used-registers that are used in the current function.  */
> > > > > +  switch (zero_call_used_regs_type)
> > > > > +    {
> > > > > +      case zero_call_used_regs_all_gpr:
> > > > > +     used_only = false;
> > > > > +     break;
> > > > > +      case zero_call_used_regs_used:
> > > > > +     gpr_only = false;
> > > > > +     break;
> > > > > +      case zero_call_used_regs_all:
> > > > > +     gpr_only = false;
> > > > > +     used_only = false;
> > > > > +     break;
> > > > > +      default:
> > > > > +     break;
> > > > > +    }
> > > > > +
> > > > > +  /* An optimization to use a single hard insn to zero all vector registers on
> > > > > +     the target that provides such insn.  */
> > > > > +  if (!gpr_only
> > > > > +      && targetm.calls.zero_all_vector_registers)
> > > > > +    {
> > > > > +      rtx zero_all_vec_insn
> > > > > +     = targetm.calls.zero_all_vector_registers (used_only);
> > > > > +      if (zero_all_vec_insn)
> > > > > +     {
> > > > > +       emit_insn (zero_all_vec_insn);
> > > > > +       gpr_only = true;
> > > > > +     }
> > > > > +    }
> > > > > +
> > > > > > +  /* For each hard register, zero it only if:
> > > > > > +     1. it is a call-used register;
> > > > > > +     and 2. it is not a fixed register;
> > > > > > +     and 3. it is not live at the end of the routine;
> > > > > > +     and 4. it is a general-purpose register if gpr_only is true;
> > > > > > +     and 5. it is used in the routine if used_only is true.
> > > > > > +   */
> > > > > +
> > > > > > +  /* This array holds the zero rtx with the corresponding machine mode.  */
> > > > > +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> > > > > +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> > > > > +    zero_rtx[i] = NULL_RTX;
> > > > > +
> > > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > > +    {
> > > > > +      if (!this_target_hard_regs->x_call_used_regs[regno])
>
> Use if (!call_used_regs[regno])
>
> > > > > +     continue;
> > > > > +      if (fixed_regs[regno])
> > > > > +     continue;
> > > > > +      if (is_live_reg_at_exit (regno))
> > > > > +     continue;
>
> How can a call-used reg be live at exit?
>
> > > > > +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > > > > +     continue;
>
> Why does the target need some extra say here?
>
> > > > > +      if (used_only && !df_regs_ever_live_p (regno))
>
> So I suppose this does not include uses by callees of this function?
>
> > > > > +     continue;
> > > > > +
> > > > > +      /* Now we can emit insn to zero this register.  */
> > > > > +      rtx reg, tmp;
> > > > > +
> > > > > +      machine_mode mode
> > > > > +     = targetm.calls.zero_call_used_regno_mode (regno,
> > > > > +                                                reg_raw_mode[regno]);
>
> In what case does the target ever need to adjust this (we're dealing
> with hard-regs only?)?
>
> > > > > +      if (mode == VOIDmode)
> > > > > +     continue;
> > > > > +      if (!have_regs_of_mode[mode])
> > > > > +     continue;
>
> When does this happen?
>
> > > > > +
> > > > > +      reg = gen_rtx_REG (mode, regno);
> > > > > +      if (zero_rtx[(int)mode] == NULL_RTX)
> > > > > +     {
> > > > > +       zero_rtx[(int)mode] = reg;
> > > > > +       tmp = gen_rtx_SET (reg, const0_rtx);
> > > > > +       emit_insn (tmp);
> > > > > +     }
> > > > > +      else
> > > > > +     emit_move_insn (reg, zero_rtx[(int)mode]);
>
> Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> but I may be wrong.  I'd rather have the target be able to specify
> some special instruction for zeroing here.  Some may have
> multi-reg set instructions for example.  That said, can't we
> defer the actual zeroing to the target in full and only compute
> a hard-reg-set of to-be-zeroed registers here and pass that
> to a target hook?
>
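To make the suggestion above concrete, here is a rough, self-contained sketch of that split in plain C. It is not GCC code: toy_reg_set, struct toy_reg_info, toy_compute_zero_set and toy_target_zero_regs are made-up names that stand in for HARD_REG_SET, the per-register tables (call_used_regs, fixed_regs, DF liveness), the middle-end loop and a possible target hook. The middle-end part only collects the set; how the registers are actually zeroed is left entirely to the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for GCC's HARD_REG_SET: one bit per hard register.  */
typedef uint32_t toy_reg_set;

/* Hypothetical per-register facts.  In GCC these would come from
   call_used_regs, fixed_regs, the DF live-out sets and
   df_regs_ever_live_p.  */
struct toy_reg_info
{
  bool call_used;
  bool fixed;
  bool live_at_exit;
  bool is_gpr;
  bool ever_live;
};

/* Collect the registers that pass the five checks from the patch,
   without emitting any instruction.  */
toy_reg_set
toy_compute_zero_set (const struct toy_reg_info *regs, unsigned int nregs,
                      bool gpr_only, bool used_only)
{
  assert (nregs <= 32);
  toy_reg_set set = 0;
  for (unsigned int regno = 0; regno < nregs; regno++)
    {
      const struct toy_reg_info *r = &regs[regno];
      if (!r->call_used || r->fixed || r->live_at_exit)
        continue;
      if (gpr_only && !r->is_gpr)
        continue;
      if (used_only && !r->ever_live)
        continue;
      set |= (toy_reg_set) 1 << regno;
    }
  return set;
}

/* Hypothetical target hook: it receives the whole set and decides how
   to zero it (single moves, a multi-register instruction, ...).  This
   toy version only counts the registers it was asked to zero.  */
unsigned int
toy_target_zero_regs (toy_reg_set set)
{
  unsigned int count = 0;
  for (; set != 0; set &= set - 1)
    count++;
  return count;
}
```

With this shape the generic code never needs to know about machine modes or special zeroing instructions; a target that has something like vzeroall can look at the set and pick it when it applies.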
> > > > > +
> > > > > +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> > > > > +    }
> > > > > +
> > > > > +  return;
> > > > > +}
> > > > > +
> > > > > +
> > > > > /* Return a sequence to be used as the epilogue for the current function,
> > > > >   or NULL.  */
> > > > >
> > > > > @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> > > > >
> > > > >  start_sequence ();
> > > > >  emit_note (NOTE_INSN_EPILOGUE_BEG);
> > > > > +
> > > > > +  gen_call_used_regs_seq ();
> > > > > +
>
> The caller eventually performs shrink-wrapping - are you sure that
> doesn't mess up things?
>
> > > > >  rtx_insn *seq = targetm.gen_epilogue ();
> > > > >  if (seq)
> > > > >    emit_jump_insn (seq);
> > > > > diff --git a/gcc/function.h b/gcc/function.h
> > > > > index d55cbdd..fc36c3e 100644
> > > > > --- a/gcc/function.h
> > > > > +++ b/gcc/function.h
> > > > > @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> > > > >
> > > > > extern void used_types_insert (tree);
> > > > >
> > > > > +extern bool is_live_reg_at_exit (unsigned int);
> > > > > +
> > > > > #endif  /* GCC_FUNCTION_H */
> > > > > diff --git a/gcc/target.def b/gcc/target.def
> > > > > index 07059a8..8aab63e 100644
> > > > > --- a/gcc/target.def
> > > > > +++ b/gcc/target.def
> > > > > @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
> > > > > default_function_value_regno_p)
> > > > >
> > > > > DEFHOOK
> > > > > +(zero_call_used_regno_p,
> > > > > + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> > > > > +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> > > > > +@var{regno} must be the number of a hard general register.\n\
> > > > > +\n\
> > > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> > > > > + bool, (const unsigned int regno, bool general_reg_only_p),
> > > > > + default_zero_call_used_regno_p)
> > > > > +
> > > > > +DEFHOOK
> > > > > +(zero_call_used_regno_mode,
> > > > > > + "A target hook that returns a mode suitable for zeroing the\n\
> > > > > > +call used register @var{regno} in @var{mode}.\n\
> > > > > +\n\
> > > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> > > > > +used.",
> > > > > + machine_mode, (const unsigned int regno, machine_mode mode),
> > > > > + default_zero_call_used_regno_mode)
> > > > > +
> > > > > +DEFHOOK
> > > > > (fntype_abi,
> > > > > "Return the ABI used by a function with type @var{type}; see the\n\
> > > > > definition of @code{predefined_function_abi} for details of the ABI\n\
> > > > > @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> > > > > is needed.",
> > > > > rtx, (void), NULL)
> > > > >
> > > > > +DEFHOOK
> > > > > +(pro_epilogue_use,
> > > > > > + "This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to\n\
> > > > > > +prevent deleting register setting instructions in prologue and epilogue.",
> > > > > + rtx, (rtx reg), NULL)
> > > > > +
> > > > > +DEFHOOK
> > > > > +(zero_all_vector_registers,
> > > > > + "This hook should return an rtx to zero all vector registers at function\n\
> > > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> > > > > > +be zeroed.  Return @code{NULL} if not possible.",
> > > > > + rtx, (bool used_only), NULL)
> > > > > +
> > > > > /* Return true if all function parameters should be spilled to the
> > > > >   stack.  */
> > > > > DEFHOOK
> > > > > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > > > > index 0113c7b..ed02173 100644
> > > > > --- a/gcc/targhooks.c
> > > > > +++ b/gcc/targhooks.c
> > > > > @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> > > > > #endif
> > > > > }
> > > > >
> > > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > > +
> > > > > +bool
> > > > > +default_zero_call_used_regno_p (const unsigned int,
> > > > > +                             bool)
> > > > > +{
> > > > > +  return false;
> > > > > +}
> > > > > +
> > > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > > +
> > > > > +machine_mode
> > > > > +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> > > > > +{
> > > > > +  return mode;
> > > > > +}
> > > > > +
> > > > > rtx
> > > > > default_internal_arg_pointer (void)
> > > > > {
> > > > > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > > > > index b572a36..370df19 100644
> > > > > --- a/gcc/targhooks.h
> > > > > +++ b/gcc/targhooks.h
> > > > > @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> > > > > extern rtx default_function_value (const_tree, const_tree, bool);
> > > > > extern rtx default_libcall_value (machine_mode, const_rtx);
> > > > > extern bool default_function_value_regno_p (const unsigned int);
> > > > > +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> > > > > +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> > > > > +                                                    machine_mode);
> > > > > extern rtx default_internal_arg_pointer (void);
> > > > > extern rtx default_static_chain (const_tree, bool);
> > > > > extern void default_trampoline_init (rtx, tree, rtx);
> > > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > new file mode 100644
> > > > > index 0000000..3c2ac72
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > @@ -0,0 +1,3 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > new file mode 100644
> > > > > index 0000000..acf48c4
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > @@ -0,0 +1,4 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2" } */
> > > > > +
> > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > new file mode 100644
> > > > > index 0000000..9f61dc4
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > @@ -0,0 +1,12 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > new file mode 100644
> > > > > index 0000000..09048e5
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > @@ -0,0 +1,21 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> > > > > +
> > > > > +int
> > > > > +foo (int x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > new file mode 100644
> > > > > index 0000000..4862688
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > @@ -0,0 +1,39 @@
> > > > > +/* { dg-do run { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > > +
> > > > > +struct S { int i; };
> > > > > +__attribute__((const, noinline, noclone))
> > > > > +struct S foo (int x)
> > > > > +{
> > > > > +  struct S s;
> > > > > +  s.i = x;
> > > > > +  return s;
> > > > > +}
> > > > > +
> > > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > > +struct S e[2048];
> > > > > +
> > > > > +__attribute__((noinline, noclone)) void
> > > > > +bar (void)
> > > > > +{
> > > > > +  int i;
> > > > > +  for (i = 0; i < 1024; i++)
> > > > > +    {
> > > > > +      e[i] = foo (i);
> > > > > +      a[i+2] = a[i] + a[i+1];
> > > > > +      b[10] = b[10] + i;
> > > > > +      c[i] = c[2047 - i];
> > > > > +      d[i] = d[i + 1];
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +main ()
> > > > > +{
> > > > > +  int i;
> > > > > +  bar ();
> > > > > +  for (i = 0; i < 1024; i++)
> > > > > +    if (e[i].i != i)
> > > > > +      __builtin_abort ();
> > > > > +  return 0;
> > > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > new file mode 100644
> > > > > index 0000000..500251b
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > @@ -0,0 +1,39 @@
> > > > > +/* { dg-do run { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > +
> > > > > +struct S { int i; };
> > > > > +__attribute__((const, noinline, noclone))
> > > > > +struct S foo (int x)
> > > > > +{
> > > > > +  struct S s;
> > > > > +  s.i = x;
> > > > > +  return s;
> > > > > +}
> > > > > +
> > > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > > +struct S e[2048];
> > > > > +
> > > > > +__attribute__((noinline, noclone)) void
> > > > > +bar (void)
> > > > > +{
> > > > > +  int i;
> > > > > +  for (i = 0; i < 1024; i++)
> > > > > +    {
> > > > > +      e[i] = foo (i);
> > > > > +      a[i+2] = a[i] + a[i+1];
> > > > > +      b[10] = b[10] + i;
> > > > > +      c[i] = c[2047 - i];
> > > > > +      d[i] = d[i + 1];
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +main ()
> > > > > +{
> > > > > +  int i;
> > > > > +  bar ();
> > > > > +  for (i = 0; i < 1024; i++)
> > > > > +    if (e[i].i != i)
> > > > > +      __builtin_abort ();
> > > > > +  return 0;
> > > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > new file mode 100644
> > > > > index 0000000..8b058e3
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > @@ -0,0 +1,21 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > new file mode 100644
> > > > > index 0000000..d4eaaf7
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > new file mode 100644
> > > > > index 0000000..dd3bb90
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > @@ -0,0 +1,14 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > new file mode 100644
> > > > > index 0000000..e2274f6
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > @@ -0,0 +1,14 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> > > > > +
> > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > new file mode 100644
> > > > > index 0000000..7f5d153
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > @@ -0,0 +1,13 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > +
> > > > > +int
> > > > > +foo (int x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > new file mode 100644
> > > > > index 0000000..fe13d2b
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > @@ -0,0 +1,13 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > > +
> > > > > +float
> > > > > +foo (float z, float y, float x)
> > > > > +{
> > > > > +  return x + y;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > new file mode 100644
> > > > > index 0000000..205a532
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > @@ -0,0 +1,12 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > > +
> > > > > +float
> > > > > +foo (float z, float y, float x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > new file mode 100644
> > > > > index 0000000..e046684
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > new file mode 100644
> > > > > index 0000000..4be8ff6
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > @@ -0,0 +1,23 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > > +
> > > > > +float
> > > > > +foo (float z, float y, float x)
> > > > > +{
> > > > > +  return x + y;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > new file mode 100644
> > > > > index 0000000..0eb34e0
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > @@ -0,0 +1,14 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> > > > > +
> > > > > +__attribute__ ((zero_call_used_regs("used")))
> > > > > +float
> > > > > +foo (float z, float y, float x)
> > > > > +{
> > > > > +  return x + y;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > new file mode 100644
> > > > > index 0000000..cbb63a4
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > new file mode 100644
> > > > > index 0000000..7573197
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > new file mode 100644
> > > > > index 0000000..de71223
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > @@ -0,0 +1,12 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > new file mode 100644
> > > > > index 0000000..ccfa441
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > @@ -0,0 +1,14 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > new file mode 100644
> > > > > index 0000000..6b46ca3
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > @@ -0,0 +1,20 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +__attribute__ ((zero_call_used_regs("all-gpr")))
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > new file mode 100644
> > > > > index 0000000..0680f38
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > @@ -0,0 +1,14 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > +
> > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > new file mode 100644
> > > > > index 0000000..534defa
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > @@ -0,0 +1,13 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > > +
> > > > > +int
> > > > > +foo (int x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > new file mode 100644
> > > > > index 0000000..477bb19
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > +
> > > > > +int
> > > > > +foo (int x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > > new file mode 100644
> > > > > index 0000000..a305a60
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > > @@ -0,0 +1,15 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > > +
> > > > > +int
> > > > > +foo (int x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/toplev.c b/gcc/toplev.c
> > > > > index 95eea63..01a1f24 100644
> > > > > --- a/gcc/toplev.c
> > > > > +++ b/gcc/toplev.c
> > > > > @@ -1464,6 +1464,15 @@ process_options (void)
> > > > >       }
> > > > >    }
> > > > >
> > > > > +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> > > > > +      && !targetm.calls.pro_epilogue_use)
> > > > > +    {
> > > > > +      error_at (UNKNOWN_LOCATION,
> > > > > +             "%<-fzero-call-used-regs=%> is not supported for this "
> > > > > +             "target");
> > > > > +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> > > > > +    }
> > > > > +
> > > > >  /* One region RA really helps to decrease the code size.  */
> > > > >  if (flag_ira_region == IRA_REGION_AUTODETECT)
> > > > >    flag_ira_region
> > > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > > > > index 8c5a2e3..71badbd 100644
> > > > > --- a/gcc/tree-core.h
> > > > > +++ b/gcc/tree-core.h
> > > > > @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> > > > > unsigned final : 1;
> > > > > /* Belong to FUNCTION_DECL exclusively.  */
> > > > > unsigned regdecl_flag : 1;
> > > > > - /* 14 unused bits. */
> > > > > +
> > > > > + /* How to clear call-used registers upon function return.  */
> > > > > + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> > > > > +
> > > > > + /* 11 unused bits.  */
>
> So instead of wasting "precious" bits please use lookup_attribute
> in the single place you query this value (which is once per function).
> There's no need to complicate matters by trying to maintain the above.
>
> > > > > };
> > > > >
> > > > > struct GTY(()) tree_var_decl {
> > > > > diff --git a/gcc/tree.h b/gcc/tree.h
> > > > > index cf546ed..d378a88 100644
> > > > > --- a/gcc/tree.h
> > > > > +++ b/gcc/tree.h
> > > > > @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> > > > > #define DECL_VISIBILITY(NODE) \
> > > > >  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> > > > >
> > > > > +/* Value of the function decl's type of zeroing the call used
> > > > > +   registers upon return from function.  */
> > > > > +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> > > > > +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> > > > > +
> > > > > /* Nonzero means that the decl (or an enclosing scope) had its
> > > > >   visibility specified rather than being inferred.  */
> > > > > #define DECL_VISIBILITY_SPECIFIED(NODE) \
> > > > > --
> > > > > 1.9.1
> > > >
> > >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Richard Biener Aug. 5, 2020, 7:06 a.m. UTC | #7
On Tue, 4 Aug 2020, H.J. Lu wrote:

> On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Mon, 3 Aug 2020, Qing Zhao wrote:
> >
> > > Hi, Uros,
> > >
> > > Thanks a lot for your review on X86 parts.
> > >
> > > Hi, Richard,
> > >
> > > > Could you please take a look at the middle-end part to see whether the
> > > > rewrite addresses your previous concerns?
> >
> > I have a few comments below - I'm not sure I'm qualified to fully
> > review the rest though.
> >
> > > Thanks a lot.
> > >
> > > Qing
> > >
> > >
> > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > >
> > > > > On Tue, 28 Jul 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
> > > > >
> > > > >
> > > > > Richard and Uros,
> > > > >
> > > > > > Could you please review the change that H.J. and I rewrote based on your comments in the previous round of discussion?
> > > > >
> > > > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.
> > > > >
> > > > > Thanks a lot for your time.
> > > >
> > > > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > > >
> > > > > That said, the x86 parts look OK.
> > > >
> > > >
> > >
> > > > Uros.
> > > > > Qing
> > > > >
> > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > Hi, Gcc team,
> > > > > >
> > > > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > > > >
> > > > > > From the previous round of discussion, the major issues raised were:
> > > > > >
> > > > > > A. should be rewritten by using regsets infrastructure.
> > > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > > >
> > > > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > > >
> > > > > > > 1. Change the names of the option and attribute from
> > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > > > > > > to:
> > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all")
> > > > > > > Add the new option and new attribute in general.
> > > > > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > > > > 3. Add 4 target-hooks;
> > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > 5. On a target that does not implement the target hook, issue an error for the new option and a warning for the new attribute.
> > > > > > >
> > > > > > > The patch is as follows:
> > > > > >
> > > > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > > command-line option and
> > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> > > > > >
> > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > >
> > > > > >  Don't zero call-used registers upon function return.
> >
> > Does a return via EH unwinding also constitute a function return?  I
> > think you may want to have a finally handler or support in the unwinder
> > for this?  Then there's abnormal return via longjmp & friends, I guess
> > there's nothing that can be done there besides patching glibc?
> 
> Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> patch. Only normal returns are covered.

What's the point then?  Also specifically thinking about spill slots.
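
To make the distinction between normal and abnormal returns concrete, here is a minimal sketch in plain C. It does not use the patch at all: the flag cleanup_ran is only a stand-in for "the epilogue sequence ran", and the names leave and cleanup_runs_on_exit are hypothetical. It shows that work placed on a function's normal return path is bypassed when the function is left via longjmp.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;
static int cleanup_ran;  /* stand-in for "the zeroing sequence ran" */

/* Leave either by a normal return or by longjmp, depending on ABNORMAL.  */
static int
leave (int abnormal)
{
  if (abnormal)
    longjmp (env, 1);    /* control never reaches the code below */
  cleanup_ran = 1;       /* stand-in for work done on the normal return path */
  return 0;
}

/* Return 1 if the stand-in cleanup ran, 0 if it was bypassed.  */
int
cleanup_runs_on_exit (int abnormal)
{
  assert (abnormal == 0 || abnormal == 1);
  cleanup_ran = 0;
  if (setjmp (env) == 0)
    leave (abnormal);
  return cleanup_ran;
}
```

Calling cleanup_runs_on_exit (0) exercises the normal return and cleanup_runs_on_exit (1) the longjmp path; only the former reaches the stand-in cleanup.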

Richard.

> > In general I am missing reasoning as to why one would use
> > -fzero-call-used-regs= in the documentation, that is, what is the threat
> > model and what are the guarantees?  Is there any point zeroing registers
> > when spill slots are left populated with stale register contents?  How do
> > I (and why would I want to?) ensure that there's no information leak from
> > the implementation of 'foo' to its callers?  Do I need to compile all
> > of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> > or is it enough to annotate API boundaries I want to protect with
> > zero_call_used_regs("...")?
> >
> > Again - what's the intended use (and how does it fulfil anything useful
> > for that case)?
> >
> > > > > >  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> > > > > >
> > > > > >  Zero used call-used general purpose registers upon function return.
> > > > > >
> > > > > >  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> > > > > >
> > > > > >  Zero all call-used general purpose registers upon function return.
> > > > > >
> > > > > >  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> > > > > >
> > > > > >  Zero used call-used registers upon function return.
> > > > > >
> > > > > >  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> > > > > >
> > > > > >  Zero all call-used registers upon function return.
> > > > > >
> > > > > > > The feature is implemented in the middle-end, but is currently only valid on x86.
> > > > > >
> > > > > > Tested on x86-64 and aarch64 with bootstrapping GCC trunk, making
> > > > > > -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr
> > > > > > -fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
> > > > > > by default on x86-64.
> > > > > >
> > > > > > > Please take a look and let me know if you have any more comments.
> > > > > >
> > > > > > thanks.
> > > > > >
> > > > > > Qing
> > > > > >
> > > > > >
> > > > > > ====================================
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > > >
> > > > > >       * common.opt: Add new option -fzero-call-used-regs.
> > > > > >       * config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> > > > > >       (ix86_zero_call_used_regno_mode): Likewise.
> > > > > >       (ix86_zero_all_vector_registers): Likewise.
> > > > > >       (ix86_expand_prologue): Replace gen_prologue_use with
> > > > > >       gen_pro_epilogue_use.
> > > > > >       (TARGET_ZERO_CALL_USED_REGNO_P): Define.
> > > > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> > > > > >       (TARGET_PRO_EPILOGUE_USE): Define.
> > > > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> > > > > >       * config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> > > > > >       with UNSPECV_PRO_EPILOGUE_USE.
> > > > > >       * coretypes.h (enum zero_call_used_regs): New type.
> > > > > >       * doc/extend.texi: Document the new zero_call_used_regs attribute.
> > > > > >       * doc/invoke.texi: Document the new -fzero-call-used-regs option.
> > > > > >       * doc/tm.texi: Regenerate.
> > > > > > >       * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> > > > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> > > > > >       (TARGET_PRO_EPILOGUE_USE): Likewise.
> > > > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> > > > > >       * function.c (is_live_reg_at_exit): New function.
> > > > > >       (gen_call_used_regs_seq): Likewise.
> > > > > >       (make_epilogue_seq): Call gen_call_used_regs_seq.
> > > > > >       * function.h (is_live_reg_at_exit): Declare.
> > > > > >       * target.def (zero_call_used_regno_p): New hook.
> > > > > >       (zero_call_used_regno_mode): Likewise.
> > > > > >       (pro_epilogue_use): Likewise.
> > > > > >       (zero_all_vector_registers): Likewise.
> > > > > >       * targhooks.c (default_zero_call_used_regno_p): New function.
> > > > > >       (default_zero_call_used_regno_mode): Likewise.
> > > > > >       * targhooks.h (default_zero_call_used_regno_p): Declare.
> > > > > >       (default_zero_call_used_regno_mode): Declare.
> > > > > >       * toplev.c (process_options): Issue errors when -fzero-call-used-regs
> > > > > >       is used on targets that do not support it.
> > > > > >       * tree-core.h (struct tree_decl_with_vis): New field
> > > > > >       zero_call_used_regs_type.
> > > > > >       * tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> > > > > >
> > > > > > gcc/c-family/ChangeLog:
> > > > > >
> > > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > > >
> > > > > >       * c-attribs.c (c_common_attribute_table): Add new attribute
> > > > > >       zero_call_used_regs.
> > > > > >       (handle_zero_call_used_regs_attribute): New function.
> > > > > >
> > > > > > gcc/c/ChangeLog:
> > > > > >
> > > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > > >
> > > > > >       * c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > > >
> > > > > >       * c-c++-common/zero-scratch-regs-1.c: New test.
> > > > > >       * c-c++-common/zero-scratch-regs-2.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> > > > > >
> > > > > > ---
> > > > > > gcc/c-family/c-attribs.c                           |  68 ++++++++++
> > > > > > gcc/c/c-decl.c                                     |   4 +
> > > > > > gcc/common.opt                                     |  23 ++++
> > > > > > gcc/config/i386/i386.c                             |  58 ++++++++-
> > > > > > gcc/config/i386/i386.md                            |   6 +-
> > > > > > gcc/coretypes.h                                    |  10 ++
> > > > > > gcc/doc/extend.texi                                |  11 ++
> > > > > > gcc/doc/invoke.texi                                |  13 +-
> > > > > > gcc/doc/tm.texi                                    |  27 ++++
> > > > > > > gcc/doc/tm.texi.in                                 |   8 ++
> > > > > > gcc/function.c                                     | 145 +++++++++++++++++++++
> > > > > > gcc/function.h                                     |   2 +
> > > > > > gcc/target.def                                     |  33 +++++
> > > > > > gcc/targhooks.c                                    |  17 +++
> > > > > > gcc/targhooks.h                                    |   3 +
> > > > > > gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> > > > > > gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> > > > > > .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> > > > > > .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> > > > > > .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> > > > > > .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> > > > > > gcc/toplev.c                                       |   9 ++
> > > > > > gcc/tree-core.h                                    |   6 +-
> > > > > > gcc/tree.h                                         |   5 +
> > > > > > 43 files changed, 866 insertions(+), 7 deletions(-)
> > > > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > > >
> > > > > > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > > > > > index 3721483..cc93d6f 100644
> > > > > > --- a/gcc/c-family/c-attribs.c
> > > > > > +++ b/gcc/c-family/c-attribs.c
> > > > > > @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> > > > > > static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> > > > > > static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > > > > > static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> > > > > > +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> > > > > > +                                              bool *);
> > > > > > static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> > > > > > static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> > > > > > static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> > > > > > @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
> > > > > >                             ignore_attribute, NULL },
> > > > > >  { "no_split_stack",        0, 0, true,  false, false, false,
> > > > > >                             handle_no_split_stack_attribute, NULL },
> > > > > > +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> > > > > > +                           handle_zero_call_used_regs_attribute, NULL },
> > > > > > +
> > > > > >  /* For internal use (marking of builtins and runtime functions) only.
> > > > > >     The name contains space to prevent its usage in source code.  */
> > > > > >  { "fn spec",               1, 1, false, true, true, false,
> > > > > > @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
> > > > > >  return NULL_TREE;
> > > > > > }
> > > > > >
> > > > > > +/* Handle a "zero_call_used_regs" attribute; arguments as in
> > > > > > +   struct attribute_spec.handler.  */
> > > > > > +
> > > > > > +static tree
> > > > > > +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> > > > > > +                                   int ARG_UNUSED (flags),
> > > > > > +                                   bool *no_add_attris)
> > > > > > +{
> > > > > > +  tree decl = *node;
> > > > > > +  tree id = TREE_VALUE (args);
> > > > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > > > +
> > > > > > +  if (TREE_CODE (decl) != FUNCTION_DECL)
> > > > > > +    {
> > > > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > > > +             "%qE attribute applies only to functions", name);
> > > > > > +      *no_add_attris = true;
> > > > > > +      return NULL_TREE;
> > > > > > +    }
> > > > > > +  else if (DECL_INITIAL (decl))
> > > > > > +    {
> > > > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > > > +             "cannot set %qE attribute after definition", name);
> >
> > Why's that?
> >
> > > > > > +      *no_add_attris = true;
> > > > > > +      return NULL_TREE;
> > > > > > +    }
> > > > > > +
> > > > > > +  if (TREE_CODE (id) != STRING_CST)
> > > > > > +    {
> > > > > > +      error ("attribute %qE arguments not a string", name);
> > > > > > +      *no_add_attris = true;
> > > > > > +      return NULL_TREE;
> > > > > > +    }
> > > > > > +
> > > > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > > > +    {
> > > > > > +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> > > > > > +      return NULL_TREE;
> > > > > > +    }
> > > > > > +
> > > > > > +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> > > > > > +    zero_call_used_regs_type = zero_call_used_regs_skip;
> > > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> > > > > > +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> > > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> > > > > > +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> > > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> > > > > > +    zero_call_used_regs_type = zero_call_used_regs_used;
> > > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> > > > > > +    zero_call_used_regs_type = zero_call_used_regs_all;
> > > > > > +  else
> > > > > > +    {
> > > > > > +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> > > > > > +          name, "skip", "used-gpr", "all-gpr", "used", "all");
> > > > > > +      *no_add_attris = true;
> > > > > > +      return NULL_TREE;
> > > > > > +    }
> > > > > > +
> > > > > > +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> > > > > > +
> > > > > > +  return NULL_TREE;
> > > > > > +}
> > > > > > +
> > > > > > /* Handle a "returns_nonnull" attribute; arguments as in
> > > > > >   struct attribute_spec.handler.  */
> > > > > >
> > > > > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > > > > > index 81bd2ee..ded1880 100644
> > > > > > --- a/gcc/c/c-decl.c
> > > > > > +++ b/gcc/c/c-decl.c
> > > > > > @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> > > > > >         DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> > > > > >       }
> > > > > >
> > > > > > +      /* Merge the zero_call_used_regs_type information.  */
> > > > > > +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> > > > > > +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> > > > > > +
> >
> > If you need this (see below) then likely cp/* needs similar adjustment
> > so do other places in the middle-end (function cloning, etc)
> >
> > > > > >      /* Merge the storage class information.  */
> > > > > >      merge_weak (newdecl, olddecl);
> > > > > >
> > > > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > > > index df8af36..19900f9 100644
> > > > > > --- a/gcc/common.opt
> > > > > > +++ b/gcc/common.opt
> > > > > > @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> > > > > > Common Report Var(flag_zero_initialized_in_bss) Init(1)
> > > > > > Put zero initialized data in the bss section.
> > > > > >
> > > > > > +fzero-call-used-regs=
> > > > > > +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> > > > > > +Clear call-used registers upon function return.
> > > > > > +
> > > > > > +Enum
> > > > > > +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> > > > > > +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> > > > > > +
> > > > > > +EnumValue
> > > > > > +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> > > > > > +
> > > > > > +EnumValue
> > > > > > +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> > > > > > +
> > > > > > +EnumValue
> > > > > > +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> > > > > > +
> > > > > > +EnumValue
> > > > > > +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> > > > > > +
> > > > > > +EnumValue
> > > > > > +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> > > > > > +
> > > > > > g
> > > > > > Common Driver RejectNegative JoinedOrMissing
> > > > > > Generate debug information in default format.
> > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > > index 5c373c0..fd1aa9c 100644
> > > > > > --- a/gcc/config/i386/i386.c
> > > > > > +++ b/gcc/config/i386/i386.c
> > > > > > @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
> > > > > >  return false;
> > > > > > }
> > > > > >
> > > > > > +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > > > +
> > > > > > +static bool
> > > > > > +ix86_zero_call_used_regno_p (const unsigned int regno,
> > > > > > +                          bool gpr_only)
> > > > > > +{
> > > > > > +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> > > > > > +}
> > > > > > +
> > > > > > +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > > > +
> > > > > > +static machine_mode
> > > > > > +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> > > > > > +{
> > > > > > > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > > > > > > +     and the lower 128 bits for vector registers since the destination
> > > > > > > +     is zero-extended to the full register width.  */
> > > > > > +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> > > > > > +}
> > > > > > +
> > > > > > +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> > > > > > +
> > > > > > +static rtx
> > > > > > +ix86_zero_all_vector_registers (bool used_only)
> > > > > > +{
> > > > > > +  if (!TARGET_AVX)
> > > > > > +    return NULL;
> > > > > > +
> > > > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > > > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > > > > > +      || (TARGET_64BIT
> > > > > > +          && (REX_SSE_REGNO_P (regno)
> > > > > > +              || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > > > > > +     && (!this_target_hard_regs->x_call_used_regs[regno]
> > > > > > +         || fixed_regs[regno]
> > > > > > +         || is_live_reg_at_exit (regno)
> > > > > > +         || (used_only && !df_regs_ever_live_p (regno))))
> > > > > > +      return NULL;
> > > > > > +
> > > > > > +  return gen_avx_vzeroall ();
> > > > > > +}
> > > > > > +
> > > > > > /* Define how to find the value returned by a function.
> > > > > >   VALTYPE is the data type of the value (as a tree).
> > > > > >   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> > > > > > @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
> > > > > >      insn = emit_insn (gen_set_got (pic));
> > > > > >      RTX_FRAME_RELATED_P (insn) = 1;
> > > > > >      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> > > > > > -      emit_insn (gen_prologue_use (pic));
> > > > > > +      emit_insn (gen_pro_epilogue_use (pic));
> > > > > >      /* Deleting already emmitted SET_GOT if exist and allocated to
> > > > > >        REAL_PIC_OFFSET_TABLE_REGNUM.  */
> > > > > >      ix86_elim_entry_set_got (pic);
> > > > > > @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
> > > > > >     Further, prevent alloca modifications to the stack pointer from being
> > > > > >     combined with prologue modifications.  */
> > > > > >  if (TARGET_SEH)
> > > > > > -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> > > > > > +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> > > > > > }
> > > > > >
> > > > > > /* Emit code to restore REG using a POP insn.  */
> > > > > > @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> > > > > > #undef TARGET_FUNCTION_VALUE_REGNO_P
> > > > > > #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> > > > > >
> > > > > > +#undef TARGET_ZERO_CALL_USED_REGNO_P
> > > > > > +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> > > > > > +
> > > > > > +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > > > +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> > > > > > +
> > > > > > +#undef TARGET_PRO_EPILOGUE_USE
> > > > > > +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> > > > > > +
> > > > > > +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > > > +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> > > > > > +
> > > > > > #undef TARGET_PROMOTE_FUNCTION_MODE
> > > > > > #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> > > > > >
> > > > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > > > > index d0ecd9e..e7df59f 100644
> > > > > > --- a/gcc/config/i386/i386.md
> > > > > > +++ b/gcc/config/i386/i386.md
> > > > > > @@ -194,7 +194,7 @@
> > > > > >  UNSPECV_STACK_PROBE
> > > > > >  UNSPECV_PROBE_STACK_RANGE
> > > > > >  UNSPECV_ALIGN
> > > > > > -  UNSPECV_PROLOGUE_USE
> > > > > > +  UNSPECV_PRO_EPILOGUE_USE
> > > > > >  UNSPECV_SPLIT_STACK_RETURN
> > > > > >  UNSPECV_CLD
> > > > > >  UNSPECV_NOPS
> > > > > > @@ -13525,8 +13525,8 @@
> > > > > >
> > > > > > ;; As USE insns aren't meaningful after reload, this is used instead
> > > > > > ;; to prevent deleting instructions setting registers for PIC code
> > > > > > -(define_insn "prologue_use"
> > > > > > -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> > > > > > +(define_insn "pro_epilogue_use"
> > > > > > +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
> > > > > >  ""
> > > > > >  ""
> > > > > >  [(set_attr "length" "0")])
> > > > > > diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> > > > > > index 6b6cfcd..e56d6ec 100644
> > > > > > --- a/gcc/coretypes.h
> > > > > > +++ b/gcc/coretypes.h
> > > > > > @@ -418,6 +418,16 @@ enum symbol_visibility
> > > > > >  VISIBILITY_INTERNAL
> > > > > > };
> > > > > >
> > > > > > +/* Zero call-used registers type.  */
> > > > > > +enum zero_call_used_regs {
> > > > > > +  zero_call_used_regs_unset = 0,
> > > > > > +  zero_call_used_regs_skip,
> > > > > > +  zero_call_used_regs_used_gpr,
> > > > > > +  zero_call_used_regs_all_gpr,
> > > > > > +  zero_call_used_regs_used,
> > > > > > +  zero_call_used_regs_all
> > > > > > +};
> > > > > > +
> > > > > > /* enums used by the targetm.excess_precision hook.  */
> > > > > >
> > > > > > enum flt_eval_method
> > > > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > > > > > index c800b74..b32c55f 100644
> > > > > > --- a/gcc/doc/extend.texi
> > > > > > +++ b/gcc/doc/extend.texi
> > > > > > @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> > > > > > A declaration to which @code{weakref} is attached and that is associated
> > > > > > with a named @code{target} must be @code{static}.
> > > > > >
> > > > > > +@item zero_call_used_regs ("@var{choice}")
> > > > > > +@cindex @code{zero_call_used_regs} function attribute
> > > > > > +The @code{zero_call_used_regs} attribute causes the compiler to zero
> > > > > > +call-used registers at function return according to @var{choice}.
> > > > > > > +@samp{skip} doesn't zero call-used registers.  @samp{used-gpr} zeros
> > > > > > > +call-used general purpose registers that are used in the function.
> > > > > > > +@samp{all-gpr} zeros all call-used general purpose registers.
> > > > > > > +@samp{used} zeros call-used registers that are used in the function.
> > > > > > +@samp{all} zeros all call-used registers.  The default for the
> > > > > > +attribute is controlled by @option{-fzero-call-used-regs}.
> > > > > > +
> > > > > > @end table
> > > > > >
> > > > > > @c This is the end of the target-independent attribute table
> > > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > > > index 09bcc5b..da02686 100644
> > > > > > --- a/gcc/doc/invoke.texi
> > > > > > +++ b/gcc/doc/invoke.texi
> > > > > > @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> > > > > > -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> > > > > > -funsafe-math-optimizations  -funswitch-loops @gol
> > > > > > -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> > > > > > --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> > > > > > +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> > > > > > --param @var{name}=@var{value}
> > > > > > -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> > > > > >
> > > > > > @@ -12273,6 +12273,17 @@ int foo (void)
> > > > > >
> > > > > > Not all targets support this option.
> > > > > >
> > > > > > +@item -fzero-call-used-regs=@var{choice}
> > > > > > +@opindex fzero-call-used-regs
> > > > > > +Zero call-used registers at function return according to
> > > > > > +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> > > > > > +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> > > > > > > +registers that are used in the function.  @samp{all-gpr} zeros all
> > > > > > > +call-used general purpose registers.  @samp{used} zeros call-used
> > > > > > > +registers that are used in the function.  @samp{all} zeros all call-used
> > > > > > > +registers.  You can control this behavior for a specific function by using
> > > > > > > +the function attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> > > > > > +
> > > > > > @item --param @var{name}=@var{value}
> > > > > > @opindex param
> > > > > > In some places, GCC uses various constants to control the amount of
> > > > > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > > > > index 6e7d9dc..43dddd3 100644
> > > > > > --- a/gcc/doc/tm.texi
> > > > > > +++ b/gcc/doc/tm.texi
> > > > > > @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> > > > > > If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> > > > > > @end deftypefn
> > > > > >
> > > > > > +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> > > > > > +A target hook that returns @code{true} if @var{regno} is the number of a
> > > > > > +call used register.  If @var{general_reg_only_p} is @code{true},
> > > > > > +@var{regno} must be the number of a hard general register.
> > > > > > +
> > > > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> > > > > > +@end deftypefn
> > > > > > +
> > > > > > +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> > > > > > > +A target hook that returns a mode suitable for zeroing the call used
> > > > > > > +register @var{regno} in @var{mode}.
> > > > > > +
> > > > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be
> > > > > > +used.
> > > > > > +@end deftypefn
> > > > > > +
> > > > > > @defmac APPLY_RESULT_SIZE
> > > > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > > > @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> > > > > > is needed.
> > > > > > @end deftypefn
> > > > > >
> > > > > > +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> > > > > > > +This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to
> > > > > > > +prevent deleting register setting instructions in prologue and epilogue.
> > > > > > +@end deftypefn
> > > > > > +
> > > > > > +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> > > > > > +This hook should return an rtx to zero all vector registers at function
> > > > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should
> > > > > > > +be zeroed.  Return @code{NULL} if this is not possible.
> > > > > > +@end deftypefn
> > > > > > +
> > > > > > @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> > > > > > When optimization is disabled, this hook indicates whether or not
> > > > > > arguments should be allocated to stack slots.  Normally, GCC allocates
> > > > > > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > > > > > index 3be984b..bee917a 100644
> > > > > > > --- a/gcc/doc/tm.texi.in
> > > > > > > +++ b/gcc/doc/tm.texi.in
> > > > > > @@ -3430,6 +3430,10 @@ for a new target instead.
> > > > > >
> > > > > > @hook TARGET_FUNCTION_VALUE_REGNO_P
> > > > > >
> > > > > > +@hook TARGET_ZERO_CALL_USED_REGNO_P
> > > > > > +
> > > > > > +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > > > +
> > > > > > @defmac APPLY_RESULT_SIZE
> > > > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > > > @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
> > > > > >
> > > > > > @hook TARGET_GET_DRAP_RTX
> > > > > >
> > > > > > +@hook TARGET_PRO_EPILOGUE_USE
> > > > > > +
> > > > > > +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > > > +
> > > > > > @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> > > > > >
> > > > > > @hook TARGET_CONST_ANCHOR
> > > > > > diff --git a/gcc/function.c b/gcc/function.c
> > > > > > index 9eee9b5..9908530 100644
> > > > > > --- a/gcc/function.c
> > > > > > +++ b/gcc/function.c
> > > > > > @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> > > > > > #include "emit-rtl.h"
> > > > > > #include "recog.h"
> > > > > > #include "rtl-error.h"
> > > > > > +#include "hard-reg-set.h"
> > > > > > #include "alias.h"
> > > > > > #include "fold-const.h"
> > > > > > #include "stor-layout.h"
> > > > > > @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
> > > > > >  return seq;
> > > > > > }
> > > > > >
> > > > > > +/* Check whether the hard register REGNO is live at the exit block
> > > > > > + * of the current routine.  */
> > > > > > +bool
> > > > > > +is_live_reg_at_exit (unsigned int regno)
> > > > > > +{
> > > > > > +  edge e;
> > > > > > +  edge_iterator ei;
> > > > > > +
> > > > > > +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> > > > > > +    {
> > > > > > +      bitmap live_out = df_get_live_out (e->src);
> > > > > > +      if (REGNO_REG_SET_P (live_out, regno))
> > > > > > +     return true;
> > > > > > +    }
> > > > > > +
> > > > > > +  return false;
> > > > > > +}
> > > > > > +
> > > > > > +/* Emit a sequence of insns to zero the call-used-registers for the current
> > > > > > + * function.  */
> >
> > No '*' on the continuation line
> >
> > > > > > +
> > > > > > +static void
> > > > > > +gen_call_used_regs_seq (void)
> > > > > > +{
> > > > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > > > +    return;
> > > > > > +
> > > > > > +  bool gpr_only = true;
> > > > > > +  bool used_only = true;
> > > > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > > > +
> > > > > > +  if (flag_zero_call_used_regs)
> > > > > > +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> > > > > > +     == zero_call_used_regs_unset)
> > > > > > +      zero_call_used_regs_type = flag_zero_call_used_regs;
> > > > > > +    else
> > > > > > +      zero_call_used_regs_type
> > > > > > +     = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > > > +  else
> > > > > > +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > > > +
> > > > > > +  /* No need to zero call-used-regs when no user request is present.  */
> > > > > > +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> > > > > > +    return;
> > > > > > +
> > > > > > +  /* No need to zero call-used-regs in main ().  */
> > > > > > +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> > > > > > +    return;
> > > > > > +
> > > > > > +  /* No need to zero call-used-regs if __builtin_eh_return is called
> > > > > > +     since it isn't a normal function return.  */
> > > > > > +  if (crtl->calls_eh_return)
> > > > > > +    return;
> > > > > > +
> > > > > > +  /* If gpr_only is true, only zero call-used-registers that are
> > > > > > +     general-purpose registers; if used_only is true, only zero
> > > > > > +     call-used-registers that are used in the current function.  */
> > > > > > +  switch (zero_call_used_regs_type)
> > > > > > +    {
> > > > > > +      case zero_call_used_regs_all_gpr:
> > > > > > +     used_only = false;
> > > > > > +     break;
> > > > > > +      case zero_call_used_regs_used:
> > > > > > +     gpr_only = false;
> > > > > > +     break;
> > > > > > +      case zero_call_used_regs_all:
> > > > > > +     gpr_only = false;
> > > > > > +     used_only = false;
> > > > > > +     break;
> > > > > > +      default:
> > > > > > +     break;
> > > > > > +    }
> > > > > > +
> > > > > > +  /* An optimization to use a single hard insn to zero all vector registers on
> > > > > > +     the target that provides such insn.  */
> > > > > > +  if (!gpr_only
> > > > > > +      && targetm.calls.zero_all_vector_registers)
> > > > > > +    {
> > > > > > +      rtx zero_all_vec_insn
> > > > > > +     = targetm.calls.zero_all_vector_registers (used_only);
> > > > > > +      if (zero_all_vec_insn)
> > > > > > +     {
> > > > > > +       emit_insn (zero_all_vec_insn);
> > > > > > +       gpr_only = true;
> > > > > > +     }
> > > > > > +    }
> > > > > > +
> > > > > > > +  /* For each of the hard registers, zero it if:
> > > > > > > +     1. it is a call-used register;
> > > > > > > + and 2. it is not a fixed register;
> > > > > > > + and 3. it is not live at the end of the routine;
> > > > > > > + and 4. it is a general purpose register if gpr_only is true;
> > > > > > > + and 5. it is used in the routine if used_only is true.
> > > > > > > +   */
> > > > > > +
> > > > > > > +  /* This array holds the zero rtx with the corresponding machine mode.  */
> > > > > > +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> > > > > > +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> > > > > > +    zero_rtx[i] = NULL_RTX;
> > > > > > +
> > > > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > > > +    {
> > > > > > +      if (!this_target_hard_regs->x_call_used_regs[regno])
> >
> > Use if (!call_used_regs[regno])
> >
> > > > > > +     continue;
> > > > > > +      if (fixed_regs[regno])
> > > > > > +     continue;
> > > > > > +      if (is_live_reg_at_exit (regno))
> > > > > > +     continue;
> >
> > How can a call-used reg be live at exit?
> >
> > > > > > +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > > > > > +     continue;
> >
> > Why does the target need some extra say here?
> >
> > > > > > +      if (used_only && !df_regs_ever_live_p (regno))
> >
> > So I suppose this does not include uses by callees of this function?
> >
> > > > > > +     continue;
> > > > > > +
> > > > > > +      /* Now we can emit insn to zero this register.  */
> > > > > > +      rtx reg, tmp;
> > > > > > +
> > > > > > +      machine_mode mode
> > > > > > +     = targetm.calls.zero_call_used_regno_mode (regno,
> > > > > > +                                                reg_raw_mode[regno]);
> >
> > In what case does the target ever need to adjust this (we're dealing
> > with hard-regs only?)?
> >
> > > > > > +      if (mode == VOIDmode)
> > > > > > +     continue;
> > > > > > +      if (!have_regs_of_mode[mode])
> > > > > > +     continue;
> >
> > When does this happen?
> >
> > > > > > +
> > > > > > +      reg = gen_rtx_REG (mode, regno);
> > > > > > +      if (zero_rtx[(int)mode] == NULL_RTX)
> > > > > > +     {
> > > > > > +       zero_rtx[(int)mode] = reg;
> > > > > > +       tmp = gen_rtx_SET (reg, const0_rtx);
> > > > > > +       emit_insn (tmp);
> > > > > > +     }
> > > > > > +      else
> > > > > > +     emit_move_insn (reg, zero_rtx[(int)mode]);
> >
> > Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> > but I may be wrong.  I'd rather have the target be able to specify
> > some special instruction for zeroing here.  Some may have
> > multi-reg set instructions for example.  That said, can't we
> > defer the actual zeroing to the target in full and only compute
> > a hard-reg-set of to-be-zeroed registers here and pass that
> > to a target hook?
> >
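The suggestion above (compute the set of registers to zero in the
middle-end, then hand the whole set to the target) could be sketched
roughly as follows.  This is a standalone toy model in plain C, not GCC
code: reg_set, struct fn_info, compute_regs_to_zero and
toy_target_zero_regs are hypothetical names, a uint32_t bitmask stands in
for HARD_REG_SET, and the register numbering is made up.  It also assumes
that the return value register is the case the "live at exit" check is
meant to catch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy register file; names and numbering are illustrative only.  */
enum { REG_AX, REG_DX, REG_CX, REG_BX, REG_SP, REG_XMM0, REG_XMM1, NUM_REGS };

typedef uint32_t reg_set;                 /* one bit per hard register */
#define REG_BIT(r) ((reg_set) 1u << (r))

/* Per-function facts the middle-end would already know (hypothetical).  */
struct fn_info
{
  reg_set call_used;     /* clobbered across calls per the ABI */
  reg_set fixed;         /* never allocated, e.g. the stack pointer */
  reg_set live_at_exit;  /* e.g. the register holding the return value */
  reg_set ever_live;     /* referenced somewhere in the function body */
  reg_set general;       /* general purpose registers */
};

/* Middle-end side: apply the five conditions from the patch as set
   operations and return the registers that need zeroing.  */
static reg_set
compute_regs_to_zero (const struct fn_info *fn, bool gpr_only, bool used_only)
{
  reg_set set = fn->call_used & ~fn->fixed & ~fn->live_at_exit;
  if (gpr_only)
    set &= fn->general;
  if (used_only)
    set &= fn->ever_live;
  return set;
}

/* Target side: a hypothetical hook that receives the whole set, zeroes
   the registers however it likes (one insn per register, a multi-register
   insn, ...) and returns the set it actually zeroed.  Here REGS simply
   models the register contents.  */
static reg_set
toy_target_zero_regs (reg_set need_zeroed, uint64_t regs[NUM_REGS])
{
  reg_set zeroed = 0;
  for (int r = 0; r < NUM_REGS; r++)
    if (need_zeroed & REG_BIT (r))
      {
        regs[r] = 0;
        zeroed |= REG_BIT (r);
      }
  return zeroed;
}
```

With this split the generic code only decides which registers qualify,
and the choice of zeroing instructions stays entirely with the target.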
> > > > > > +
> > > > > > +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> > > > > > +    }
> > > > > > +
> > > > > > +  return;
> > > > > > +}
> > > > > > +
> > > > > > +
> > > > > > /* Return a sequence to be used as the epilogue for the current function,
> > > > > >   or NULL.  */
> > > > > >
> > > > > > @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> > > > > >
> > > > > >  start_sequence ();
> > > > > >  emit_note (NOTE_INSN_EPILOGUE_BEG);
> > > > > > +
> > > > > > +  gen_call_used_regs_seq ();
> > > > > > +
> >
> > The caller eventually performs shrink-wrapping - are you sure that
> > doesn't mess up things?
> >
> > > > > >  rtx_insn *seq = targetm.gen_epilogue ();
> > > > > >  if (seq)
> > > > > >    emit_jump_insn (seq);
> > > > > > diff --git a/gcc/function.h b/gcc/function.h
> > > > > > index d55cbdd..fc36c3e 100644
> > > > > > --- a/gcc/function.h
> > > > > > +++ b/gcc/function.h
> > > > > > @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> > > > > >
> > > > > > extern void used_types_insert (tree);
> > > > > >
> > > > > > +extern bool is_live_reg_at_exit (unsigned int);
> > > > > > +
> > > > > > #endif  /* GCC_FUNCTION_H */
> > > > > > diff --git a/gcc/target.def b/gcc/target.def
> > > > > > index 07059a8..8aab63e 100644
> > > > > > --- a/gcc/target.def
> > > > > > +++ b/gcc/target.def
> > > > > > @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
> > > > > > default_function_value_regno_p)
> > > > > >
> > > > > > DEFHOOK
> > > > > > +(zero_call_used_regno_p,
> > > > > > + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> > > > > > +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> > > > > > +@var{regno} must be the number of a hard general register.\n\
> > > > > > +\n\
> > > > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> > > > > > + bool, (const unsigned int regno, bool general_reg_only_p),
> > > > > > + default_zero_call_used_regno_p)
> > > > > > +
> > > > > > +DEFHOOK
> > > > > > +(zero_call_used_regno_mode,
> > > > > > > + "A target hook that returns a mode suitable for zeroing the call used\n\
> > > > > > > +register @var{regno} in @var{mode}.\n\
> > > > > > +\n\
> > > > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> > > > > > +used.",
> > > > > > + machine_mode, (const unsigned int regno, machine_mode mode),
> > > > > > + default_zero_call_used_regno_mode)
> > > > > > +
> > > > > > +DEFHOOK
> > > > > > (fntype_abi,
> > > > > > "Return the ABI used by a function with type @var{type}; see the\n\
> > > > > > definition of @code{predefined_function_abi} for details of the ABI\n\
> > > > > > @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> > > > > > is needed.",
> > > > > > rtx, (void), NULL)
> > > > > >
> > > > > > +DEFHOOK
> > > > > > +(pro_epilogue_use,
> > > > > > > + "This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to\n\
> > > > > > > +prevent deleting register setting instructions in prologue and epilogue.",
> > > > > > + rtx, (rtx reg), NULL)
> > > > > > +
> > > > > > +DEFHOOK
> > > > > > +(zero_all_vector_registers,
> > > > > > + "This hook should return an rtx to zero all vector registers at function\n\
> > > > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> > > > > > > +be zeroed.  Return @code{NULL} if this is not possible.",
> > > > > > + rtx, (bool used_only), NULL)
> > > > > > +
> > > > > > /* Return true if all function parameters should be spilled to the
> > > > > >   stack.  */
> > > > > > DEFHOOK
> > > > > > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > > > > > index 0113c7b..ed02173 100644
> > > > > > --- a/gcc/targhooks.c
> > > > > > +++ b/gcc/targhooks.c
> > > > > > @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> > > > > > #endif
> > > > > > }
> > > > > >
> > > > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > > > +
> > > > > > +bool
> > > > > > +default_zero_call_used_regno_p (const unsigned int,
> > > > > > +                             bool)
> > > > > > +{
> > > > > > +  return false;
> > > > > > +}
> > > > > > +
> > > > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > > > +
> > > > > > +machine_mode
> > > > > > +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> > > > > > +{
> > > > > > +  return mode;
> > > > > > +}
> > > > > > +
> > > > > > rtx
> > > > > > default_internal_arg_pointer (void)
> > > > > > {
> > > > > > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > > > > > index b572a36..370df19 100644
> > > > > > --- a/gcc/targhooks.h
> > > > > > +++ b/gcc/targhooks.h
> > > > > > @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> > > > > > extern rtx default_function_value (const_tree, const_tree, bool);
> > > > > > extern rtx default_libcall_value (machine_mode, const_rtx);
> > > > > > extern bool default_function_value_regno_p (const unsigned int);
> > > > > > +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> > > > > > +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> > > > > > +                                                    machine_mode);
> > > > > > extern rtx default_internal_arg_pointer (void);
> > > > > > extern rtx default_static_chain (const_tree, bool);
> > > > > > extern void default_trampoline_init (rtx, tree, rtx);
> > > > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > > new file mode 100644
> > > > > > index 0000000..3c2ac72
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > > @@ -0,0 +1,3 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > > +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > > new file mode 100644
> > > > > > index 0000000..acf48c4
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > > @@ -0,0 +1,4 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2" } */
> > > > > > +
> > > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > > new file mode 100644
> > > > > > index 0000000..9f61dc4
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > > @@ -0,0 +1,12 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > > new file mode 100644
> > > > > > index 0000000..09048e5
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > > @@ -0,0 +1,21 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> > > > > > +
> > > > > > +int
> > > > > > +foo (int x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > > new file mode 100644
> > > > > > index 0000000..4862688
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > > @@ -0,0 +1,39 @@
> > > > > > +/* { dg-do run { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > > > +
> > > > > > +struct S { int i; };
> > > > > > +__attribute__((const, noinline, noclone))
> > > > > > +struct S foo (int x)
> > > > > > +{
> > > > > > +  struct S s;
> > > > > > +  s.i = x;
> > > > > > +  return s;
> > > > > > +}
> > > > > > +
> > > > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > > > +struct S e[2048];
> > > > > > +
> > > > > > +__attribute__((noinline, noclone)) void
> > > > > > +bar (void)
> > > > > > +{
> > > > > > +  int i;
> > > > > > +  for (i = 0; i < 1024; i++)
> > > > > > +    {
> > > > > > +      e[i] = foo (i);
> > > > > > +      a[i+2] = a[i] + a[i+1];
> > > > > > +      b[10] = b[10] + i;
> > > > > > +      c[i] = c[2047 - i];
> > > > > > +      d[i] = d[i + 1];
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > > +int
> > > > > > +main ()
> > > > > > +{
> > > > > > +  int i;
> > > > > > +  bar ();
> > > > > > +  for (i = 0; i < 1024; i++)
> > > > > > +    if (e[i].i != i)
> > > > > > +      __builtin_abort ();
> > > > > > +  return 0;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > > new file mode 100644
> > > > > > index 0000000..500251b
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > > @@ -0,0 +1,39 @@
> > > > > > +/* { dg-do run { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > > +
> > > > > > +struct S { int i; };
> > > > > > +__attribute__((const, noinline, noclone))
> > > > > > +struct S foo (int x)
> > > > > > +{
> > > > > > +  struct S s;
> > > > > > +  s.i = x;
> > > > > > +  return s;
> > > > > > +}
> > > > > > +
> > > > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > > > +struct S e[2048];
> > > > > > +
> > > > > > +__attribute__((noinline, noclone)) void
> > > > > > +bar (void)
> > > > > > +{
> > > > > > +  int i;
> > > > > > +  for (i = 0; i < 1024; i++)
> > > > > > +    {
> > > > > > +      e[i] = foo (i);
> > > > > > +      a[i+2] = a[i] + a[i+1];
> > > > > > +      b[10] = b[10] + i;
> > > > > > +      c[i] = c[2047 - i];
> > > > > > +      d[i] = d[i + 1];
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > > +int
> > > > > > +main ()
> > > > > > +{
> > > > > > +  int i;
> > > > > > +  bar ();
> > > > > > +  for (i = 0; i < 1024; i++)
> > > > > > +    if (e[i].i != i)
> > > > > > +      __builtin_abort ();
> > > > > > +  return 0;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > > new file mode 100644
> > > > > > index 0000000..8b058e3
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > > @@ -0,0 +1,21 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> > > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > > new file mode 100644
> > > > > > index 0000000..d4eaaf7
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > > @@ -0,0 +1,19 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > > new file mode 100644
> > > > > > index 0000000..dd3bb90
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > > @@ -0,0 +1,14 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > > new file mode 100644
> > > > > > index 0000000..e2274f6
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > > @@ -0,0 +1,14 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> > > > > > +
> > > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > > new file mode 100644
> > > > > > index 0000000..7f5d153
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > > @@ -0,0 +1,13 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > > +
> > > > > > +int
> > > > > > +foo (int x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > > new file mode 100644
> > > > > > index 0000000..fe13d2b
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > > @@ -0,0 +1,13 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > > > +
> > > > > > +float
> > > > > > +foo (float z, float y, float x)
> > > > > > +{
> > > > > > +  return x + y;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > > new file mode 100644
> > > > > > index 0000000..205a532
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > > @@ -0,0 +1,12 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > > > +
> > > > > > +float
> > > > > > +foo (float z, float y, float x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > > new file mode 100644
> > > > > > index 0000000..e046684
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > > @@ -0,0 +1,19 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > > new file mode 100644
> > > > > > index 0000000..4be8ff6
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > > @@ -0,0 +1,23 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > > > +
> > > > > > +float
> > > > > > +foo (float z, float y, float x)
> > > > > > +{
> > > > > > +  return x + y;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > > new file mode 100644
> > > > > > index 0000000..0eb34e0
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > > @@ -0,0 +1,14 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> > > > > > +
> > > > > > +__attribute__ ((zero_call_used_regs("used")))
> > > > > > +float
> > > > > > +foo (float z, float y, float x)
> > > > > > +{
> > > > > > +  return x + y;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > > new file mode 100644
> > > > > > index 0000000..cbb63a4
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > > @@ -0,0 +1,19 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > > new file mode 100644
> > > > > > index 0000000..7573197
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > > @@ -0,0 +1,19 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > > new file mode 100644
> > > > > > index 0000000..de71223
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > > @@ -0,0 +1,12 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > > new file mode 100644
> > > > > > index 0000000..ccfa441
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > > @@ -0,0 +1,14 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > > new file mode 100644
> > > > > > index 0000000..6b46ca3
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > > @@ -0,0 +1,20 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +__attribute__ ((zero_call_used_regs("all-gpr")))
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > > new file mode 100644
> > > > > > index 0000000..0680f38
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > > @@ -0,0 +1,14 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > > +
> > > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > > new file mode 100644
> > > > > > index 0000000..534defa
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > > @@ -0,0 +1,13 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > > > +
> > > > > > +int
> > > > > > +foo (int x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > > new file mode 100644
> > > > > > index 0000000..477bb19
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > > @@ -0,0 +1,19 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > > +
> > > > > > +int
> > > > > > +foo (int x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > > > new file mode 100644
> > > > > > index 0000000..a305a60
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > > > @@ -0,0 +1,15 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > > > +
> > > > > > +int
> > > > > > +foo (int x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/toplev.c b/gcc/toplev.c
> > > > > > index 95eea63..01a1f24 100644
> > > > > > --- a/gcc/toplev.c
> > > > > > +++ b/gcc/toplev.c
> > > > > > @@ -1464,6 +1464,15 @@ process_options (void)
> > > > > >       }
> > > > > >    }
> > > > > >
> > > > > > +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> > > > > > +      && !targetm.calls.pro_epilogue_use)
> > > > > > +    {
> > > > > > +      error_at (UNKNOWN_LOCATION,
> > > > > > +             "%<-fzero-call-used-regs=%> is not supported for this "
> > > > > > +             "target");
> > > > > > +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> > > > > > +    }
> > > > > > +
> > > > > >  /* One region RA really helps to decrease the code size.  */
> > > > > >  if (flag_ira_region == IRA_REGION_AUTODETECT)
> > > > > >    flag_ira_region
> > > > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > > > > > index 8c5a2e3..71badbd 100644
> > > > > > --- a/gcc/tree-core.h
> > > > > > +++ b/gcc/tree-core.h
> > > > > > @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> > > > > > unsigned final : 1;
> > > > > > /* Belong to FUNCTION_DECL exclusively.  */
> > > > > > unsigned regdecl_flag : 1;
> > > > > > - /* 14 unused bits. */
> > > > > > +
> > > > > > + /* How to clear call-used registers upon function return.  */
> > > > > > + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> > > > > > +
> > > > > > + /* 11 unused bits.  */
> >
> > So instead of wasting "precious" bits please use lookup_attribute
> > in the single place you query this value (which is once per function).
> > There's no need to complicate matters by trying to maintain the above.
> >
> > > > > > };
> > > > > >
> > > > > > struct GTY(()) tree_var_decl {
> > > > > > diff --git a/gcc/tree.h b/gcc/tree.h
> > > > > > index cf546ed..d378a88 100644
> > > > > > --- a/gcc/tree.h
> > > > > > +++ b/gcc/tree.h
> > > > > > @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> > > > > > #define DECL_VISIBILITY(NODE) \
> > > > > >  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> > > > > >
> > > > > > +/* Value of the function decl's type of zeroing the call used
> > > > > > +   registers upon return from function.  */
> > > > > > +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> > > > > > +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> > > > > > +
> > > > > > /* Nonzero means that the decl (or an enclosing scope) had its
> > > > > >   visibility specified rather than being inferred.  */
> > > > > > #define DECL_VISIBILITY_SPECIFIED(NODE) \
> > > > > > --
> > > > > > 1.9.1
> > > > >
> > > >
> > >
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
> 
> 
>
H.J. Lu Aug. 5, 2020, 12:26 p.m. UTC | #8
On Wed, Aug 5, 2020 at 12:06 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 4 Aug 2020, H.J. Lu wrote:
>
> > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> > >
> > > > Hi, Uros,
> > > >
> > > > Thanks a lot for your review on X86 parts.
> > > >
> > > > Hi, Richard,
> > > >
> > > > > Could you please take a look at the middle-end part to see whether the
> > > > > rewrite addresses your previous concerns?
> > >
> > > I have a few comments below - I'm not sure I'm qualified to fully
> > > review the rest though.
> > >
> > > > Thanks a lot.
> > > >
> > > > Qing
> > > >
> > > >
> > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > >
> > > > > > On Tue, Jul 28, 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
> > > > > >
> > > > > >
> > > > > > Richard and Uros,
> > > > > >
> > > > > > Could you please review the change that H.J and I rewrote based on your comments in the previous round of discussion?
> > > > > >
> > > > > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.
> > > > > >
> > > > > > Thanks a lot for your time.
> > > > >
> > > > > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > > > >
> > > > > > That said, the x86 parts look OK.
> > > > >
> > > > >
> > > >
> > > > > Uros.
> > > > > > Qing
> > > > > >
> > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > > > > >
> > > > > > > Hi, Gcc team,
> > > > > > >
> > > > > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > > > > >
> > > > > > > From the previous round of discussion, the major issues raised were:
> > > > > > >
> > > > > > > A. should be rewritten by using regsets infrastructure.
> > > > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > > > >
> > > > > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > > > >
> > > > > > > > 1. Change the names of the option and attribute from
> > > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > > > > > > > to:
> > > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all")
> > > > > > > > Add the new option and new attribute in general.
> > > > > > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > > > > > 3. Add 4 target-hooks;
> > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > 5. On a target that does not implement the target hook, issue error for the new option, issue warning for the new attribute.
> > > > > > >
> > > > > > > > The patch is as follows:
> > > > > > > >
> > > > > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > > > command-line option and
> > > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> > > > > > >
> > > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > > >
> > > > > > >  Don't zero call-used registers upon function return.
> > >
> > > Does a return via EH unwinding also constitute a function return?  I
> > > think you may want to have a finally handler or support in the unwinder
> > > for this?  Then there's abnormal return via longjmp & friends, I guess
> > > there's nothing that can be done there besides patching glibc?
> >
> > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> > patch. Only normal returns are covered.
>
> What's the point then?  Also specifically thinking about spill slots.
>

The goal of this patch is to zero caller-saved registers upon normal
function return.  Abnormal returns and spill slots are outside of the
scope of this patch.
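
To make the scope concrete, here is a minimal sketch (not part of the
patch) of the two kinds of exit discussed above.  The names normal_path,
abnormal_path, took_longjmp, check_paths and the ZERO_USED_GPR macro are
hypothetical and chosen only for illustration; the attribute spelling is
taken from the patch description, and the __has_attribute guard is an
assumption so that the file still builds with compilers that lack the
attribute.

```c
#include <assert.h>
#include <setjmp.h>

/* Use the attribute only when the compiler knows it; otherwise expand
   to nothing so the sketch still builds.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR
#endif

static jmp_buf env;

/* Normal return: with the patch applied, the zeroing of used call-used
   GPRs is expected to happen on this path, at the function return.  */
ZERO_USED_GPR __attribute__((noinline)) int
normal_path (int x)
{
  return x + 1;
}

/* Abnormal exit: control leaves through longjmp, so the function's
   return sequence (and anything emitted there) is never executed.  */
ZERO_USED_GPR __attribute__((noinline)) int
abnormal_path (int x)
{
  (void) x;
  longjmp (env, 1);
  return 0; /* not reached */
}

/* Returns 1 when control came back through longjmp, 0 if abnormal_path
   returned normally (which it never does).  */
int
took_longjmp (int x)
{
  if (setjmp (env) == 0)
    {
      abnormal_path (x);
      return 0;
    }
  return 1;
}

/* Exercise both paths; the visible results are the same with or without
   the attribute, since zeroing scratch registers must not change them.  */
void
check_paths (void)
{
  assert (normal_path (41) == 42);
  assert (took_longjmp (7) == 1);
}
```

Calling check_paths () from main exercises both exits.  The difference
between them shows up only in the generated assembly, not in the values
the program computes.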
Richard Biener Aug. 5, 2020, 12:30 p.m. UTC | #9
On Wed, 5 Aug 2020, H.J. Lu wrote:

> On Wed, Aug 5, 2020 at 12:06 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 4 Aug 2020, H.J. Lu wrote:
> >
> > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> > > >
> > > > > Hi, Uros,
> > > > >
> > > > > Thanks a lot for your review on X86 parts.
> > > > >
> > > > > Hi, Richard,
> > > > >
> > > > > > Could you please take a look at the middle-end part to see whether the
> > > > > > rewrite addresses your previous concerns?
> > > >
> > > > I have a few comments below - I'm not sure I'm qualified to fully
> > > > review the rest though.
> > > >
> > > > > Thanks a lot.
> > > > >
> > > > > Qing
> > > > >
> > > > >
> > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > > On Tue, Jul 28, 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > Richard and Uros,
> > > > > > >
> > > > > > > Could you please review the change that H.J and I rewrote based on your comments in the previous round of discussion?
> > > > > > >
> > > > > > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.
> > > > > > >
> > > > > > > Thanks a lot for your time.
> > > > > >
> > > > > > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > > > > >
> > > > > > > That said, the x86 parts look OK.
> > > > > >
> > > > > >
> > > > >
> > > > > > Uros.
> > > > > > > Qing
> > > > > > >
> > > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > >
> > > > > > > > Hi, Gcc team,
> > > > > > > >
> > > > > > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > > > > > >
> > > > > > > > From the previous round of discussion, the major issues raised were:
> > > > > > > >
> > > > > > > > A. should be rewritten by using regsets infrastructure.
> > > > > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > > > > >
> > > > > > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > > > > >
> > > > > > > > 1. Change the names of the option and attribute from
> > > > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all")
> > > > > > > > > to:
> > > > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and  zero_call_used_regs("skip|used-gpr|all-gpr||used|all")
> > > > > > > > Add the new option and  new attribute in general.
> > > > > > > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > > > > > > 3. Add 4 target-hooks;
> > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > > 5. On a target that does not implement the target hook, issue error for the new option, issue warning for the new attribute.
> > > > > > > >
> > > > > > > > The patch is as following:
> > > > > > > >
> > > > > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > > > command-line option and
> > > > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribute:
> > > > > > > >
> > > > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > > > >
> > > > > > > >  Don't zero call-used registers upon function return.
> > > >
> > > > Does a return via EH unwinding also constitute a function return?  I
> > > > think you may want to have a finally handler or support in the unwinder
> > > > for this?  Then there's abnormal return via longjmp & friends, I guess
> > > > there's nothing that can be done there besides patching glibc?
> > >
> > > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> > > patch. Only normal returns are covered.
> >
> > What's the point then?  Also specifically thinking about spill slots.
> >
> 
> The goal of this patch is to zero caller-saved registers upon normal
> function return.  Abnormal returns and spill slots are outside of the
> scope of this patch.

Sure, I can write a patch that spills some regs, writes zeros to them
and then restores them.  And the patch will fulfil what it was designed
to do.

Still I need to come up with a reason that this is a useful feature
by its own for it to be accepted.

I am asking for that reason.  What's the reason for the "goal of this
patch"?  Why's that a useful goal on its own?

Richard.
H.J. Lu Aug. 5, 2020, 12:34 p.m. UTC | #10
On Wed, Aug 5, 2020 at 5:31 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Wed, 5 Aug 2020, H.J. Lu wrote:
>
> > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Tue, 4 Aug 2020, H.J. Lu wrote:
> > >
> > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > Does a return via EH unwinding also constitute a function return?  I
> > > > > think you may want to have a finally handler or support in the unwinder
> > > > > for this?  Then there's abnormal return via longjmp & friends, I guess
> > > > > there's nothing that can be done there besides patching glibc?
> > > >
> > > > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> > > > patch. Only normal returns are covered.
> > >
> > > What's the point then?  Also specifically thinking about spill slots.
> > >
> >
> > The goal of this patch is to zero caller-saved registers upon normal
> > function return.  Abnormal returns and spill slots are outside of the
> > scope of this patch.
>
> Sure, I can write a patch that spills some regs, writes zeros to them
> and then restores them.  And the patch will fulfil what it was designed
> to do.
>
> Still I need to come up with a reason that this is a useful feature
> by its own for it to be accepted.
>
> I am asking for that reason.  What's the reason for the "goal of this
> patch"?  Why's that a useful goal on its own?
>

Hi Victor,

Can you provide some background information about how/why this feature
is used?

Thanks.
H.J. Lu Aug. 5, 2020, 2:45 p.m. UTC | #11
On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Wed, 5 Aug 2020, H.J. Lu wrote:
> >
> > > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Tue, 4 Aug 2020, H.J. Lu wrote:
> > > >
> > > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > Does a return via EH unwinding also constitute a function return?  I
> > > > > > think you may want to have a finally handler or support in the unwinder
> > > > > > for this?  Then there's abnormal return via longjmp & friends, I guess
> > > > > > there's nothing that can be done there besides patching glibc?
> > > > >
> > > > > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> > > > > patch. Only normal returns are covered.
> > > >
> > > > What's the point then?  Also specifically thinking about spill slots.
> > > >
> > >
> > > The goal of this patch is to zero caller-saved registers upon normal
> > > function return.  Abnormal returns and spill slots are outside of the
> > > scope of this patch.
> >
> > Sure, I can write a patch that spills some regs, writes zeros to them
> > and then restores them.  And the patch will fulfil what it was designed
> > to do.
> >
> > Still I need to come up with a reason that this is a useful feature
> > by its own for it to be accepted.
> >
> > I am asking for that reason.  What's the reason for the "goal of this
> > patch"?  Why's that a useful goal on its own?
> >
>
> Hi Victor,
>
> Can you provide some background information about how/why this feature
> is used?
>

From "The SECURE project and GCC" at GCC Cauldron 2018:

Speaker: Graham Markall

The SECURE project is a 15 month program funded by Innovate UK, to
take well known security techniques from academia and make them
generally available in standard compilers, specifically GCC and LLVM.
An explicit objective is for those techniques to be incorporated in
the upstream versions of compilers. The Cauldron takes place in the
final month of the project and this talk will present the technical
details of some of the techniques implemented, and review those that
are yet to be implemented. A particular focus of this talk will be on
verifying that the implementation is correct, which can be a bigger
challenge than the implementation.

Techniques to be covered in the project include the following:

Stack and register erasure. Ensuring that on return from a function,
no data is left lying on the stack or in registers. Particular
challenges are in dealing with inlining, shrink wrapping and caching.

This patch implements register erasure.
Qing Zhao Aug. 5, 2020, 3 p.m. UTC | #12
> On Aug 5, 2020, at 9:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> 
>> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener <rguenther@suse.de> wrote:
>>> 
>>>>>>>>>>> 
>>>>>>>>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>>>>>>>>>>> command-line option and
>>>>>>>>>>> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribute:
>>>>>>>>>>> 
>>>>>>>>>>> 1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
>>>>>>>>>>> 
>>>>>>>>>>> Don't zero call-used registers upon function return.
>>>>>>> 
>>>>>>> Does a return via EH unwinding also constitute a function return?  I
>>>>>>> think you may want to have a finally handler or support in the unwinder
>>>>>>> for this?  Then there's abnormal return via longjmp & friends, I guess
>>>>>>> there's nothing that can be done there besides patching glibc?
>>>>>> 
>>>>>> Abnormal returns, like EH unwinding and longjmp, aren't covered by this
>>>>>> patch. Only normal returns are covered.
>>>>> 
>>>>> What's the point then?  Also specifically thinking about spill slots.
>>>>> 
>>>> 
>>>> The goal of this patch is to zero caller-saved registers upon normal
>>>> function return.  Abnormal returns and spill slots are outside of the
>>>> scope of this patch.
>>> 
>>> Sure, I can write a patch that spills some regs, writes zeros to them
>>> and then restores them.  And the patch will fulfil what it was designed
>>> to do.
>>> 
>>> Still I need to come up with a reason that this is a useful feature
>>> by its own for it to be accepted.
>>> 
>>> I am asking for that reason.  What's the reason for the "goal of this
>>> patch"?  Why's that a useful goal on its own?
>>> 
>> 
>> Hi Victor,
>> 
>> Can you provide some background information about how/why this feature
>> is used?
>> 
> 
> From "The SECURE project and GCC" at GCC Cauldron 2018:
> 
> Speaker: Graham Markall
> 
> The SECURE project is a 15 month program funded by Innovate UK, to
> take well known security techniques from academia and make them
> generally available in standard compilers, specifically GCC and LLVM.
> An explicit objective is for those techniques to be incorporated in
> the upstream versions of compilers. The Cauldron takes place in the
> final month of the project and this talk will present the technical
> details of some of the techniques implemented, and review those that
> are yet to be implemented. A particular focus of this talk will be on
> verifying that the implementation is correct, which can be a bigger
> challenge than the implementation.
> 
> Techniques to be covered in the project include the following:
> 
> Stack and register erasure. Ensuring that on return from a function,
> no data is left lying on the stack or in registers. Particular
> challenges are in dealing with inlining, shrink wrapping and caching.
> 
> This patch implements register erasure.

In addition to the above, Victor mentioned a paper that provides good
background information for this patch:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

The abstract of this paper is:

"With the implementation of W ⊕ X security model on computer system,
Return-Oriented Programming (ROP) has become the primary exploitation
technique for adversaries. Although many solutions that defend against ROP
exploits have been proposed, they still suffer from various shortcomings.
In this paper, we propose a new way to mitigate ROP attacks that are based
on return instructions. We clean the scratch registers which are also the
parameter registers based on the features of ROP malicious code and calling
convention. A prototype is implemented on x64-based Linux platform based on Pin.
Preliminary experimental results show that our method can efficiently mitigate
conventional ROP attacks."

Qing
 
Richard Biener Aug. 5, 2020, 6:53 p.m. UTC | #13
On August 5, 2020 4:45:00 PM GMT+02:00, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener <rguenther@suse.de> wrote:
>> >
>> > On Wed, 5 Aug 2020, H.J. Lu wrote:
>> >
>> > > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener <rguenther@suse.de> wrote:
>> > > >
>> > > > On Tue, 4 Aug 2020, H.J. Lu wrote:
>> > > >
>> > > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
>> > > > > >
>> > > > > > Does a return via EH unwinding also constitute a function return?  I
>> > > > > > think you may want to have a finally handler or support in the unwinder
>> > > > > > for this?  Then there's abnormal return via longjmp & friends, I guess
>> > > > > > there's nothing that can be done there besides patching glibc?
>> > > > >
>> > > > > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
>> > > > > patch. Only normal returns are covered.
>> > > >
>> > > > What's the point then?  Also specifically thinking about spill slots.
>> > > >
>> > >
>> > > The goal of this patch is to zero caller-saved registers upon normal
>> > > function return.  Abnormal returns and spill slots are outside of the
>> > > scope of this patch.
>> >
>> > Sure, I can write a patch that spills some regs, writes zeros to them
>> > and then restores them.  And the patch will fulfil what it was designed
>> > to do.
>> >
>> > Still I need to come up with a reason that this is a useful feature
>> > by its own for it to be accepted.
>> >
>> > I am asking for that reason.  What's the reason for the "goal of this
>> > patch"?  Why's that a useful goal on its own?
>> >
>>
>> Hi Victor,
>>
>> Can you provide some background information about how/why this feature
>> is used?
>>
>
>From "The SECURE project and GCC" at GCC Cauldron 2018:
>
>Speaker: Graham Markall
>
>The SECURE project is a 15 month program funded by Innovate UK, to
>take well known security techniques from academia and make them
>generally available in standard compilers, specifically GCC and LLVM.
>An explicit objective is for those techniques to be incorporated in
>the upstream versions of compilers. The Cauldron takes place in the
>final month of the project and this talk will present the technical
>details of some of the techniques implemented, and review those that
>are yet to be implemented. A particular focus of this talk will be on
>verifying that the implementation is correct, which can be a bigger
>challenge than the implementation.
>
>Techniques to be covered in the project include the following:
>
>Stack and register erasure. Ensuring that on return from a function,
>no data is left lying on the stack or in registers. Particular
>challenges are in dealing with inlining, shrink wrapping and caching.
>
>This patch implements register erasure.

Part of it, yes. While I can see that abnormal transfer of control is difficult, exception handling is too widely used to be ignored. What's the plan there?

So can we also see the other parts? In particular I wonder whether exposing just register clearing (in this fine-grained manner) is required and useful rather than thinking of a better interface for the whole thing?

Richard.
H.J. Lu Aug. 5, 2020, 7:08 p.m. UTC | #14
On Wed, Aug 5, 2020 at 11:53 AM Richard Biener <rguenther@suse.de> wrote:
>
> On August 5, 2020 4:45:00 PM GMT+02:00, "H.J. Lu" <hjl.tools@gmail.com> wrote:
> >On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener <rguenther@suse.de> wrote:
> >> >
> >> > On Wed, 5 Aug 2020, H.J. Lu wrote:
> >> >
> >> > > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener <rguenther@suse.de> wrote:
> >> > > >
> >> > > > On Tue, 4 Aug 2020, H.J. Lu wrote:
> >> > > >
> >> > > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
> >> > > > > >
> >> > > > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> >> > > > > >
> >> > > > > > > Hi, Uros,
> >> > > > > > >
> >> > > > > > > Thanks a lot for your review on X86 parts.
> >> > > > > > >
> >> > > > > > > Hi, Richard,
> >> > > > > > >
> >> > > > > > > Could you please take a look at the middle-end part to
> >see whether the
> >> > > > > > > rewritten addressed your previous concern?
> >> > > > > >
> >> > > > > > I have a few comments below - I'm not sure I'm qualified to
> >fully
> >> > > > > > review the rest though.
> >> > > > > >
> >> > > > > > > Thanks a lot.
> >> > > > > > >
> >> > > > > > > Qing
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak
> ><ubizjak@gmail.com> wrote:
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao
> ><QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> napisala:
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Richard and Uros,
> >> > > > > > > > >
> >> > > > > > > > > Could you please review the change that H.J and I
> >rewrote based on your comments in the previous round of discussion?
> >> > > > > > > > >
> >> > > > > > > > > This patch is a nice security enhancement for GCC
> >that has been requested by security people for quite some time.
> >> > > > > > > > >
> >> > > > > > > > > Thanks a lot for your time.
> >> > > > > > > >
> >> > > > > > > > I'll be away from the keyboard for the next week, but
> >the patch needs a middle end approval first.
> >> > > > > > > >
> >> > > > > > > > That said, x86 parts looks OK.
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > > Uros.
> >> > > > > > > > > Qing
> >> > > > > > > > >
> >> > > > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via
> >Gcc-patches <gcc-patches@gcc.gnu.org <mailto:gcc-patches@gcc.gnu.org>>
> >wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > Hi, Gcc team,
> >> > > > > > > > > >
> >> > > > > > > > > > This patch is a follow-up on the previous patch and
> >corresponding discussion:
> >> > > > > > > > > >
> >https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> ><https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html>
> ><https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> ><https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html>>
> >> > > > > > > > > >
> >> > > > > > > > > > From the previous round of discussion, the major
> >issues raised were:
> >> > > > > > > > > >
> >> > > > > > > > > > A. should be rewritten by using regsets
> >infrastructure.
> >> > > > > > > > > > B. Put the patch into middle-end instead of x86
> >backend.
> >> > > > > > > > > >
> >> > > > > > > > > > This new patch is rewritten based on the above 2
> >comments.  The major changes compared to the previous patch are:
> >> > > > > > > > > >
> >> > > > > > > > > > 1. Change the names of the option and attribute
> >from
> >> > > > > > > > > >
> >-mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and
> >zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
> >> > > > > > > > > > to:
> >> > > > > > > > > >
> >-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and
> >zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
> >> > > > > > > > > > Add the new option and  new attribute in general.
> >> > > > > > > > > > 2. The main code generation part is moved from i386
> >backend to middle-end;
> >> > > > > > > > > > 3. Add 4 target-hooks;
> >> > > > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> >> > > > > > > > > > 5. On a target that does not implement the target
> >hook, issue error for the new option, issue warning for the new
> >attribute.
> >> > > > > > > > > >
> >> > > > > > > > > > The patch is as following:
> >> > > > > > > > > >
> >> > > > > > > > > > [PATCH] Add
> >-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> >> > > > > > > > > > command-line option and
> >> > > > > > > > > >
> >zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function
> >attribute:
> >> > > > > > > > > >
> >> > > > > > > > > >  1. -fzero-call-used-regs=skip and
> >zero_call_used_regs("skip")
> >> > > > > > > > > >
> >> > > > > > > > > >  Don't zero call-used registers upon function
> >return.
> >> > > > > >
> >> > > > > > Does a return via EH unwinding also constitute a function
> >return?  I
> >> > > > > > think you may want to have a finally handler or support in
> >the unwinder
> >> > > > > > for this?  Then there's abnormal return via longjmp &
> >friends, I guess
> >> > > > > > there's nothing that can be done there besides patching
> >glibc?
> >> > > > >
> >> > > > > Abnormal returns, like EH unwinding and longjmp, aren't
> >covered by this
> >> > > > > patch. Only normal returns are covered.
> >> > > >
> >> > > > What's the point then?  Also specifically thinking about spill
> >slots.
> >> > > >
> >> > >
> >> > > The goal of this patch is to zero caller-saved registers upon
> >normal
> >> > > function return.  Abnormal returns and spill slots are outside of
> >the
> >> > > scope of this patch.
> >> >
> >> > Sure, I can write a patch that spills some regs, writes zeros to
> >them
> >> > and then restores them.  And the patch will fulfil what it was
> >designed
> >> > to do.
> >> >
> >> > Still I need to come up with a reason that this is a useful feature
> >> > by its own for it to be accepted.
> >> >
> >> > I am asking for that reason.  What's the reason for the "goal of
> >this
> >> > patch"?  Why's that a useful goal on its own?
> >> >
> >>
> >> Hi Victor,
> >>
> >> Can you provide some background information about how/why this
> >feature
> >> is used?
> >>
> >
> >From The SECURE project and GCC in GCC Cauldron 2018:
> >
> >Speaker: Graham Markall
> >
> >The SECURE project is a 15 month program funded by Innovate UK, to
> >take well known security techniques from academia and make them
> >generally available in standard compilers, specifically GCC and LLVM.
> >An explicit objective is for those techniques to be incorporated in
> >the upstream versions of compilers. The Cauldron takes place in the
> >final month of the project and this talk will present the technical
> >details of some of the techniques implemented, and review those that
> >are yet to be implemented. A particular focus of this talk will be on
> >verifying that the implementation is correct, which can be a bigger
> >challenge than the implementation.
> >
> >Techniques to be covered in the project include the following:
> >
> >Stack and register erasure. Ensuring that on return from a function,
> >no data is left lying on the stack or in registers. Particular
> >challenges are in dealing with inlining, shrink wrapping and caching.
> >
> >This patch implements register erasure.
>
> Part of it, yes. While I can see abnormal transfer of control is difficult, exception handling is too widespread to be ignored. What's the plan there?

The initial usage is in Linux kernel where user space EH isn't an issue.
Further improvement can be investigated later.

> So can we also see the other parts? In particular I wonder whether exposing just register clearing (in this fine-grained manner) is required and useful rather than thinking of a better interface for the whole thing?
>
> Richard.

This patch is for caller-saved registers only.  Stack temporaries aren't
covered by this.  We can simply clear the stack first before releasing it
for function return.
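
For illustration, here is a minimal sketch of how the proposed attribute
might be applied to a single function on the normal-return path discussed
above.  The function mix_key and the macro ZERO_USED_GPR are made up for this
example, and the guard assumes a compiler that provides __has_attribute; where
the attribute is not available the macro expands to nothing and the function
is compiled as usual.  As stated in this thread, abnormal returns (EH
unwinding, longjmp) and spill slots are not covered.

```c
/* Sketch only: ZERO_USED_GPR and mix_key are hypothetical names.
   The attribute spelling follows the proposal in this thread.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR /* attribute not available: no zeroing requested */
#endif

/* Mixes a key with a salt; intermediate values live in call-used
   registers, which is what "used-gpr" asks the compiler to clear
   before a normal return.  */
ZERO_USED_GPR
unsigned int mix_key (unsigned int key, unsigned int salt)
{
  unsigned int t = key ^ salt;
  t = (t << 7) | (t >> 25);   /* rotate left by 7, assuming 32-bit int */
  return t + salt;
}
```

Callers see the same result with or without the attribute; only the register
state left behind on return is meant to differ.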
Qing Zhao Aug. 5, 2020, 8:22 p.m. UTC | #15
>> 
>> From The SECURE project and GCC in GCC Cauldron 2018:
>> 
>> Speaker: Graham Markall
>> 
>> The SECURE project is a 15 month program funded by Innovate UK, to
>> take well known security techniques from academia and make them
>> generally available in standard compilers, specifically GCC and LLVM.
>> An explicit objective is for those techniques to be incorporated in
>> the upstream versions of compilers. The Cauldron takes place in the
>> final month of the project and this talk will present the technical
>> details of some of the techniques implemented, and review those that
>> are yet to be implemented. A particular focus of this talk will be on
>> verifying that the implementation is correct, which can be a bigger
>> challenge than the implementation.
>> 
>> Techniques to be covered in the project include the following:
>> 
>> Stack and register erasure. Ensuring that on return from a function,
>> no data is left lying on the stack or in registers. Particular
>> challenges are in dealing with inlining, shrink wrapping and caching.
>> 
>> This patch implements register erasure.
> 
> Part of it, yes. While I can see abnormal transfer of control is difficult, exception handling is too widespread to be ignored. What's the plan there?
> 
> So can we also see the other parts? In particular I wonder whether exposing just register clearing (in this fine-grained manner) is required and useful rather than thinking of a better interface for the whole thing?

You mean to provide an integrated interface for both stack and register erasure for security purposes?

However, is stack erasure at function return really a better idea than zero-initializing auto variables at the beginning of the function?

We had some discussion with Kees Cook several weeks ago on the idea of stack erasure at function return; Kees provided the following comments:

"But back to why I don't think it's the right approach:

Based on the performance measurements of pattern-init and zero-init
in Clang, MSVC, and the kernel plugin, it's clear that adding these
initializations has measurable performance cost. Doing it at function
exit means performing large unconditional wipes. Doing it at function
entry means initializations can be dead-store eliminated and highly
optimized. Given the current debates on the measurable performance
difference between pattern and zero initialization (even in the face of
existing dead-store elimination), I would expect wipe-on-function-exit to
be outside the acceptable tolerance for performance impact. (Additionally,
we've seen negative cache effects on wiping memory when the CPU is done
using it, though this is more pronounced in heap wiping. Zeroing at
free is about twice as expensive as zeroing at allocation time due to cache
temporality. This is true for the stack as well, but it's not as high.)"

From my understanding, the major issue with stack erasure at function return is the large performance overhead,
and this overhead cannot be reduced by compiler optimizations since the
additional wiping insns are inserted at the end of the routine.

Based on the previous discussion with Kees, I don’t think that stack erasure at function return is a good idea.
Instead, we might provide an alternative approach: zero/pattern init of auto variables. (This functionality is
already available in LLVM.)
This will be another patch we want to add to GCC for security purposes in general.

So, I think for the current patch, -fzero-call-used-regs should be good enough. 

Any comments?

Qing
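
To make the entry-versus-exit trade-off above concrete, here is a small
hand-written sketch.  It is not what either compiler feature would emit; the
function names and the 64-byte scratch buffer are assumptions for
illustration only.  The first variant initializes its local buffer at entry,
so the compiler is free to drop stores it can prove are overwritten before
being read.  The second wipes the buffer at exit through a volatile pointer,
which keeps the stores but also makes them unconditional.

```c
#include <stddef.h>
#include <string.h>

/* Variant 1 (hypothetical): zero-init at function entry.  Ordinary
   stores, so dead-store elimination may remove the redundant part.  */
unsigned sum_entry_init (const unsigned char *src, size_t n)
{
  unsigned char buf[64] = {0};
  unsigned total = 0;

  if (n > sizeof buf)
    n = sizeof buf;
  memcpy (buf, src, n);
  for (size_t i = 0; i < n; i++)
    total += buf[i];
  return total;
}

/* Variant 2 (hypothetical): wipe at function exit.  The volatile
   accesses are not optimized away, so the whole buffer is always
   written once more right before it goes out of use.  */
unsigned sum_exit_wipe (const unsigned char *src, size_t n)
{
  unsigned char buf[64];
  unsigned total = 0;

  if (n > sizeof buf)
    n = sizeof buf;
  memcpy (buf, src, n);
  for (size_t i = 0; i < n; i++)
    total += buf[i];

  volatile unsigned char *p = buf;
  for (size_t i = 0; i < sizeof buf; i++)
    p[i] = 0;
  return total;
}
```

Both variants compute the same result; they differ only in when the extra
stores happen and in whether the optimizer is allowed to remove them.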





> 
> Richard.
Qing Zhao Aug. 5, 2020, 9:35 p.m. UTC | #16
Hi, Richard,

Thanks a lot for your careful review and detailed comments.  


> On Aug 4, 2020, at 2:35 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> I have a few comments below - I'm not sure I'm qualified to fully
> review the rest though.

Could you let me know who would be a more qualified person to fully review the rest of the middle-end change?

>>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>>>>> command-line option and
>>>>> zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
>>>>> 
>>>>> 1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
>>>>> 
>>>>> Don't zero call-used registers upon function return.
> 
> Does a return via EH unwinding also constitute a function return?  I
> think you may want to have a finally handler or support in the unwinder
> for this?  Then there's abnormal return via longjmp & friends, I guess
> there's nothing that can be done there besides patching glibc?
> 
> In general I am missing reasoning as why to use -fzero-call-used-regs=
> in the documentation, that is, what is the threat model and what are
> the guarantees?  Is there any point zeroing registers when spill slots
> are left populated with stale register contents?  How do I (and why
> would I want to?) ensure that there's no information leak from the
> implementation of 'foo' to their callers?  Do I need to compile all
> of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> or is it enough to annotate API boundaries I want to protect with
> zero_call_used_regs("...")?
> 
> Again - what's the intended use (and how does it fulfil anything useful
> for that case)?

The major question of the above is:  what’s the motivation of the whole patch?
H.J.Lu and I have replied to this question in separate emails; let’s continue
this high-level discussion in that thread.


>>>>> @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
>>>>> return NULL_TREE;
>>>>> }
>>>>> 
>>>>> +/* Handle a "zero_call_used_regs" attribute; arguments as in
>>>>> +   struct attribute_spec.handler.  */
>>>>> +
>>>>> +static tree
>>>>> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
>>>>> +                                   int ARG_UNUSED (flags),
>>>>> +                                   bool *no_add_attris)
>>>>> +{
>>>>> +  tree decl = *node;
>>>>> +  tree id = TREE_VALUE (args);
>>>>> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
>>>>> +
>>>>> +  if (TREE_CODE (decl) != FUNCTION_DECL)
>>>>> +    {
>>>>> +      error_at (DECL_SOURCE_LOCATION (decl),
>>>>> +             "%qE attribute applies only to functions", name);
>>>>> +      *no_add_attris = true;
>>>>> +      return NULL_TREE;
>>>>> +    }
>>>>> +  else if (DECL_INITIAL (decl))
>>>>> +    {
>>>>> +      error_at (DECL_SOURCE_LOCATION (decl),
>>>>> +             "cannot set %qE attribute after definition", name);
> 
> Why's that?
This might not be needed, I will fix this in the next update.

>>>>> 
>>>>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
>>>>> index 81bd2ee..ded1880 100644
>>>>> --- a/gcc/c/c-decl.c
>>>>> +++ b/gcc/c/c-decl.c
>>>>> @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
>>>>>        DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
>>>>>      }
>>>>> 
>>>>> +      /* Merge the zero_call_used_regs_type information.  */
>>>>> +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
>>>>> +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
>>>>> +
> 
> If you need this (see below) then likely cp/* needs similar adjustment
> so do other places in the middle-end (function cloning, etc)

I will check cp/* and function cloning etc. to see whether the copying and merging are needed in other
places.

Another thought: if I use “lookup_attribute” on the function decl instead of checking these bits, as you suggested
later, all this copying and merging might not be necessary anymore. I will check on that.
> 
>>>>> 
>>>>> +
>>>>> +/* Emit a sequence of insns to zero the call-used-registers for the current
>>>>> + * function.  */
> 
> No '*' on the continuation line

Okay, will fix this.

>>>>> +
>>>>> +  /* This array holds the zero rtx with the corresponding machine mode.  */
>>>>> +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
>>>>> +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
>>>>> +    zero_rtx[i] = NULL_RTX;
>>>>> +
>>>>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>>>> +    {
>>>>> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> 
> Use if (!call_used_regs[regno])
Okay.

> 
>>>>> +     continue;
>>>>> +      if (fixed_regs[regno])
>>>>> +     continue;
>>>>> +      if (is_live_reg_at_exit (regno))
>>>>> +     continue;
> 
> How can a call-used reg be live at exit?

Yes, this might not be needed, I will double check on this.

> 
>>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
>>>>> +     continue;
> 
> Why does the target need some extra say here?

Only the target can decide which hard regs should be zeroed, and which hard regs are general purpose registers.

> 
>>>>> +      if (used_only && !df_regs_ever_live_p (regno))
> 
> So I suppose this does not include uses by callees of this function?

Yes, I think so. 
> 
>>>>> +     continue;
>>>>> +
>>>>> +      /* Now we can emit insn to zero this register.  */
>>>>> +      rtx reg, tmp;
>>>>> +
>>>>> +      machine_mode mode
>>>>> +     = targetm.calls.zero_call_used_regno_mode (regno,
>>>>> +                                                reg_raw_mode[regno]);
> 
> In what case does the target ever need to adjust this (we're dealing
> with hard-regs only?)?

For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32 bits, etc.

> 
>>>>> +      if (mode == VOIDmode)
>>>>> +     continue;
>>>>> +      if (!have_regs_of_mode[mode])
>>>>> +     continue;
> 
> When does this happen?

This might be removed. I will check. 
> 
>>>>> +
>>>>> +      reg = gen_rtx_REG (mode, regno);
>>>>> +      if (zero_rtx[(int)mode] == NULL_RTX)
>>>>> +     {
>>>>> +       zero_rtx[(int)mode] = reg;
>>>>> +       tmp = gen_rtx_SET (reg, const0_rtx);
>>>>> +       emit_insn (tmp);
>>>>> +     }
>>>>> +      else
>>>>> +     emit_move_insn (reg, zero_rtx[(int)mode]);
> 
> Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> but I may be wrong.  

You mean “const0_rtx” should be “CONST0_RTX(mode)”? 
I will check on this.

> I'd rather have the target be able to specify
> some special instruction for zeroing here.  Some may have
> multi-reg set instructions for example.  That said, can't we
> defer the actual zeroing to the target in full and only compute
> a hard-reg-set of to-be zeroed registers here and pass that
> to a target hook?

For vector regs, we have already provided this interface with 

targetm.calls.zero_all_vector_registers (used_only)

For integer registers, do we need such a target hook too? 
If so, yes, it might be better to let the target decide how to zero the registers.

If not, the current design might be good enough, right?

> 
>>>>> +
>>>>> +      emit_insn (targetm.calls.pro_epilogue_use (reg));
>>>>> +    }
>>>>> +
>>>>> +  return;
>>>>> +}
>>>>> +
>>>>> +
>>>>> /* Return a sequence to be used as the epilogue for the current function,
>>>>>  or NULL.  */
>>>>> 
>>>>> @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
>>>>> 
>>>>> start_sequence ();
>>>>> emit_note (NOTE_INSN_EPILOGUE_BEG);
>>>>> +
>>>>> +  gen_call_used_regs_seq ();
>>>>> +
> 
> The caller eventually performs shrink-wrapping - are you sure that
> doesn't mess up things?

My understanding is that the standard epilogue does not handle “call-used” registers, so shrink-wrapping will not affect
“call-used” registers either.
Our patch only handles call-used registers, so there should not be any interaction between this patch and shrink-wrapping.

> 
>>>>> 
>>>>> +
>>>>> + /* How to clear call-used registers upon function return.  */
>>>>> + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
>>>>> +
>>>>> + /* 11 unused bits.  */
> 
> So instead of wasting "precious" bits please use lookup_attribute
> in the single place you query this value (which is once per function).
> There's no need to complicate matters by trying to maintain the above.

Thanks for the suggestion.
Yes, I will try to use lookup_attribute in function.c instead of adding these bits. That will save us this
precious space.

Thanks again.

Qing
Richard Biener Aug. 6, 2020, 8:31 a.m. UTC | #17
On Wed, 5 Aug 2020, Qing Zhao wrote:

> Hi, Richard,
> 
> Thanks a lot for your careful review and detailed comments.  
> 
> 
> > On Aug 4, 2020, at 2:35 AM, Richard Biener <rguenther@suse.de> wrote:
> > 
> > I have a few comments below - I'm not sure I'm qualified to fully
> > review the rest though.
> 
> Could you let me know who would be a more qualified person to fully review the rest of the middle-end change?

Jeff might be, but with the intended purpose (ROP mitigation AFAIU)
it would be nice for other target maintainers to chime in (Segher for
power maybe) for the question below...

> >>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> >>>>> command-line option and
> >>>>> zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> >>>>> 
> >>>>> 1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> >>>>> 
> >>>>> Don't zero call-used registers upon function return.
> > 
> > Does a return via EH unwinding also constitute a function return?  I
> > think you may want to have a finally handler or support in the unwinder
> > for this?  Then there's abnormal return via longjmp & friends, I guess
> > there's nothing that can be done there besides patching glibc?
> > 
> > In general I am missing reasoning as why to use -fzero-call-used-regs=
> > in the documentation, that is, what is the threat model and what are
> > the guarantees?  Is there any point zeroing registers when spill slots
> > are left populated with stale register contents?  How do I (and why
> > would I want to?) ensure that there's no information leak from the
> > implementation of 'foo' to their callers?  Do I need to compile all
> > of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> > or is it enough to annotate API boundaries I want to protect with
> > zero_call_used_regs("...")?
> > 
> > Again - what's the intended use (and how does it fulfil anything useful
> > for that case)?
> 
> The major question of the above is:  what’s the motivation of the whole patch?
> H.J.Lu and I have replied to this question in separate emails; let’s continue
> this high-level discussion in that thread.
> 
> 
> >>>>> @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
> >>>>> return NULL_TREE;
> >>>>> }
> >>>>> 
> >>>>> +/* Handle a "zero_call_used_regs" attribute; arguments as in
> >>>>> +   struct attribute_spec.handler.  */
> >>>>> +
> >>>>> +static tree
> >>>>> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> >>>>> +                                   int ARG_UNUSED (flags),
> >>>>> +                                   bool *no_add_attris)
> >>>>> +{
> >>>>> +  tree decl = *node;
> >>>>> +  tree id = TREE_VALUE (args);
> >>>>> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> >>>>> +
> >>>>> +  if (TREE_CODE (decl) != FUNCTION_DECL)
> >>>>> +    {
> >>>>> +      error_at (DECL_SOURCE_LOCATION (decl),
> >>>>> +             "%qE attribute applies only to functions", name);
> >>>>> +      *no_add_attris = true;
> >>>>> +      return NULL_TREE;
> >>>>> +    }
> >>>>> +  else if (DECL_INITIAL (decl))
> >>>>> +    {
> >>>>> +      error_at (DECL_SOURCE_LOCATION (decl),
> >>>>> +             "cannot set %qE attribute after definition", name);
> > 
> > Why's that?
> This might not be needed, I will fix this in the next update.
> 
> >>>>> 
> >>>>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> >>>>> index 81bd2ee..ded1880 100644
> >>>>> --- a/gcc/c/c-decl.c
> >>>>> +++ b/gcc/c/c-decl.c
> >>>>> @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> >>>>>        DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> >>>>>      }
> >>>>> 
> >>>>> +      /* Merge the zero_call_used_regs_type information.  */
> >>>>> +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> >>>>> +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> >>>>> +
> > 
> > If you need this (see below) then likely cp/* needs similar adjustment
> > so do other places in the middle-end (function cloning, etc)
> 
> I will check cp/* and function cloning etc. to see whether the copying and merging are needed in other
> places.
> 
> Another thought: if I use “lookup_attribute” on the function decl instead of checking these bits, as you suggested
> later, all this copying and merging might not be necessary anymore. I will check on that.
> > 
> >>>>> 
> >>>>> +
> >>>>> +/* Emit a sequence of insns to zero the call-used-registers for the current
> >>>>> + * function.  */
> > 
> > No '*' on the continuation line
> 
> Okay, will fix this.
> 
> >>>>> +
> >>>>> +  /* This array holds the zero rtx with the corresponding machine mode.  */
> >>>>> +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> >>>>> +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> >>>>> +    zero_rtx[i] = NULL_RTX;
> >>>>> +
> >>>>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> >>>>> +    {
> >>>>> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> > 
> > Use if (!call_used_regs[regno])
> Okay.
> 
> > 
> >>>>> +     continue;
> >>>>> +      if (fixed_regs[regno])
> >>>>> +     continue;
> >>>>> +      if (is_live_reg_at_exit (regno))
> >>>>> +     continue;
> > 
> > How can a call-used reg be live at exit?
> 
> Yes, this might not be needed, I will double check on this.
> 
> > 
> >>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> >>>>> +     continue;
> > 
> > Why does the target need some extra say here?
> 
> Only the target can decide which hard regs should be zeroed, and which hard regs are general purpose registers.

I'm mostly questioning the plethora of target hooks added and whether
these details are at a granularity that applies to more than just x86.
Did I suggest to compute a hardreg set that the middle-end says was
used and is not live and leave the rest to the target?

> > 
> >>>>> +      if (used_only && !df_regs_ever_live_p (regno))
> > 
> > So I suppose this does not include uses by callees of this function?
> 
> Yes, I think so. 
> > 
> >>>>> +     continue;
> >>>>> +
> >>>>> +      /* Now we can emit insn to zero this register.  */
> >>>>> +      rtx reg, tmp;
> >>>>> +
> >>>>> +      machine_mode mode
> >>>>> +     = targetm.calls.zero_call_used_regno_mode (regno,
> >>>>> +                                                reg_raw_mode[regno]);
> > 
> > In what case does the target ever need to adjust this (we're dealing
> > with hard-regs only?)?
> 
> For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32 bits, etc.

That's an optimization, yes.

> > 
> >>>>> +      if (mode == VOIDmode)
> >>>>> +     continue;
> >>>>> +      if (!have_regs_of_mode[mode])
> >>>>> +     continue;
> > 
> > When does this happen?
> 
> This might be removed. I will check. 
> > 
> >>>>> +
> >>>>> +      reg = gen_rtx_REG (mode, regno);
> >>>>> +      if (zero_rtx[(int)mode] == NULL_RTX)
> >>>>> +     {
> >>>>> +       zero_rtx[(int)mode] = reg;
> >>>>> +       tmp = gen_rtx_SET (reg, const0_rtx);
> >>>>> +       emit_insn (tmp);
> >>>>> +     }
> >>>>> +      else
> >>>>> +     emit_move_insn (reg, zero_rtx[(int)mode]);
> > 
> > Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> > but I may be wrong.  
> 
> You mean “const0_rtx” should be “CONST0_RTX(mode)”? 
> I will check on this.
> 
> > I'd rather have the target be able to specify
> > some special instruction for zeroing here.  Some may have
> > multi-reg set instructions for example.  That said, can't we
> > defer the actual zeroing to the target in full and only compute
> > a hard-reg-set of to-be zeroed registers here and pass that
> > to a target hook?

Ah, I did.

> For vector regs, we have already provided this interface with 
> 
> targetm.calls.zero_all_vector_registers (used_only)
> 
> For integer registers, do we need such a target hook too? 
> If so, yes, it might be better to let the target decide how to zero the registers.
> 
> If not, the current design might be good enough, right?

But why not simplify it all to a single hook

  targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);

?

> > 
> >>>>> +
> >>>>> +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> >>>>> +    }
> >>>>> +
> >>>>> +  return;
> >>>>> +}
> >>>>> +
> >>>>> +
> >>>>> /* Return a sequence to be used as the epilogue for the current function,
> >>>>>  or NULL.  */
> >>>>> 
> >>>>> @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> >>>>> 
> >>>>> start_sequence ();
> >>>>> emit_note (NOTE_INSN_EPILOGUE_BEG);
> >>>>> +
> >>>>> +  gen_call_used_regs_seq ();
> >>>>> +
> > 
> > The caller eventually performs shrink-wrapping - are you sure that
> > doesn't mess up things?
> 
> My understanding is that the standard epilogue does not handle “call-used” registers, so shrink-wrapping will not affect
> “call-used” registers either.
> Our patch only handles call-used registers, so there should not be any interaction between this patch and shrink-wrapping.

I don't know (CCed Segher, he should eventually).

> > 
> >>>>> 
> >>>>> +
> >>>>> + /* How to clear call-used registers upon function return.  */
> >>>>> + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> >>>>> +
> >>>>> + /* 11 unused bits.  */
> > 
> > So instead of wasting "precious" bits please use lookup_attribute
> > in the single place you query this value (which is once per function).
> > There's no need to complicate matters by trying to maintain the above.
> 
> Thanks for the suggestion.
> Yes, I will try to use lookup_attribute in function.c instead of adding these bits. That will save us this
> precious space.

Yes, I think this will simplify the code.

Richard.

> Thanks again.
> 
> Qing
Richard Biener Aug. 6, 2020, 8:37 a.m. UTC | #18
On Wed, 5 Aug 2020, Qing Zhao wrote:

> >> 
> >> From The SECURE project and GCC in GCC Cauldron 2018:
> >> 
> >> Speaker: Graham Markall
> >> 
> >> The SECURE project is a 15 month program funded by Innovate UK, to
> >> take well known security techniques from academia and make them
> >> generally available in standard compilers, specifically GCC and LLVM.
> >> An explicit objective is for those techniques to be incorporated in
> >> the upstream versions of compilers. The Cauldron takes place in the
> >> final month of the project and this talk will present the technical
> >> details of some of the techniques implemented, and review those that
> >> are yet to be implemented. A particular focus of this talk will be on
> >> verifying that the implementation is correct, which can be a bigger
> >> challenge than the implementation.
> >> 
> >> Techniques to be covered in the project include the following:
> >> 
> >> Stack and register erasure. Ensuring that on return from a function,
> >> no data is left lying on the stack or in registers. Particular
> >> challenges are in dealing with inlining, shrink wrapping and caching.
> >> 
> >> This patch implements register erasure.
> > 
> > Part of it, yes. While I can see abnormal transfer of control is difficult, exception handling is too widespread to be ignored. What's the plan there?
> > 
> > So can we also see the other parts? In particular I wonder whether exposing just register clearing (in this fine-grained manner) is required and useful rather than thinking of a better interface for the whole thing?
> 
> You mean to provide an integrated interface for both stack and register 
> erasure for security purposes?
> 
> However, is stack erasure at function return really a better idea than 
> zero-initializing auto variables at the beginning of the function?
> 
> We had some discussion with Kees Cook several weeks ago on the idea of 
> stack erasure at function return; Kees provided the following comments:
> 
> "But back to why I don't think it's the right approach:
> 
> Based on the performance measurements of pattern-init and zero-init
> in Clang, MSVC, and the kernel plugin, it's clear that adding these
> initializations has measurable performance cost. Doing it at function
> exit means performing large unconditional wipes. Doing it at function
> entry means initializations can be dead-store eliminated and highly
> optimized. Given the current debates on the measurable performance
> difference between pattern and zero initialization (even in the face of
> existing dead-store elimination), I would expect wipe-on-function-exit to
> be outside the acceptable tolerance for performance impact. (Additionally,
> we've seen negative cache effects on wiping memory when the CPU is done
> using it, though this is more pronounced in heap wiping. Zeroing at
> free is about twice as expensive as zeroing at allocation time due to cache
> temporality. This is true for the stack as well, but it's not as high.)"
> 
> From my understanding, the major issue with stack erasure at function 
> return is the large performance overhead, and this overhead cannot be 
> reduced by compiler optimizations since the additional wiping insns are 
> inserted at the end of the routine.
> 
> Based on the previous discussion with Kees, I don’t think that stack 
> erasure at function return is a good idea. Instead, we might provide an 
> alternative approach:  zero/pattern init of auto variables. (This 
> functionality is already available in LLVM.) This will be another 
> patch we want to add to GCC for security purposes in general.
> 
> So, I think for the current patch, -fzero-call-used-regs should be good 
> enough.
> 
> Any comments?

OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
it sounded more like a mitigation against information leaks which
then would be highly incomplete w/o spill slot clearing.  Like
we had that discussion on secure erase of memory that should not
be DSEd.

This needs to be reflected in the documentation and eventually
the option naming?  Like -frop-protection=... similar in spirit
to how we have -fcf-protection=... (though that as well is supposed
to provide ROP mitigation).

I'm not very familiar with ROP [mitigation] techniques, so I'm no
longer questioning usefulness of this patch but leave that to others
(and thus final approval).  I'm continuing to question the plethora
of target hooks you add and will ask for better user-level documentation.

Richard.
Jakub Jelinek Aug. 6, 2020, 8:41 a.m. UTC | #19
On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
> > For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32 bits, etc.
> 
> That's an optimization, yes.

But, does the code need to care?
If one compiles:
void
foo ()
{
  register unsigned long long a __asm ("rax");
  register unsigned long long b __asm ("rsi");
  register unsigned long long c __asm ("r8");
  register unsigned long long d __asm ("r9");
  a = 0;
  b = 0;
  c = 0;
  d = 0;
  asm volatile ("" : : "r" (a), "r" (b), "r" (c), "r" (d));
}
then the backend uses *movdi_xor patterns which are emitted
as xorl instructions (i.e. just 32-bit).  If you need to emit them
at a spot where the flags register is or might be live, then
*movdi_internal is used instead, but that one will also emit
a 32-bit movl $0, %r8d etc. instruction (because (const_int 0) is
zero extended 32-bit integer).

	Jakub
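
The behaviour described above, where a 32-bit instruction ends up clearing
the whole 64-bit register, can be checked with a small sketch.  The function
name is made up for this example.  On x86-64 with a GNU-style compiler it
issues a 32-bit xorl on the register holding a 64-bit value (the %k0 operand
modifier prints the 32-bit name of the register); on any other configuration
it simply returns 0 so the sketch still builds.

```c
#include <stdint.h>

/* Hypothetical helper: clear a 64-bit value using only a 32-bit xor.
   On x86-64, writing a 32-bit register zero-extends into the upper
   32 bits, so the full 64-bit register ends up zero.  */
uint64_t clear_via_32bit_xor (uint64_t value)
{
#if defined(__x86_64__) && defined(__GNUC__)
  __asm__ ("xorl %k0, %k0" : "+r" (value) : : "cc");
  return value;
#else
  (void) value;   /* fallback for other targets: nothing to demonstrate */
  return 0;
#endif
}
```

Calling clear_via_32bit_xor (0xffffffffffffffffULL) yields 0, including the
upper half, which is why the RTL can keep describing a DImode clear.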
Uros Bizjak Aug. 6, 2020, 9:31 a.m. UTC | #20
On Thu, Aug 6, 2020 at 10:42 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
> > > For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32 bits, etc.
> >
> > That's an optimization, yes.
>
> But, does the code need to care?

No, because this is only an implementation detail. The RTL code should
still use DImode clears. These are emitted using 32bit insns,
implicitly zero-extended to 64bits, so in effect they implement DImode
clears.

Uros.

Qing Zhao Aug. 6, 2020, 2:56 p.m. UTC | #21
> On Aug 6, 2020, at 3:31 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On Wed, 5 Aug 2020, Qing Zhao wrote:
> 
>> Hi, Richard,
>> 
>> Thanks a lot for your careful review and detailed comments.  
>> 
>> 
>>> On Aug 4, 2020, at 2:35 AM, Richard Biener <rguenther@suse.de <mailto:rguenther@suse.de>> wrote:
>>> 
>>> I have a few comments below - I'm not sure I'm qualified to fully
>>> review the rest though.
>> 
>> Could you let me know who will be the more qualified person to fully review the rest of middle-end change?
> 
> Jeff might be, but with the intended purpose (ROP mitigation AFAIU)
> it would be nice for other target maintainers to chime in (Segher for
> power maybe) for the question below...
> 
>>>>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
>>>>>>> +     continue;
>>> 
>>> Why does the target need some extra say here?
>> 
>> Only target can decide which hard regs should be zeroed, and which hard regs are general purpose register. 
> 
> I'm mostly questioning the plethora of target hooks added and whether
> this details are a good granularity applying to more than just x86.
> Did I suggest to compute a hardreg set that the middle-end says was
> used and is not live and leave the rest to the target?

Yes, I agree that the current design may expose too many details to the middle-end.

A single target hook, as you suggested:

 targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);

might be a cleaner design.
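
As a rough illustration of that split of responsibilities, here is a toy model in plain C (not GCC internals). Everything in it is an assumption made for the sketch: the hard register set is a bitmask, the register file has 16 entries of which the first 8 are GPRs, and target_zero_regs merely stands in for the proposed hook. The middle-end side only computes "used and not live at exit"; the target side decides what counts as a GPR and how to clear it.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_HARD_REGS 16          /* hypothetical register file size */
typedef uint32_t hard_reg_set;    /* bit N set => hard register N is in the set */

/* Hypothetical target description: registers 0-7 are GPRs, 8-15 are not.  */
static const hard_reg_set target_gpr_mask = 0x00ffu;

/* Simulated register file, so the effect of the hook can be observed.  */
uint64_t reg_file[NUM_HARD_REGS];

/* Stand-in for the single hook: the target picks which of the requested
   registers it clears and returns the set it actually zeroed.  */
hard_reg_set target_zero_regs (hard_reg_set need_zeroed, bool gpr_only)
{
  hard_reg_set to_clear = need_zeroed;
  if (gpr_only)
    to_clear &= target_gpr_mask;
  for (int regno = 0; regno < NUM_HARD_REGS; regno++)
    if (to_clear & ((hard_reg_set) 1 << regno))
      reg_file[regno] = 0;
  return to_clear;
}

/* Middle-end side: compute "used and not live at exit", nothing more.  */
hard_reg_set zero_regs_at_return (hard_reg_set used,
                                  hard_reg_set live_at_exit,
                                  bool gpr_only)
{
  return target_zero_regs (used & ~live_at_exit, gpr_only);
}
```

With this shape the generic code never needs to know register classes or modes; those stay behind the one hook.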


Thanks.

Qing
Qing Zhao Aug. 6, 2020, 3:45 p.m. UTC | #22
> On Aug 6, 2020, at 3:37 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On Wed, 5 Aug 2020, Qing Zhao wrote:
> 
>>>> 
>>>> From The SECURE project and GCC in GCC Cauldron 2018:
>>>> 
>>>> Speaker: Graham Markall
>>>> 
>>>> The SECURE project is a 15 month program funded by Innovate UK, to
>>>> take well known security techniques from academia and make them
>>>> generally available in standard compilers, specifically GCC and LLVM.
>>>> An explicit objective is for those techniques to be incorporated in
>>>> the upstream versions of compilers. The Cauldron takes place in the
>>>> final month of the project and this talk will present the technical
>>>> details of some of the techniques implemented, and review those that
>>>> are yet to be implemented. A particular focus of this talk will be on
>>>> verifying that the implementation is correct, which can be a bigger
>>>> challenge than the implementation.
>>>> 
>>>> Techniques to be covered in the project include the following:
>>>> 
>>>> Stack and register erasure. Ensuring that on return from a function,
>>>> no data is left lying on the stack or in registers. Particular
>>>> challenges are in dealing with inlining, shrink wrapping and caching.
>>>> 
>>>> This patch implements register erasure.
>>> 
>>> Part of it, yes. While I can see abnormal transfer of control is difficult exception handling is used too wide spread to be ignored. What's the plan there? 
>>> 
>>> So can we also see the other parts? In particular I wonder whether exposing just register clearing (in this fine-grained manner) is required and useful rather than thinking of a better interface for the whole thing?
>> 
>> You mean to provide an integrated interface for both stack and register 
>> erasure for security purpose?
>> 
>> However, Is stack erasure at function return really a better idea than 
>> zero-init auto-variables in the beginning of the function?
>> 
>> We had some discussion with Kees Cook several weeks ago on the idea of 
>> stack erasure at function return, Kees provided the following comments:
>> 
>> "But back to why I don't think it's the right approach:
>> 
>> Based on the performance measurements of pattern-init and zero-init
>> in Clang, MSVC, and the kernel plugin, it's clear that adding these
>> initializations has measurable performance cost. Doing it at function
>> exit means performing large unconditional wipes. Doing it at function
>> entry means initializations can be dead-store eliminated and highly
>> optimized. Given the current debates on the measurable performance
>> difference between pattern and zero initialization (even in the face of
>> existing dead-store elimination), I would expect wipe-on-function-exit to
>> be outside the acceptable tolerance for performance impact. (Additionally,
>> we've seen negative cache effects on wiping memory when the CPU is done
>> using it, though this is more pronounced in heap wiping. Zeroing at
>> free is about twice as expensive as zeroing at free time due to cache
>> temporality. This is true for the stack as well, but it's not as high.)”
>> 
>> From my understanding, the major issue with stack erasure at function 
>> result is the big performance overhead, And these performance overhead 
>> cannot be reduced with compiler optimizations since those additional 
>> wiping insns are inserted at the end of the routine.
>> 
>> Based on the previous discussion with Kees, I don’t think that stack 
>> erasure at function return is a good idea, Instead, we might provide an 
>> alternative approach:  zero/pattern init to auto-variables. (This 
>> functionality has Been available in LLVM already) This will be another 
>> patch we want to add to GCC for the security purpose in general.
>> 
>> So, I think for the current patch, -fzero-call-used-regs should be good 
>> enough.
>> 
>> Any comments?
> 
> OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
> it sounded more like a mitigation against information leaks which
> then would be highly incomplete w/o spill slot clearing.

By “spill slot clearing”, do you mean “stack erasure” or something else?

From the paper 

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

According to the paper, ROP attackers use the call-used registers as follows:

"Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.

First, the destination of using gadget chains in usual is performing system call or system function to perform 
malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
 would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly
 instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
example, the system call is number 59 which is “execve” system call.

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
pass parameters, as mentioned in subsection B and C.”

We can see that ROP attackers may use call-used registers to pass parameters to the system call.
If the compiler clears these registers before the routine returns, then the ROP attack will no longer work.
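
For reference, a minimal sketch of how a single function could opt in, assuming the attribute is accepted under the spelling used in the patch subject (zero_call_used_regs). The macro name ZERO_USED_GPR and the function check_pin are hypothetical; the macro expands to nothing on a compiler that does not report the attribute, so the code still builds there.

```c
#include <string.h>

/* Use the attribute only when the compiler reports it; otherwise the
   macro expands to nothing and the function is compiled as usual.  */
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
#  endif
#endif
#ifndef ZERO_USED_GPR
#  define ZERO_USED_GPR
#endif

/* Hypothetical example function: with the attribute in effect, the
   call-used GPRs this function touched are zeroed before it returns,
   except the register carrying the return value.  */
ZERO_USED_GPR
int check_pin (const char *entered, const char *expected)
{
  return strcmp (entered, expected) == 0;
}
```

The same policy for a whole translation unit would be requested with -fzero-call-used-regs=used-gpr on the command line.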


>  Like
> we had that discussion on secure erase of memory that should not
> be DSEd.
> 
> This needs to be reflected in the documentation and eventually
> the option naming?  Like -frop-protection=... similar in spirit
> to how we have -fcf-protection=... (though that as well is supposed
> to provide ROP mitigation).

How about -frop-mitigation=[skip|used-gpr|all-gpr|used|all]?
> 
> I'm not very familiar with ROP [mitigation] techinques, so I'm no
> longer questioning usefulness of this patch but leave that to others
> (and thus final approval).  I'm continuing to question the plethora
> of target hooks you add and will ask for better user-level documentation.

I will think about this more and come up with better user-level documentation.

thanks.

Qing
> 
> Richard.
Kees Cook Aug. 6, 2020, 8:45 p.m. UTC | #23
On Thu, Aug 06, 2020 at 10:37:43AM +0200, Richard Biener wrote:
> OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
> it sounded more like a mitigation against information leaks which
> then would be highly incomplete w/o spill slot clearing.  Like
> we had that discussion on secure erase of memory that should not
> be DSEd.

I've viewed stack erasure as separate from register clearing. The
"when" of stack erasure tends to define which things are being defended
against. If the stack is being erased on function entry, you're defending
against all the various "uninitialized" variable attacks (which can be
info exposures, flow control redirection, etc). If it's on function exit,
this is more aimed at avoiding stale data and minimizing what's available
during an attack (and it also provides similar "uninit" defenses, just
in a different way). And FWIW, past benchmarks on this appear to indicate
erase-on-entry is more cache-friendly.
Qing Zhao Aug. 6, 2020, 10:32 p.m. UTC | #24
Hi, Richard,


> On Aug 5, 2020, at 4:35 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
>> 
>>>>>> +     continue;
>>>>>> +      if (fixed_regs[regno])
>>>>>> +     continue;
>>>>>> +      if (is_live_reg_at_exit (regno))
>>>>>> +     continue;
>> 
>> How can a call-used reg be live at exit?
> 
> Yes, this might not be needed, I will double check on this.

I just double-checked this, and it turns out that this condition cannot be deleted.

A call-used reg might be the register that holds the return value passed back to the caller (so it is live at exit).
For example, the EAX register on i386 is a call-used register and, at the same time, the register that holds the return value.
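
A toy model of the three checks in the loop may help here. This is a sketch with a made-up five-register file loosely following i386, not the patch code; should_zero and the two tables are hypothetical names. It shows that the return-value register is skipped only because of the live-at-exit test.

```c
#include <stdbool.h>

enum { REG_AX, REG_CX, REG_DX, REG_BX, REG_SP, NUM_REGS };  /* toy register file */

/* Hypothetical per-register facts: AX, CX and DX are call-used, BX is
   call-saved, SP is fixed (and, like every fixed register, call-used).  */
static const bool call_used[NUM_REGS] = { true,  true,  true,  false, true };
static const bool fixed[NUM_REGS]     = { false, false, false, false, true };

/* Should REGNO be zeroed, given which registers are live at function exit?  */
bool should_zero (int regno, const bool live_at_exit[NUM_REGS])
{
  if (!call_used[regno])
    return false;
  if (fixed[regno])
    return false;
  if (live_at_exit[regno])      /* e.g. the register holds the return value */
    return false;
  return true;
}
```

For a function returning int, AX is live at exit and is left alone while CX and DX are cleared; for a void function, AX is cleared as well.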

Hope this is clear.

thanks.

Qing
Segher Boessenkool Aug. 6, 2020, 11:37 p.m. UTC | #25
Hi!

On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
> Jeff might be, but with the intended purpose (ROP mitigation AFAIU)
> it would be nice for other target maintainers to chime in (Segher for
> power maybe) for the question below...

It would be nice if this described anywhere what the benefit of this is,
including actual hard numbers.  I only see it is very costly, and I see
no benefit whatsoever.

> > >>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > >>>>> command-line option and
> > >>>>> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribue:

"call-used" is such a bad name.  "call-clobbered" is better already, but
"volatile" (over calls) is most obvious I think.

There are at least four different kinds of volatile registers:

1) Argument registers are volatile, on most ABIs.
2) The *linker* (or dynamic linker!) may insert code that needs some
   registers for itself;
3) Registers only used for scratch space;
4) Registers used for returning the function value.

And these can overlap, and differ per function.

> > > Again - what's the intended use (and how does it fulful anything useful
> > > for that case)?

Yes, exactly.

> > >>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > >>>>> +     continue;
> > > 
> > > Why does the target need some extra say here?
> > 
> > Only target can decide which hard regs should be zeroed, and which hard regs are general purpose register. 
> 
> I'm mostly questioning the plethora of target hooks added and whether
> this details are a good granularity applying to more than just x86.
> Did I suggest to compute a hardreg set that the middle-end says was
> used and is not live and leave the rest to the target?

It probably would be much easier to just have the target do *all* of
this, in one hook, or maybe even in the existing epilogue stuff.  The
resulting binary code will be very slow no matter what, so this should
not matter much at all.

> > >>>>> +      machine_mode mode
> > >>>>> +     = targetm.calls.zero_call_used_regno_mode (regno,
> > >>>>> +                                                reg_raw_mode[regno]);
> > > 
> > > In what case does the target ever need to adjust this (we're dealing
> > > with hard-regs only?)?
> > 
> > For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32-bit. etc.
> 
> That's an optimization, yes.

I guess what is meant here is that the usual x86-64 insns to clear the
low 32 bits of a register actually clear the whole register?  It is a
huge security leak otherwise.  And, the generic code has nothing to do
with this, define hooks that ask the target to clear stuff, instead?

> > >>>>> +      reg = gen_rtx_REG (mode, regno);
> > >>>>> +      if (zero_rtx[(int)mode] == NULL_RTX)
> > >>>>> +     {
> > >>>>> +       zero_rtx[(int)mode] = reg;
> > >>>>> +       tmp = gen_rtx_SET (reg, const0_rtx);
> > >>>>> +       emit_insn (tmp);
> > >>>>> +     }
> > >>>>> +      else
> > >>>>> +     emit_move_insn (reg, zero_rtx[(int)mode]);
> > > 
> > > Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> > > but I may be wrong.  
> > 
> > You mean “const0_rtx” should be “CONST0_RTX(mode)”? 
> > I will check on this.

If it is a CONST_INT, you should use const0_rtx; otherwise,
CONST0_RTX (mode) .  I have no idea what zero_rtx is, but there is
const_tiny_rtx already, and you shouldn't use that directly either.

> But why not simplify it all to a single hook
> 
>   targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);
> 
> ?

Yeah.  With a much better name though (it should say what it is for, or
describe at a *high level* what it does).

> > >>>>> start_sequence ();
> > >>>>> emit_note (NOTE_INSN_EPILOGUE_BEG);
> > >>>>> +
> > >>>>> +  gen_call_used_regs_seq ();
> > >>>>> +
> > > 
> > > The caller eventually performs shrink-wrapping - are you sure that
> > > doesn't mess up things?
> > 
> > My understanding is, in the standard epilogue, there is no handling of “call-used” registers.  Therefore, shrink-wrapping will not impact
> > “call-used” registers as well. 
> > Our patch only handles call-used registers, so, there should be no any interaction between this patch and shrink-wrapping.
> 
> I don't know (CCed Segher, he should eventually).

Shrink-wrapping often deals with the non-volatile registers, so that
doesn't matter much for this patch series.  But the epilogue can use
some volatile registers as well, including to hold sensitive info.  And
of course everything is different if you use separate shrink-wrapping,
but that work is done already when you get here (so it is too late?)


Anyway.  This all needs a good description in the user manual (is there?
I couldn't find any), explaining what exactly it does (user-visible),
and when you would want to use it, etc.  We need that before we can
review anything else in this patch sanely.


Segher
Richard Biener Aug. 7, 2020, 6:21 a.m. UTC | #26
On Thu, 6 Aug 2020, Kees Cook wrote:

> On Thu, Aug 06, 2020 at 10:37:43AM +0200, Richard Biener wrote:
> > OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
> > it sounded more like a mitigation against information leaks which
> > then would be highly incomplete w/o spill slot clearing.  Like
> > we had that discussion on secure erase of memory that should not
> > be DSEd.
> 
> I've viewed stack erasure as separate from register clearing. The
> "when" of stack erasure tends to define which things are being defended
> against. If the stack is being erased on function entry, you're defending
> against all the various "uninitialized" variable attacks (which can be
> info exposures, flow control redirection, etc). If it's on function exit,
> this is more aimed at avoiding stale data and minimizing what's available
> during an attack (and it also provides similar "uninit" defenses, just
> in a different way). And FWIW, past benchmarks on this appear to indicate
> erase-on-entry is more cache-friendly.

So I originally thought this was about leaking security sensitive data
to callers and thus we want to define API entries to not leak any
data from callees other than via the ABI defined return values or
global memory the callee chooses to populate.  Clearing registers
not involved in returning data is one part but then contents of such
registers could also reside in spill slots which means you have to
clear those as well.  And yes, even local automatic variables of the
callee fall into the category and thus 'stack-erasure' would be
required.  To appropriately have such a "security boundary" at
function return you _do_ have to do the clearing at function return
though.

But it's a completely different topic and it seems the patch was
not intended to help the folks that also ask for "secure"_memset
the compiler isn't supposed to optimize away as dead.
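
For readers unfamiliar with that request: the usual workaround today is a wipe routine written so that the stores cannot be treated as dead. The sketch below uses the common volatile-pointer idiom; it is not part of this patch, and the name secure_wipe is made up.

```c
#include <stddef.h>

/* Overwrite LEN bytes at BUF with zeros.  The accesses go through a
   volatile-qualified pointer, so the compiler has to perform each store
   even if BUF is never read again.  */
void secure_wipe (void *buf, size_t len)
{
  volatile unsigned char *p = (volatile unsigned char *) buf;
  while (len--)
    *p++ = 0;
}
```

A plain memset at the end of a function, by contrast, is a candidate for dead store elimination.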

Richard.
Alexandre Oliva Aug. 7, 2020, 1:20 p.m. UTC | #27
On Jul 28, 2020, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

>> 2. The main code generation part is moved from i386 backend to middle-end;
>> 3. Add 4 target-hooks;
>> 4. Implement these 4 target-hooks on i386 backend. 
>> 5. On a target that does not implement the target hook, issue error

I wonder...  How important is it that the registers be zeroed, rather
than just avoid leaking internal state from the function?

It occurred to me that we could implement this in an entirely
machine-independent way by just arranging for the option to change the
calling conventions for all registers that are not used by return to be
regarded as call-saved.  Then the prologue logic would save the incoming
value of the registers, and the epilogue would restore them, and we're
all set.  It might even cover propagation of exceptions out of the
function.


Even if zeroing registers is desirable, it might still be possible to
build upon the above to do that in a machine-independent fashion, using
the annotations used to output call frame info to identify the slots in
which the to-be-zeroed registers were saved, and store zeros there,
either by modifying the save insns, or by adding extra stores to the end
of the prologue, at least as a default implementation for a target hook,
that could be overridden with something that does the job in more
efficient but target-specific ways.
Qing Zhao Aug. 7, 2020, 4:06 p.m. UTC | #28
Hi, Segher,

Thanks for your comments.

> On Aug 6, 2020, at 6:37 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
>> Jeff might be, but with the intended purpose (ROP mitigation AFAIU)
>> it would be nice for other target maintainers to chime in (Segher for
>> power maybe) for the question below...
> 
> It would be nice if this described anywhere what the benefit of this is,
> including actual hard numbers.  I only see it is very costly, and I see
> no benefit whatsoever.

I will state the motivation for this patch clearly in the next patch update.
For your reference, as I mentioned in other emails you might have missed,
from my understanding (I am not a security expert, though), this patch should serve two purposes:

1. Erase the registers upon return to avoid information leaks;
2. ROP mitigation, for details on this, please refer to paper:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

According to the above paper, ROP attackers use the call-used registers as follows:

"Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.

First, the destination of using gadget chains in usual is performing system call or system function to perform 
malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly
instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
example, the system call is number 59 which is “execve” system call.

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
pass parameters, as mentioned in subsection B and C.”

We can see that ROP attackers may use call-used registers to pass parameters to the system call.
If the compiler clears these registers before the routine returns, then the ROP attack will no longer work.

Yes, adding these register-wiping insns will cause performance overhead. However, some overhead is
necessary for security purposes.
On the other hand, of course, we need to minimize the performance overhead in our implementation.


> 
>>>>>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>>>>>>>> command-line option and
>>>>>>>> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribue:
> 
> "call-used" is such a bad name.  "call-clobbered" is better already, but
> "volatile" (over calls) is most obvious I think.

In the GCC source code, the name “call-used” is used a lot; of course, “call-clobbered” is
also used frequently.  Do these names refer to the same set of registers, i.e., the set of registers
that will be clobbered by a function call?
If so, I am okay with the name “call-clobbered” if it sounds better.

> 
> There are at least four different kinds of volatile registers:
> 
> 1) Argument registers are volatile, on most ABIs.
These are the registers that need to be cleared upon function return for the ROP mitigation described in the paper
mentioned above.

> 2) The *linker* (or dynamic linker!) may insert code that needs some
>   registers for itself;
> 3) Registers only used for scratch space;
> 4) Registers used for returning the function value.

I think that 1, 3 and 4 above should all be covered by “call_used_regs”.

I am not sure about 2; could you explain a little more about it (the linker may insert code that needs some registers for itself)?

> 
> And these can overlap, and differ per function.
> 
>>>> Again - what's the intended use (and how does it fulful anything useful
>>>> for that case)?
> 
> Yes, exactly.
Please see my response at the beginning.

> 
>>>>>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
>>>>>>>> +     continue;
>>>> 
>>>> Why does the target need some extra say here?
>>> 
>>> Only target can decide which hard regs should be zeroed, and which hard regs are general purpose register. 
>> 
>> I'm mostly questioning the plethora of target hooks added and whether
>> this details are a good granularity applying to more than just x86.
>> Did I suggest to compute a hardreg set that the middle-end says was
>> used and is not live and leave the rest to the target?
> 
> It probably would be much easier to just have the target do *all* of
> this, in one hook, or maybe even in the existing epilogue stuff.  The
> resulting binary code will be very slow no matter what, so this should
> not matter much at all.

I have agreed to move the register-zeroing part entirely to the target. The middle-end will only compute the set of hard regs that need to be
zeroed and pass it to the target.

> 
>>>>>>>> +      machine_mode mode
>>>>>>>> +     = targetm.calls.zero_call_used_regno_mode (regno,
>>>>>>>> +                                                reg_raw_mode[regno]);
>>>> 
>>>> In what case does the target ever need to adjust this (we're dealing
>>>> with hard-regs only?)?
>>> 
>>> For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32-bit. etc.
>> 
>> That's an optimization, yes.
> 
> I gues what is meant here is that the usual x86-64 insns to clear the
> low 32 bits of a register actually clear the whole register?

This is my understanding. H.J. Lu might provide a better explanation if needed.

>  It is a
> huge security leak otherwise.  And, the generic code has nothing to do
> with this, define hooks that ask the target to clear stuff, instead?

Yes, I think that these kinds of details should not be exposed to the middle-end.

> 
>>>>>>>> +      reg = gen_rtx_REG (mode, regno);
>>>>>>>> +      if (zero_rtx[(int)mode] == NULL_RTX)
>>>>>>>> +     {
>>>>>>>> +       zero_rtx[(int)mode] = reg;
>>>>>>>> +       tmp = gen_rtx_SET (reg, const0_rtx);
>>>>>>>> +       emit_insn (tmp);
>>>>>>>> +     }
>>>>>>>> +      else
>>>>>>>> +     emit_move_insn (reg, zero_rtx[(int)mode]);
>>>> 
>>>> Not sure but I think the canonical zero to use is CONST0_RTX (mode)
>>>> but I may be wrong.  
>>> 
>>> You mean “const0_rtx” should be “CONST0_RTX(mode)”? 
>>> I will check on this.
> 
> If it is a CONST_INT, you should use const0_rtx; otherwise,
> CONST0_RTX (mode) .  I have no idea what zero_rtx is, but there is
> const_tiny_rtx already, and you shouldn't use that directly either.

Okay.

> 
>> But why not simplify it all to a single hook
>> 
>>  targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);
>> 
>> ?
> 
> Yeah.  With a much better name though (it should say what it is for, or
> describe at a *high level* what it does).
Okay.

> 
>>>>>>>> start_sequence ();
>>>>>>>> emit_note (NOTE_INSN_EPILOGUE_BEG);
>>>>>>>> +
>>>>>>>> +  gen_call_used_regs_seq ();
>>>>>>>> +
>>>> 
>>>> The caller eventually performs shrink-wrapping - are you sure that
>>>> doesn't mess up things?
>>> 
>>> My understanding is, in the standard epilogue, there is no handling of “call-used” registers.  Therefore, shrink-wrapping will not impact
>>> “call-used” registers as well. 
>>> Our patch only handles call-used registers, so, there should be no any interaction between this patch and shrink-wrapping.
>> 
>> I don't know (CCed Segher, he should eventually).
> 
> Shrink-wrapping often deals with the non-volatile registers, so that
> doesn't matter much for this patch series.

Yes, that was my understanding as well. 

>  But the epilogue can use
> some volatile registers as well, including to hold sensitive info.  And
> of course everything is different if you use separate shrink-wrapping,
> but that work is done already when you get here (so it is too late?)

Could you please explain this part a little bit more?

> 
> 
> Anyway.  This all needs a good description in the user manual (is there?
> I couldn't find any), explaining what exactly it does (user-visible),
> and when you would want to use it, etc.  We need that before we can
> review anything else in this patch sanely.
Will do.

Qing
> 
> 
> Segher
Qing Zhao Aug. 7, 2020, 4:15 p.m. UTC | #29
> On Aug 7, 2020, at 1:21 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On Thu, 6 Aug 2020, Kees Cook wrote:
> 
>> On Thu, Aug 06, 2020 at 10:37:43AM +0200, Richard Biener wrote:
>>> OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
>>> it sounded more like a mitigation against information leaks which
>>> then would be highly incomplete w/o spill slot clearing.  Like
>>> we had that discussion on secure erase of memory that should not
>>> be DSEd.
>> 
>> I've viewed stack erasure as separate from register clearing. The
>> "when" of stack erasure tends to define which things are being defended
>> against. If the stack is being erased on function entry, you're defending
>> against all the various "uninitialized" variable attacks (which can be
>> info exposures, flow control redirection, etc). If it's on function exit,
>> this is more aimed at avoiding stale data and minimizing what's available
>> during an attack (and it also provides similar "uninit" defenses, just
>> in a different way). And FWIW, past benchmarks on this appear to indicate
>> erase-on-entry is more cache-friendly.
> 
> So I originally thought this was about leaking security sensitive data
> to callers and thus we want to define API entries to not leak any
> data from callees other than via the ABI defined return values or
> global memory the callee chooses to populate.  Clearing registers
> not involved in returning data is one part but then contents of such
> registers could also reside in spill slots which means you have to
> clear those as well.  And yes, even local automatic variables of the
> callee fall into the category and thus 'stack-erasure' would be
> required.  To appropriately have such a "security boundary" at
> function return you _do_ have to do the clearing at function return
> though.

The following slides on “The SECURE Project and GCC”:

https://gmarkall.files.wordpress.com/2018/09/secure_and_gcc.pdf

mention that the stack erasure patch for GCC would be submitted to GCC upstream soon (in 2018).
I am wondering whether that patch has been submitted and discussed already?

Qing

> 
> But it's a completely different topic and it seems the patch was
> not intended to help the folks that also ask for "secure"_memset
> the compiler isn't supposed to optimize away as dead.
> 
> Richard.
Qing Zhao Aug. 7, 2020, 5:04 p.m. UTC | #30
Hi, Alexandre,

Thank you for the comments and suggestions.

> On Aug 7, 2020, at 8:20 AM, Alexandre Oliva <oliva@adacore.com> wrote:
> 
> On Jul 28, 2020, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
>>> 2. The main code generation part is moved from i386 backend to middle-end;
>>> 3. Add 4 target-hooks;
>>> 4. Implement these 4 target-hooks on i386 backend. 
>>> 5. On a target that does not implement the target hook, issue error
> 
> I wonder...  How important is it that the registers be zeroed, rather
> than just avoid leaking internal state from the function?

As I explained in other emails about the motivation for this patch:
 
From my understanding (I am not a security expert, though), this patch should serve two purposes:

1. Erase the registers upon return to avoid information leaks from the function;
2. ROP mitigation, for details on this, please refer to paper:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

According to the above paper, ROP attackers use the call-used registers as follows:

"Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.

First, the destination of using gadget chains in usual is performing system call or system function to perform 
malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly
instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
example, the system call is number 59 which is “execve” system call.

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
pass parameters, as mentioned in subsection B and C.”

We can see that ROP attackers may use call-used registers to pass parameters to the system call.
If the compiler clears these registers before the routine returns, then the ROP attack will no longer work.

So, I believe that the call-used registers (especially those that pass parameters) need to be zeroed
in order to mitigate ROP attacks.

> 
> It occurred to me that we could implement this in an entirely
> machine-independent way by just arranging for the option to change the
> calling conventions for all registers that are not used by return to be
> regarded as call-saved.  Then the prologue logic would save the incoming
> value of the registers, and the epilogue would restore them, and we're
> all set.  It might even cover propagation of exceptions out of the
> function.
> 
The above approach will have the following two issues:
1. the performance overhead will double (because there will be both “save” and “restore” insns in the prologue and epilogue)
2. The ROP mitigation purpose cannot be addressed.

> 
> Even if zeroing registers is desirable, it might still be possible to
> build upon the above to do that in a machine-independent fashion, using
> the annotations used to output call frame info to identify the slots in
> which the to-be-zeroed registers were saved, and store zeros there,
> either by modifying the save insns, or by adding extra stores to the end
> of the prologue, at least as a default implementation for a target hook,
> that could be overridden with something that does the job in more
> efficient but target-specific ways.

One of the major things we have to consider for the implementation of this patch is
minimizing the performance overhead as much as possible.

I think that moving the decision on how to zero the registers to each target will be a better solution, since each target has
a better idea of how to use the most efficient insns to do the work.

Thanks.

Qing

> 
> 
> -- 
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer
Segher Boessenkool Aug. 7, 2020, 10:59 p.m. UTC | #31
Hi!

On Fri, Aug 07, 2020 at 11:06:38AM -0500, Qing Zhao wrote:
> > It would be nice if this described anywhere what the benefit of this is,
> > including actual hard numbers.  I only see it is very costly, and I see
> > no benefit whatsoever.
> 
> I will add the motivation of this patch clearly in the next patch update. 
> Here, for your reference, As I mentioned in other emails you might miss,

Well, the GCC ML archive doesn't cross month boundaries, so things are
hard to look up if I have deleted my own copy already :-(

> From my understanding (I am not a security expert though), this patch should serve two purpose:
> 
> 1. Erase the registers upon return to avoid information leak;

But only some of the registers.

> 2. ROP mitigation, for details on this, please refer to paper:
> 
> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
> 
> https://ieeexplore.ieee.org/document/8445132 <https://ieeexplore.ieee.org/document/8445132>

Do you have a link to this that people can actually read?

> From the above paper, The call-used registers are used by the ROP hackers as following:
> 
> "Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.
> 
> First, the destination of using gadget chains in usual is performing system call or system function to perform 
> malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
> would like to disable W ⊕ X.

That makes things easier, for sure, but is just a nicety really.

> Because once W ⊕ X has been disabled, shellcode can be executed directly
> instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
> example, the system call is number 59 which is “execve” system call.
> 
> Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
> architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
> using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
> pass parameters, as mentioned in subsection B and C.”
> 
> We can see that call-used registers might be used by the ROP hackers to pass parameters to the system call.
> If compiler can clean these registers before routine “return", then ROP attack will be invalid. 

So the idea is that clearing (or otherwise interfering with) the registers
used for parameter passing makes making useful ROP chains harder?

> Yes, there will be performance overhead from adding these register wiping insns. However, it’s necessary to
> add overhead for security purpose.

The point is the balance between how expensive it is, vs. how much it
makes it harder to exploit the code.

But of course any user can make that judgment themselves.  For us it
mostly matters what the cost is to targets that use it, to targets that
do not use it, and to the generic code, vs. what value we give to our
users :-)

> > "call-used" is such a bad name.  "call-clobbered" is better already, but
> > "volatile" (over calls) is most obvious I think.
> 
> In our GCC compiler source code, we used the name “call-used” a lot, of course, “call-clobbered” is
> also used frequently.  Do these names refer to the same set of registers, i.e, the register set that  
> will be corrupted by function call?

Anything that isn't "call-saved" or "fixed" is called "call-used",
essentially.  (And the relation with "fixed" isn't always clear).

> If so, I am okay with name “call-clobbered” if this name sounds better. 

It's more obvious, at least to me.

> > There are at least four different kinds of volatile registers:
> > 
> > 1) Argument registers are volatile, on most ABIs.
> These are the registers that  need to be cleaned up upon function return for the ROP mitigation described in the paper
> mentioned above.
> 
> > 2) The *linker* (or dynamic linker!) may insert code that needs some
> >   registers for itself;
> > 3) Registers only used for scratch space;
> > 4) Registers used for returning the function value.
> 
> I think that the above 1,3,4 should be all covered by “call_used_regs”. 

1 and 4 are the *same* (or overlap) on most ABIs.  3 can be as well, it
depends what the compiler is allowed to do; normally, if the compiler
wants a register, the parameter passing regs are among the cheapest it
can use.

2 you cannot touch usefully at all, for your purposes.

> Not sure about 2, could you explain a little bit more on 2 (The linker may insert code that needs some register for itself)? 

Sure.  The linker can decide it needs to insert some code to restore a
"global pointer" or similar in the function return path (or anything
else -- it just has to follow the ABI, which the generic compiler does
not know enough about at all).

> I have agreed that moving the zeroing regs part entirely to target. Middle-end will only compute a hard regs set that need to be
> zeroed and pass it to target.

The registers you *want* to interfere with are the parameter passing
registers, minus the ones used for the return value of the current
function; not *all* call-clobbered registers.

The generic compiler does not have enough information at all to do this
as far as I can see, and it would fit much better to what the backend
does anyway?

> >  It is a
> > huge security leak otherwise.  And, the generic code has nothing to do
> > with this, define hooks that ask the target to clear stuff, instead?
> 
> Yes, I think that these kind of details are not good to be exposed to middle-end.

I think you should make a hook that just does the whole thing.  There is
nothing useful (or even correct) the generic code can do.  (The command
line flag to do this could be generic, and the hook to actually generate
the code for it as well of course, but other than that, there are so
many more differences between targets, subtargets, and OSes here, and
most of those not expressed anywhere else yet, that it doesn't seem
worth it to artificially make the generic code handle any of this.  For
comparison, pretty much all of the "normal" prologue/epilogue handling
is done in target code already).

> >> But why not simplify it all to a single hook
> >> 
> >>  targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);
> >> 
> >> ?
> > 
> > Yeah.  With a much better name though (it should say what it is for, or
> > describe at a *high level* what it does).
> Okay.

So everything else I write here is just a very long-winded way of
saying "Yes.  This." to this :-)

> >  But the epilogue can use
> > some volatile registers as well, including to hold sensitive info.  And
> > of course everything is different if you use separate shrink-wrapping,
> > but that work is done already when you get here (so it is too late?)
> 
> Could you please explain this part a little bit more?

For example, on PowerPC, to restore the return address we first have to
load it into a general purpose register (and then move it to LR).
Usually r0 is used, and r0 is call-clobbered (but not used for parameter
passing or return value passing).

The return address of course is very sensitive information (exposing any
return address makes ASLR useless immediately).  But this isn't in the
scope of this protection, I see.

Thanks for the explanations, much appreciated,


Segher
Qing Zhao Aug. 10, 2020, 4:34 p.m. UTC | #32
Hi, 

> On Aug 7, 2020, at 5:59 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
>> From my understanding (I am not a security expert though), this patch should serve two purpose:
>> 
>> 1. Erase the registers upon return to avoid information leak;
> 
> But only some of the registers.

All the call-used registers could be erased upon return with -fzero-call-used-regs=all.
> 
>> 2. ROP mitigation, for details on this, please refer to paper:
>> 
>> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
>> 
>> https://ieeexplore.ieee.org/document/8445132 <https://ieeexplore.ieee.org/document/8445132>
> 
> Do you have a link to this that people can actually read?

Sorry, I cannot find a free copy online. It looks like I can only read the whole paper through IEEE (I read the PDF file
through our company’s account).

> 
>> From the above paper, The call-used registers are used by the ROP hackers as following:
>> 
>> "Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.
>> 
>> First, the destination of using gadget chains in usual is performing system call or system function to perform 
>> malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
>> would like to disable W ⊕ X.
> 
> That makes things easier, for sure, but is just a nicety really.
> 
>> Because once W ⊕ X has been disabled, shellcode can be executed directly
>> instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
>> example, the system call is number 59 which is “execve” system call.
>> 
>> Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
>> architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
>> using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
>> pass parameters, as mentioned in subsection B and C.”
>> 
>> We can see that call-used registers might be used by the ROP hackers to pass parameters to the system call.
>> If compiler can clean these registers before routine “return", then ROP attack will be invalid. 
> 
> So the idea is that clearing (or otherwise interfering with) the registers
> used for parameter passing makes making useful ROP chains harder?

Yes, that’s my understanding.

> 
>> Yes, there will be performance overhead from adding these register wiping insns. However, it’s necessary to
>> add overhead for security purpose.
> 
> The point is the balance between how expensive it is, vs. how much it
> makes it harder to exploit the code.
> 
> But of course any user can make that judgment themselves.  For us it
> mostly matters what the cost is to targets that use it, to targets that
> do not use it, and to the generic code, vs. what value we give to our
> users :-)

We need to minimize the performance overhead during the implementation.
At the same time, we should provide users with options to reduce the overhead (for example, the function-level
attribute and the different levels of zeroing).

> 
>>> "call-used" is such a bad name.  "call-clobbered" is better already, but
>>> "volatile" (over calls) is most obvious I think.
>> 
>> In our GCC compiler source code, we used the name “call-used” a lot, of course, “call-clobbered” is
>> also used frequently.  Do these names refer to the same set of registers, i.e, the register set that  
>> will be corrupted by function call?
> 
> Anything that isn't "call-saved" or "fixed" is called "call-used",
> essentially.  (And the relation with "fixed" isn't always clear).
> 
>> If so, I am okay with name “call-clobbered” if this name sounds better. 
> 
> It's more obvious, at least to me.

Okay. 

> 
>>> There are at least four different kinds of volatile registers:
>>> 
>>> 1) Argument registers are volatile, on most ABIs.
>> These are the registers that  need to be cleaned up upon function return for the ROP mitigation described in the paper
>> mentioned above.
>> 
>>> 2) The *linker* (or dynamic linker!) may insert code that needs some
>>>  registers for itself;
>>> 3) Registers only used for scratch space;
>>> 4) Registers used for returning the function value.
>> 
>> I think that the above 1,3,4 should be all covered by “call_used_regs”. 
> 
> 1 and 4 are the *same* (or overlap) on most ABIs.  3 can be as well, it
> depends what the compiler is allowed to do; normally, if the compiler
> wants a register, the parameter passing regs are among the cheapest it
> can use.
So, are they all covered by “call_used_reg” in GCC? 

> 2 you cannot touch usefully at all, for your purposes.
Okay.
> 
>> Not sure about 2, could you explain a little bit more on 2 (The linker may insert code that needs some register for itself)? 
> 
> Sure.  The linker can decide it needs to insert some code to restore a
> "global pointer" or similar in the function return path (or anything
> else -- it just has to follow the ABI, which the generic compiler does
> not know enough about at all).
Therefore, does the compiler know which registers will be needed by the linker?

> 
>> I have agreed that moving the zeroing regs part entirely to target. Middle-end will only compute a hard regs set that need to be
>> zeroed and pass it to target.
> 
> The registers you *want* to interfere with are the parameter passing
> registers, minus the ones used for the return value of the current
> function; not *all* call-clobbered registers.

For the paper I mentioned, yes, I agree with you. We only need to zero those registers that pass parameters.
In addition to this purpose, shall we also consider the purpose of avoiding information leaks through registers by erasing registers upon
function return?

> 
> The generic compiler does not have enough information at all to do this
> as far as I can see, and it would fit much better to what the backend
> does anyway?
You mean that the middle-end does not have enough information on which registers are passing parameters and which registers are returning
value? Only the back-ends have such information?

> 
>>> It is a
>>> huge security leak otherwise.  And, the generic code has nothing to do
>>> with this, define hooks that ask the target to clear stuff, instead?
>> 
>> Yes, I think that these kind of details are not good to be exposed to middle-end.
> 
> I think you should make a hook that just does the whole thing.  There is
> nothing useful (or even correct) the generic code can do.  (The command
> line flag to do this could be generic, and the hook to actually generate
> the code for it as well of course, but other than that, there are so
> many more differences between targets, subtargets, and OSes here, and
> most of those not expressed anywhere else yet, that it doesn't seem
> worth it to artificially make the generic code handle any of this.  For
> comparison, pretty much all of the "normal" prologue/epilogue handling
> is done in target code already).

Yes, agreed. 

> 
>>>> But why not simplify it all to a single hook
>>>> 
>>>> targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);
>>>> 
>>>> ?
>>> 
>>> Yeah.  With a much better name though (it should say what it is for, or
>>> describe at a *high level* what it does).
>> Okay.
> 
> So everything else I write here is just a very long-winded way of
> saying "Yes.  This." to this :-)

Okay.

> 
>>> But the epilogue can use
>>> some volatile registers as well, including to hold sensitive info.  And
>>> of course everything is different if you use separate shrink-wrapping,
>>> but that work is done already when you get here (so it is too late?)
>> 
>> Could you please explain this part a little bit more?
> 
> For example, on PowerPC, to restore the return address we first have to
> load it into a general purpose register (and then move it to LR).
> Usually r0 is used, and r0 is call-clobbered (but not used for parameter
> passing or return value passing).
> 
> The return address of course is very sensitive information (exposing any
> return address makes ASLR useless immediately).  But this isn't in the
> scope of this protection, I see.

So, would it be correct to clear the content of r0 before returning? Would that be safer from the security point of view?

Thanks a lot for your info.

Qing
> 
> Thanks for the explanations, much appreciated,
> 
> 
> Segher
Qing Zhao Aug. 10, 2020, 7:51 p.m. UTC | #33
>> 
>>> If so, I am okay with name “call-clobbered” if this name sounds better. 
>> 
>> It's more obvious, at least to me.

In the current option list of GCC:  https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#Code-Gen-Options

There is one available option whose name is: -fcall-used-reg


-fcall-used-reg

Treat the register named reg as an allocable register that is clobbered by function calls. It may be allocated for temporaries or variables that do not live across a call. Functions compiled this way do not save and restore the register reg.

It is an error to use this flag with the frame pointer or stack pointer. Use of this flag for other registers that have fixed pervasive roles in the machine’s execution model produces disastrous results.

This flag does not have a negative form, because it specifies a three-way choice.

So, the name of this option adopted “call-used” instead of “call-clobbered”.  I think that for consistency, it might be still better to use “-fzero-call-used-regs” instead of “-fzero-call-clobbered-regs”?

Qing





> Okay. 
>
Alexandre Oliva Aug. 11, 2020, 2:39 a.m. UTC | #34
On Aug  7, 2020, Qing Zhao <QING.ZHAO@ORACLE.COM> wrote:

> So, I believe that the call-used registers (especially those registers
> that pass parameters) need to be zeroed
> In order to mitigate the ROP attack. 

Erhm, I don't get why it's important that they be zeroed.  It seems to
me that restoring their original values, or setting them to random
values, would be just as good defenses from having them set within the
function to perform a ROP attack than zeroing them.  The point is to get
rid of whatever value the attacker chose within the function.  One could
even argue that restoring the caller value is better than setting to
zero, because the result is not predictable from within the function.

OTOH, there's the flip side, that the function *could* be changed so as
to modify the stack slot in which the register is saved, if there's
hostile code running.  (it wouldn't be modified by "normal" code)

Code that sets the register to zero in the epilogue would be much harder
for an attacker to change indeed.


> I think that moving how to zeroing the registers part to each target
> will be a better solution since each target has
> Better idea on how to use the most efficient insns to do the work.

It's certainly good to allow machine-specific optimized code sequences,
but it would certainly be desirable to have a machine-independent
fallback.  It doesn't seem exceedingly hard to loop over the registers
and emit a (set (reg:M N) (const_int 0)) for each one that is to be
zeroed out.
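
As a rough illustration of the fallback loop described above, here is a
standalone toy model in C.  It does not use GCC's internal APIs: the register
set is a plain 32-bit mask, the names toy_regset and toy_emit_zero_insns are
hypothetical, and the DI mode in the output is just an assumed placeholder.
It only shows the shape of "walk the set, emit one zeroing set per register".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model only: a hard register set as a 32-bit mask, one bit per
   register.  GCC's real HARD_REG_SET and RTL emitters are not used.  */
typedef uint32_t toy_regset;

#define TOY_NUM_REGS 32

/* Append one RTL-like "set to zero" line to BUF for every register in
   NEED_ZEROED.  Return the number of lines written.  Stops early, leaving
   BUF holding only complete lines, if BUF is too small.  */
int
toy_emit_zero_insns (toy_regset need_zeroed, char *buf, size_t buflen)
{
  int count = 0;
  size_t used = 0;

  assert (buf != NULL && buflen > 0);
  buf[0] = '\0';

  for (unsigned regno = 0; regno < TOY_NUM_REGS; regno++)
    {
      if (!(need_zeroed & ((toy_regset) 1 << regno)))
        continue;

      /* DI is an assumed mode; a real fallback would pick a mode per
         register.  */
      int n = snprintf (buf + used, buflen - used,
                        "(set (reg:DI %u) (const_int 0))\n", regno);
      if (n < 0 || (size_t) n >= buflen - used)
        {
          buf[used] = '\0';  /* Drop the partially written line.  */
          break;
        }
      used += (size_t) n;
      count++;
    }
  return count;
}
```

A target with a cheaper idiom (for example a register-self-xor) would replace
the body of the loop, which is the part a target hook could override.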
Kees Cook Aug. 11, 2020, 5:57 a.m. UTC | #35
On Mon, Aug 10, 2020 at 11:39:26PM -0300, Alexandre Oliva wrote:
> Erhm, I don't get why it's important that they be zeroed.  It seems to
> me that restoring their original values, or setting them to random
> values, would be just as good defenses from having them set within the

In the performance analysis I looked at a while ago, doing the
register-self-xor is extremely fast to run (IIRC the cycle counts on x86
were absolutely tiny), and it's smaller for code size which minimized
the overall image footprint.

> [...]
> Code that sets the register to zero in the epilogue would be much harder
> for an attacker to change indeed.

Yes, a fixed value is a significantly better defensive position to take
for ROP. And specifically zero _tends_ to be the safest choice as it's
less "useful" to be used as a size, index, or pointer. And, no, it is
not perfect, but nothing can be if we're dealing with trying to defend
against arbitrary ROP gadget finding (or uninitialized stack contents,
where the same argument for "zero is best" also holds[1]).

-Kees

[1] https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
Qing Zhao Aug. 11, 2020, 5:30 p.m. UTC | #36
Hi, Alexandre,

CC’ing Richard for his comments on this.


> On Aug 10, 2020, at 9:39 PM, Alexandre Oliva <oliva@adacore.com> wrote:
>> I think that moving how to zeroing the registers part to each target
>> will be a better solution since each target has
>> Better idea on how to use the most efficient insns to do the work.
> 
> It's certainly good to allow machine-specific optimized code sequences,
> but it would certainly be desirable to have a machine-independent
> fallback.  It doesn't seem exceedingly hard to loop over the registers
> and emit a (set (reg:M N) (const_int 0)) for each one that is to be
> zeroed out.

The current implementation already includes such machine-independent code, so it should be very easy to add this.

Richard, what’s your opinion on this?
Do we need a machine-independent implementation for zeroing the registers as the default when the target does not provide an optimized
implementation?

Thanks.

Qing

> 
>
Qing Zhao Aug. 19, 2020, 8:05 p.m. UTC | #37
Hi,

Based on all the previous discussion and a more extensive study of ROP and its mitigation techniques these days, I came up with the following
high-level proposal as requested. Please take a look and let me know what I should change in this high-level design:

> On Aug 6, 2020, at 6:37 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Anyway.  This all needs a good description in the user manual (is there?
> I couldn't find any), explaining what exactly it does (user-visible),
> and when you would want to use it, etc.  We need that before we can
> review anything else in this patch sanely.
> 
> 
> Segher

zeroing call-used registers for security purpose

8/19/2020
Qing Zhao
=========================================

**Motivation:
There are two purposes of this patch:

1. ROP mitigation:

ROP (Return-oriented programming, https://en.wikipedia.org/wiki/Return-oriented_programming) is
one of the most popular code reuse attack techniques, which executes gadget chains to perform malicious tasks.
A gadget is a carefully chosen machine instruction sequence that is already present in the machine's memory.
Each gadget typically ends in a return instruction and is located in a subroutine within the existing program
and/or shared library code.

There are two variations that use gadgets that end with an indirect call (COP, Call-Oriented Programming)
or a jump instruction (JOP, Jump-Oriented Programming). However, performing ROP without return
instructions is difficult in practice because the gadgets of COP and JOP that can form a complete chain
are almost nonexistent.

As a result, gadgets based on return instructions remain the most popular.

One important feature of ROP attacks is described in "Clean the Scratch Registers: A Way to Mitigate Return-Oriented
Programming Attacks" (https://ieeexplore.ieee.org/document/8445132):
the goal of a gadget chain is usually to call system functions to perform malicious behaviour, and
on many modern architectures, registers are used to pass parameters to those
system functions.

So, cleaning the scratch registers that are used to pass parameters at return instructions should 
effectively mitigate ROP attack. 

2. Register Erasure:

In the SECURE project and GCC (https://gcc.gnu.org/wiki/cauldron2018#secure):

One of the well-known security techniques is stack and register erasure:
ensuring that on return from a function, no data is left lying on the stack or in registers.

As mentioned in the slides (https://gmarkall.files.wordpress.com/2018/09/secure_and_gcc.pdf),
there is a separate project that tried to resolve the stack erasure problem, and the patch for
stack erasure was ready to submit. That specific patch does not handle the register erasure problem.

So, we will also address the register erasure problem with this patch along with the ROP mitigation. 

** Questions and Answers:

Q1. Which registers should be set to zeros at the return of the function?
A. The caller-saved, i.e., call-used or call-clobbered, registers.
   For ROP mitigation purposes, only the call-used registers that pass
parameters need to be zeroed.
   For register erasure purposes, all the call-used registers might need to
be zeroed. We can provide multiple levels to the user for controlling the runtime
overhead.

Q2. Why zero the registers rather than randomize them?
A. (From Kees Cook)
    In the performance analysis I looked at a while ago, doing the
register-self-xor is extremely fast to run (IIRC the cycle counts on x86
were absolutely tiny), and it's smaller for code size which minimized
the overall image footprint.
    a fixed value is a significantly better defensive position to take
for ROP. And specifically zero _tends_ to be the safest choice as it's
less "useful" to be used as a size, index, or pointer. And, no, it is
not perfect, but nothing can be if we're dealing with trying to defend
against arbitrary ROP gadget finding (or uninitialized stack contents,
where the same argument for "zero is best" also holds[1]).

-Kees
([1]https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html)

    So, from both run-time performance and code-size aspects, setting the
registers to zero is a better approach. 

** Proposal:

We will provide a new feature into GCC for the above security purposes. 

Add -fzero-call-used-regs=[skip|used-arg-gpr|used-arg|arg|used-gpr|all-gpr|used|all] command-line option
and
zero_call_used_regs("skip|used-arg-gpr|used-arg|arg|used-gpr|all-gpr|used|all") function attribute:

    1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")

    Don't zero call-used registers upon function return. This is the default behavior.

    2. -fzero-call-used-regs=used-arg-gpr and zero_call_used_regs("used-arg-gpr")

    Zero used call-used general purpose registers that are used to pass parameters upon function return.

    3. -fzero-call-used-regs=used-arg and zero_call_used_regs("used-arg")

    Zero used call-used registers that are used to pass parameters upon function return.

    4. -fzero-call-used-regs=arg and zero_call_used_regs("arg")

    Zero all call-used registers that are used to pass parameters upon function return.

    5. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")

    Zero used call-used general purpose registers upon function return.

    6. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")

    Zero all call-used general purpose registers upon function return.

    7. -fzero-call-used-regs=used and zero_call_used_regs("used")

    Zero used call-used registers upon function return.

    8. -fzero-call-used-regs=all and zero_call_used_regs("all")

    Zero all call-used registers upon function return.


Zero call-used registers at function return to increase program
security by either mitigating Return-Oriented Programming (ROP) or
preventing information leaks through registers.

@samp{skip}, which is the default, doesn't zero call-used registers. 

@samp{used-arg-gpr} zeros used call-used general purpose registers that 
pass parameters. @samp{used-arg} zeros used call-used registers that 
pass parameters. @samp{arg} zeros all call-used registers that pass
parameters. These 3 choices are used for ROP mitigation. 

@samp{used-gpr} zeros call-used general purpose registers
which are used in the function.  @samp{all-gpr} zeros all
call-used general purpose registers.  @samp{used} zeros call-used registers which
are used in the function.  @samp{all} zeros all call-used registers.
These 4 choices are used for preventing information leaks through
registers.

You can control this behavior for a specific function by using the function
attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
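
For illustration, here is a minimal sketch of how the per-function attribute
could be applied, assuming the compiler accepts zero_call_used_regs with the
"used-gpr" value as proposed above.  The ZERO_USED_GPR macro and the
check_token function are hypothetical names made up for this example; the
macro expands to nothing when the compiler does not report the attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Use the attribute only where the compiler reports support for it;
   otherwise ZERO_USED_GPR expands to nothing and the code still builds.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR
#endif

/* Hypothetical example function: compare CANDIDATE against SECRET over LEN
   bytes.  With the attribute in effect, the call-used general purpose
   registers this function used are zeroed before it returns, so bytes of
   SECRET are not left behind in them.  Returns 1 on match, 0 otherwise.  */
ZERO_USED_GPR int
check_token (const char *candidate, const char *secret, size_t len)
{
  unsigned char diff = 0;

  assert (candidate != NULL && secret != NULL);
  for (size_t i = 0; i < len; i++)
    diff |= (unsigned char) candidate[i] ^ (unsigned char) secret[i];
  return diff == 0;
}
```

Applying the attribute to a few sensitive functions, rather than passing the
option for a whole translation unit, is one way to limit the overhead.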
Segher Boessenkool Aug. 19, 2020, 10:57 p.m. UTC | #38
Hi!

On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:
> So, cleaning the scratch registers that are used to pass parameters at return instructions should 
> effectively mitigate ROP attack. 

But that is *very* expensive, in general.  Instead of doing just a
return instruction (which effectively costs 0 cycles, and is just one
insn), you now have to zero all call-clobbered register at every return
(typically many returns per function, and you are talking 10+ registers
even if only considering the simple integer registers).

Numbers on how expensive this is (for what arch, in code size and in
execution time) would be useful.  If it is so expensive that no one will
use it, it helps security at most none at all :-(

> Q1. Which registers should be set to zeros at the return of the function?
> A. the caller-saved, i.e, call-used, or call-clobbered registers.
>    For ROP mitigation purpose, only the call-used registers that pass
> parameters need to be zeroed. 
>    For register erasure purpose, all the call-used registers might need to
> be zeroed. we can provide multiple levels to user for controling the runtime
> overhead. 

The call-clobbered regs are the only ones you *can* touch.  That does
not mean you should clear them all (it doesn't help much at all in some
cases).  Only the backend knows.

>     So, from both run-time performance and code-size aspects, setting the
> registers to zero is a better approach. 

From a security perspective, this isn't clear though.  But that is a lot
of extra research ;-)


Segher
Qing Zhao Aug. 19, 2020, 11:27 p.m. UTC | #39
> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:
>> So, cleaning the scratch registers that are used to pass parameters at return instructions should 
>> effectively mitigate ROP attack. 
> 
> But that is *very* expensive, in general.  Instead of doing just a
> return instruction (which effectively costs 0 cycles, and is just one
> insn), you now have to zero all call-clobbered register at every return
> (typically many returns per function, and you are talking 10+ registers
> even if only considering the simple integer registers).

Yes, the run-time overhead and also the code-size overhead are major concerns. We should minimize the overhead
as much as we can during implementation. However, such overhead cannot be completely avoided for the security purpose. 

To reduce the overhead of the ROP mitigation, I added 3 new values: -fzero-call-used-regs=used-arg-gpr|used-arg|arg

For “used-arg-gpr”, we only zero the integer registers that are used in the routine and can pass parameters; this should provide ROP mitigation
with the minimum overhead. 

For “used-arg”, in addition to “used-arg-gpr”, the other registers (for example, FP registers) that can pass parameters will be zeroed. But I am not
very sure whether this option is really needed in practice. 

For “arg”, in addition to “used-arg”, all registers that pass parameters will be zeroed. As with “used-arg”, I am not very sure whether we need this option
or not. 
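
As a rough illustration of how a per-function choice might look, here is a minimal sketch using the attribute form named in the patch subject. It only uses values from the subject line ("used-gpr" and "skip"); the three new values above are still proposals. The ZERO_REGS macro and both function names are hypothetical, and the guard simply drops the attribute on compilers that do not report support for it.

```c
/* Hypothetical helper macro (not part of the patch): expands to the
   attribute only when the compiler reports support for it, so the
   file still builds with other compilers.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_REGS(mode) __attribute__((zero_call_used_regs(mode)))
# endif
#endif
#ifndef ZERO_REGS
# define ZERO_REGS(mode)
#endif

/* Ask for the used call-used general purpose registers to be zeroed
   when this function returns.  */
ZERO_REGS("used-gpr") int add_secret(int key, int value)
{
  return key + value;
}

/* Explicitly opt a hot function out of any zeroing.  */
ZERO_REGS("skip") int hot_path(int x)
{
  return x * 2;
}
```

The attribute should not change what either function returns; only the register contents left behind at the return are affected.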

> 
> Numbers on how expensive this is (for what arch, in code size and in
> execution time) would be useful.  If it is so expensive that no one will
> use it, it helps security at most none at all :-(

The CLEAR Linux project has been using a similar patch since GCC 8; the option it uses is equivalent to -fzero-call-used-regs=used-gpr.

-fzero-call-used-regs=used-arg-gpr in this new proposal will have smaller overhead than the one currently being used in CLEAR Linux.

Victor, do you have any data on the overhead of the option that is currently used by the CLEAR project?

> 
>> Q1. Which registers should be set to zeros at the return of the function?
>> A. the caller-saved, i.e, call-used, or call-clobbered registers.
>>   For ROP mitigation purpose, only the call-used registers that pass
>> parameters need to be zeroed. 
>>   For register erasure purpose, all the call-used registers might need to
>> be zeroed. we can provide multiple levels to user for controling the runtime
>> overhead. 
> 
> The call-clobbered regs are the only ones you *can* touch.  That does
> not mean you should clear them all (it doesn't help much at all in some
> cases).  Only the backend knows.

I think that for ROP mitigation purpose, we only need to clear the call-used (i.e, call-clobbered) registers that are used in the current routine and
can pass parameters. 

But for preventing information leak from callee registers, we might need to clear all the call-used registers at return.


> 
>>    So, from both run-time performance and code-size aspects, setting the
>> registers to zero is a better approach. 
> 
> From a security perspective, this isn't clear though.  But that is a lot
> of extra research ;-)

There has been quite some discussion on this topic at

https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html

From that earlier discussion, we can see that a zero value should be good enough for security purposes (though it’s not perfect).

Qing

> 
> 
> Segher
Richard Biener Aug. 24, 2020, 10:50 a.m. UTC | #40
On Tue, 11 Aug 2020, Qing Zhao wrote:

> Hi, Alexandre,
> 
> CC’ing Richard for his comments on this.
> 
> 
> > On Aug 10, 2020, at 9:39 PM, Alexandre Oliva <oliva@adacore.com> wrote:
> >> I think that moving how to zeroing the registers part to each target
> >> will be a better solution since each target has
> >> Better idea on how to use the most efficient insns to do the work.
> > 
> > It's certainly good to allow machine-specific optimized code sequences,
> > but it would certainly be desirable to have a machine-independent
> > fallback.  It doesn't seem exceedingly hard to loop over the registers
> > and emit a (set (reg:M N) (const_int 0)) for each one that is to be
> > zeroed out.
> 
> The current implementation already includes such machine-independent code, it should be very easy to add this.
> 
> Richard, what’s your opinion on this?
> Do we need a machine-independent implementation to zeroing the registers for the default when the target does not provide a optimized
> Implementation?

Well, at least silently doing nothing when the option is used would be 
bad.  So at least a diagnostic would be required.  Note since the
option is quite elaborate on what (sub-)set of regs is supposed to be
cleared I'm not sure an implementation not involving any target hook
is possible?

Richard.

> Thanks.
> 
> Qing
> 
> > 
> > 
> 
>
Li, Pan2 via Gcc-patches Aug. 24, 2020, 2:36 p.m. UTC | #41
-----Original Message-----
From: Segher Boessenkool <segher@kernel.crashing.org>
Date: Wednesday, August 19, 2020 at 5:58 PM
To: Qing Zhao <QING.ZHAO@ORACLE.COM>
Cc: Richard Biener <richard.guenther@gmail.com>, Jeff Law <law@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "H. J. Lu" <hjl.tools@gmail.com>, Jakub Jelinek <jakub@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>, Kees Cook <keescook@chromium.org>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

    Hi!

    On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:
    > So, cleaning the scratch registers that are used to pass parameters at return instructions should 
    > effectively mitigate ROP attack. 

    But that is *very* expensive, in general.  Instead of doing just a
    return instruction (which effectively costs 0 cycles, and is just one
    insn), you now have to zero all call-clobbered register at every return
    (typically many returns per function, and you are talking 10+ registers
    even if only considering the simple integer registers).

    Numbers on how expensive this is (for what arch, in code size and in
    execution time) would be useful.  If it is so expensive that no one will
    use it, it helps security at most none at all :-(

It is used in some operating systems and packages such as 

https://github.com/clearlinux-pkgs/gettext/blob/master/gettext.spec#L138

export CFLAGS="$CFLAGS -O3 -ffat-lto-objects -flto=4 -fstack-protector-strong -mzero-caller-saved-regs=used "

There is no record that this flag creates a considerable penalty in execution time.

    > Q1. Which registers should be set to zeros at the return of the function?
    > A. the caller-saved, i.e, call-used, or call-clobbered registers.
    >    For ROP mitigation purpose, only the call-used registers that pass
    > parameters need to be zeroed. 
    >    For register erasure purpose, all the call-used registers might need to
    > be zeroed. we can provide multiple levels to user for controling the runtime
    > overhead. 

    The call-clobbered regs are the only ones you *can* touch.  That does
    not mean you should clear them all (it doesn't help much at all in some
    cases).  Only the backend knows.

    >     So, from both run-time performance and code-size aspects, setting the
    > registers to zero is a better approach. 

    From a security perspective, this isn't clear though.  But that is a lot
    of extra research ;-)

The paper from IEEE provides a clear example of how to use mzero-caller

I think the patch has a solid background, and multiple projects highlight the importance of register cleaning as a technique to mitigate ROP attacks.

Regards

Victor Rodriguez



    Segher
Li, Pan2 via Gcc-patches Aug. 24, 2020, 2:47 p.m. UTC | #42
From: Qing Zhao <QING.ZHAO@ORACLE.COM>
Date: Wednesday, August 19, 2020 at 6:28 PM
To: Segher Boessenkool <segher@kernel.crashing.org>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>
Cc: Richard Biener <richard.guenther@gmail.com>, Jeff Law <law@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "H. J. Lu" <hjl.tools@gmail.com>, Jakub Jelinek <jakub@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>, Kees Cook <keescook@chromium.org>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]




On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org<mailto:segher@kernel.crashing.org>> wrote:

Hi!

On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:

So, cleaning the scratch registers that are used to pass parameters at return instructions should
effectively mitigate ROP attack.

But that is *very* expensive, in general.  Instead of doing just a
return instruction (which effectively costs 0 cycles, and is just one
insn), you now have to zero all call-clobbered register at every return
(typically many returns per function, and you are talking 10+ registers
even if only considering the simple integer registers).

Yes, the run-time overhead and also the code-size overhead are major concerns. We should minimize the overhead
as much as we can during implementation. However, such overhead cannot be completely avoided for the security purpose.

In order to reduce the overhead for the ROP mitigation, I added 3 new values for -fzero-call-used-regs=used-arg-grp|used-arg|arg

For “used-arg-grp”, we only zero the integer registers that are used in the routine and can pass parameters; this should provide ROP mitigation
with the minimum overhead.

For “used-arg”, in addition to “used-arg-grp”, the other registers (for example, FP registers) that can pass parameters will be zeroed. But I am not
very sure whether this option is really needed in practical.

For “arg”, in addition to “used-arg”, all registers that pass parameters will be zeroed. Same as “used-arg”, I am not very sure whether we need this option
Or not.


Numbers on how expensive this is (for what arch, in code size and in
execution time) would be useful.  If it is so expensive that no one will
use it, it helps security at most none at all :-(

CLEAR Linux project has been using a similar patch since GCC 8, the option it used is an equivalent to -fzero-call-used-regs=used-gpr.

-fzero-call-used-regs=used-arg-gpr in this new proposal will have smaller overhead than the one currently being used in CLEAR Linux.

Victor, do you have any data on the overhead of the option that currently is used by CLEAR project?


This is a quick list of packages compiled with similar flag as you mention

https://gist.github.com/bryteise/f3469f318e82c626d20a83f557d897a2

The spec files can be located at:

https://github.com/clearlinux-pkgs

I don’t have any data on the overhead. The patch, as you mention, has been in use since GCC 8 (2018), and the distro has been measured by the community since then. I looked for any major performance drop reported by the community after this patch, but I was not able to find one.

It may be worth asking on the Clear Linux community project mailing list.

Regards

Victor Rodriguez


Q1. Which registers should be set to zeros at the return of the function?
A. the caller-saved, i.e, call-used, or call-clobbered registers.
  For ROP mitigation purpose, only the call-used registers that pass
parameters need to be zeroed.
  For register erasure purpose, all the call-used registers might need to
be zeroed. we can provide multiple levels to user for controling the runtime
overhead.

The call-clobbered regs are the only ones you *can* touch.  That does
not mean you should clear them all (it doesn't help much at all in some
cases).  Only the backend knows.

I think that for ROP mitigation purpose, we only need to clear the call-used (i.e, call-clobbered) registers that are used in the current routine and
can pass parameters.

But for preventing information leak from callee registers, we might need to clear all the call-used registers at return.





   So, from both run-time performance and code-size aspects, setting the
registers to zero is a better approach.

From a security perspective, this isn't clear though.  But that is a lot
of extra research ;-)

There has been quite some discussion on this topic at

https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html

From those old discussion, we can see that zero value should be good enough for the security purpose (though it’s not perfect).

Qing



I saw the same topic discussed by the LLVM community at the ELC/OSSNA conference this year. The flag is getting a lot of traction.

Regards

Victor Rodriguez



Segher
Qing Zhao Aug. 24, 2020, 2:48 p.m. UTC | #43
> On Aug 24, 2020, at 5:50 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On Tue, 11 Aug 2020, Qing Zhao wrote:
> 
>> Hi, Alexandre,
>> 
>> CC’ing Richard for his comments on this.
>> 
>> 
>>> On Aug 10, 2020, at 9:39 PM, Alexandre Oliva <oliva@adacore.com> wrote:
>>>> I think that moving how to zeroing the registers part to each target
>>>> will be a better solution since each target has
>>>> Better idea on how to use the most efficient insns to do the work.
>>> 
>>> It's certainly good to allow machine-specific optimized code sequences,
>>> but it would certainly be desirable to have a machine-independent
>>> fallback.  It doesn't seem exceedingly hard to loop over the registers
>>> and emit a (set (reg:M N) (const_int 0)) for each one that is to be
>>> zeroed out.
>> 
>> The current implementation already includes such machine-independent code, it should be very easy to add this.
>> 
>> Richard, what’s your opinion on this?
>> Do we need a machine-independent implementation to zeroing the registers for the default when the target does not provide a optimized
>> Implementation?
> 
> Well, at least silently doing nothing when the option is used would be 
> bad.  So at least a diagnostic would be required.

Yes, this is the current behavior in the current implementation. 
>  Note since the
> option is quite elaborate on what (sub-)set of regs is supposed to be
> cleared I'm not sure an implementation not involving any target hook
> is possible?

Agreed.

Thanks 

Qing
> 
> Richard.
> 
>> Thanks.
>> 
>> Qing
>> 
>>> 
>>> 
>> 
>> 
> 
> -- 
> Richard Biener <rguenther@suse.de <mailto:rguenther@suse.de>>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Segher Boessenkool Aug. 24, 2020, 5:49 p.m. UTC | #44
On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
> > On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > Numbers on how expensive this is (for what arch, in code size and in
> > execution time) would be useful.  If it is so expensive that no one will
> > use it, it helps security at most none at all :-(

Without numbers on this, no one can determine if it is a good tradeoff
for them.  And we (the GCC people) cannot know if it will be useful for
enough users that it will be worth the effort for us.  Which is why I
keep hammering on this point.

(The other side of the coin is how much this helps prevent exploitation;
numbers on that would be good to see, too.)

> >>    So, from both run-time performance and code-size aspects, setting the
> >> registers to zero is a better approach. 
> > 
> > From a security perspective, this isn't clear though.  But that is a lot
> > of extra research ;-)
> 
> There has been quite some discussion on this topic at
> 
> https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
> 
> From those old discussion, we can see that zero value should be good enough for the security purpose (though it’s not perfect).

And there has been zero proof or even any arguments from the security
angle, only "anything other than 0 is too expensive", which isn't
obviously true either (it isn't even cheaper than other small numbers,
on many archs).

A large fraction of function arguments is zero in valid executions, so
zeroing them out to try to prevent exploitation attempts might not help
so much.


Segher
Segher Boessenkool Aug. 24, 2020, 5:59 p.m. UTC | #45
[ Please quote correctly.  I fixed this up a bit. ]

On Mon, Aug 24, 2020 at 02:47:22PM +0000, Rodriguez Bahena, Victor wrote:
> > The call-clobbered regs are the only ones you *can* touch.  That does
> > not mean you should clear them all (it doesn't help much at all in some
> > cases).  Only the backend knows.
> 
> I think that for ROP mitigation purpose, we only need to clear the call-used (i.e, call-clobbered) registers that are used in the current routine and
> can pass parameters.

Which is more than you *can* do as well (consider return value registers
for example; there are more cases, in general; only the backend code can
know what is safe to do).


Segher
Qing Zhao Aug. 24, 2020, 6:02 p.m. UTC | #46
> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> Numbers on how expensive this is (for what arch, in code size and in
>>> execution time) would be useful.  If it is so expensive that no one will
>>> use it, it helps security at most none at all :-(
> 
> Without numbers on this, no one can determine if it is a good tradeoff
> for them.  And we (the GCC people) cannot know if it will be useful for
> enough users that it will be worth the effort for us.  Which is why I
> keep hammering on this point.
I can collect some run-time overhead data on this. Do you have a recommendation on what test suite I can use
for this testing? (Is CPU2017 good enough?)

> 
> (The other side of the coin is how much this helps prevent exploitation;
> numbers on that would be good to see, too.)

This is shown well in the paper:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

Please take a look at this paper. 

> 
>>>>   So, from both run-time performance and code-size aspects, setting the
>>>> registers to zero is a better approach. 
>>> 
>>> From a security perspective, this isn't clear though.  But that is a lot
>>> of extra research ;-)
>> 
>> There has been quite some discussion on this topic at
>> 
>> https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
>> 
>> From those old discussion, we can see that zero value should be good enough for the security purpose (though it’s not perfect).
> 
> And there has been zero proof or even any arguments from the security
> angle, only "anything other than 0 is too expensive", which isn't
> obviously true either (it isn't even cheaper than other small numbers,
> on many archs).
> 
> A large fraction of function arguments is zero in valid executions, so
> zeroing them out to try to prevent exploitation attempts might not help
> so much.

Please take a look at the paper:
"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

According to the study, zeroing out the registers mitigates ROP very well.

thanks.

Qing



> 
> 
> Segher
Qing Zhao Aug. 24, 2020, 6:48 p.m. UTC | #47
> On Aug 24, 2020, at 12:59 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> [ Please quote correctly.  I fixed this up a bit. ]
> 
> On Mon, Aug 24, 2020 at 02:47:22PM +0000, Rodriguez Bahena, Victor wrote:
>>> The call-clobbered regs are the only ones you *can* touch.  That does
>>> not mean you should clear them all (it doesn't help much at all in some
>>> cases).  Only the backend knows.
>> 
>> I think that for ROP mitigation purpose, we only need to clear the call-used (i.e, call-clobbered) registers that are used in the current routine and
>> can pass parameters.
> 
> Which is more than you *can* do as well (consider return value registers
> for example; there are more cases, in general; only the backend code can
> know what is safe to do).

Yes. So we agreed to move the code generation part of the implementation into the backend.

In the middle-end, we will only compute the hard register set based on call ABI information and data-flow information, and handle the command-line option.
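
To make that split a bit more concrete, here is a toy model of the set computation, not the code in the patch. It uses one bit per hard register; the type, enum, struct and function names are all hypothetical, and the live_out field stands in for whatever the backend reports as unsafe to clobber at return (the return value registers, for example).

```c
#include <stdint.h>

/* Toy model only: one bit per hard register.  Every name here is
   hypothetical and not taken from the patch.  */
typedef uint32_t regset;

enum zero_mode { ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL };

struct fn_regs {
  regset call_used;  /* call-clobbered registers (call ABI information) */
  regset gpr;        /* general purpose registers */
  regset used;       /* registers the function touches (data-flow info) */
  regset live_out;   /* must survive the return, e.g. return value regs */
};

/* Compute which registers the chosen mode asks to have zeroed.  */
regset regs_to_zero(const struct fn_regs *f, enum zero_mode mode)
{
  regset set;

  if (mode == ZERO_SKIP)
    return 0;

  /* Start from the call-used registers that are not needed at return.  */
  set = f->call_used & ~f->live_out;

  /* The "-gpr" modes restrict the set to general purpose registers.  */
  if (mode == ZERO_USED_GPR || mode == ZERO_ALL_GPR)
    set &= f->gpr;

  /* The "used" modes restrict the set to registers the function used.  */
  if (mode == ZERO_USED_GPR || mode == ZERO_USED)
    set &= f->used;

  return set;
}
```

In this model the backend would then receive the resulting set and decide which instructions to emit for it.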

Qing
> 
> 
> Segher
Segher Boessenkool Aug. 24, 2020, 8:20 p.m. UTC | #48
Hi!

On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
> > On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
> >>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >>> Numbers on how expensive this is (for what arch, in code size and in
> >>> execution time) would be useful.  If it is so expensive that no one will
> >>> use it, it helps security at most none at all :-(
> > 
> > Without numbers on this, no one can determine if it is a good tradeoff
> > for them.  And we (the GCC people) cannot know if it will be useful for
> > enough users that it will be worth the effort for us.  Which is why I
> > keep hammering on this point.
> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
> For this testing? (Is CPU2017 good enough)?

I would use something more real-life, not 12 small pieces of code.

> > (The other side of the coin is how much this helps prevent exploitation;
> > numbers on that would be good to see, too.)
> 
> This can be well showed from the paper:
> 
> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
> 
> https://ieeexplore.ieee.org/document/8445132
> 
> Please take a look at this paper. 

As I told you before, that isn't open information, I cannot reply to
any of that.


Segher
Segher Boessenkool Aug. 24, 2020, 8:26 p.m. UTC | #49
On Mon, Aug 24, 2020 at 01:48:02PM -0500, Qing Zhao wrote:
> 
> 
> > On Aug 24, 2020, at 12:59 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > 
> > [ Please quote correctly.  I fixed this up a bit. ]
> > 
> > On Mon, Aug 24, 2020 at 02:47:22PM +0000, Rodriguez Bahena, Victor wrote:
> >>> The call-clobbered regs are the only ones you *can* touch.  That does
> >>> not mean you should clear them all (it doesn't help much at all in some
> >>> cases).  Only the backend knows.
> >> 
> >> I think that for ROP mitigation purpose, we only need to clear the call-used (i.e, call-clobbered) registers that are used in the current routine and
> >> can pass parameters.
> > 
> > Which is more than you *can* do as well (consider return value registers
> > for example; there are more cases, in general; only the backend code can
> > know what is safe to do).
> 
> Yes, So, we agreed to move the code generation implementation part into backend.
> 
> In Middle-end, we will only compute the hard register set based on call abi information and data flow information, also handle the command line option.

You cannot in general figure out what registers you can clobber without
asking the backend.  You can figure out some that you *cannot* clobber,
but that isn't very useful.

Do you want to do this before or after the epilogue code is generated?


Segher
Qing Zhao Aug. 24, 2020, 8:43 p.m. UTC | #50
> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>> use it, it helps security at most none at all :-(
>>> 
>>> Without numbers on this, no one can determine if it is a good tradeoff
>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>> enough users that it will be worth the effort for us.  Which is why I
>>> keep hammering on this point.
>> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
>> For this testing? (Is CPU2017 good enough)?
> 
> I would use something more real-life, not 12 small pieces of code.

Then what kind of real-life benchmark are you suggesting? 

> 
>>> (The other side of the coin is how much this helps prevent exploitation;
>>> numbers on that would be good to see, too.)
>> 
>> This can be well showed from the paper:
>> 
>> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
>> 
>> https://ieeexplore.ieee.org/document/8445132
>> 
>> Please take a look at this paper. 
> 
> As I told you before, that isn't open information, I cannot reply to
> any of that.

I am a little confused here: what do you mean by “open information”? Is the information in a published paper not open information?

Qing
> 
> 
> Segher
Qing Zhao Aug. 24, 2020, 8:49 p.m. UTC | #51
> On Aug 24, 2020, at 3:26 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Mon, Aug 24, 2020 at 01:48:02PM -0500, Qing Zhao wrote:
>> 
>> 
>>> On Aug 24, 2020, at 12:59 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> 
>>> [ Please quote correctly.  I fixed this up a bit. ]
>>> 
>>> On Mon, Aug 24, 2020 at 02:47:22PM +0000, Rodriguez Bahena, Victor wrote:
>>>>> The call-clobbered regs are the only ones you *can* touch.  That does
>>>>> not mean you should clear them all (it doesn't help much at all in some
>>>>> cases).  Only the backend knows.
>>>> 
>>>> I think that for ROP mitigation purpose, we only need to clear the call-used (i.e, call-clobbered) registers that are used in the current routine and
>>>> can pass parameters.
>>> 
>>> Which is more than you *can* do as well (consider return value registers
>>> for example; there are more cases, in general; only the backend code can
>>> know what is safe to do).
>> 
>> Yes, So, we agreed to move the code generation implementation part into backend.
>> 
>> In Middle-end, we will only compute the hard register set based on call abi information and data flow information, also handle the command line option.
> 
> You cannot in general figure out what registers you can clobber without
> asking the backend.  You can figure out some that you *cannot* clobber,
> but that isn't very useful.
> 
> Do you want to do this before or after the epilogue code is generated?

static rtx_insn *
make_epilogue_seq (void)
{
  if (!targetm.have_epilogue ())
    return NULL;

  start_sequence ();
  emit_note (NOTE_INSN_EPILOGUE_BEG);

 +++++ gen_call_used_regs_seq ();                     // this is the place to emit the zeroing insn sequence

  rtx_insn *seq = targetm.gen_epilogue ();
…
}

Any comment on this?

thanks.

Qing




> 
> 
> Segher
Alexandre Oliva Aug. 25, 2020, 5:16 a.m. UTC | #52
On Aug 24, 2020, Richard Biener <rguenther@suse.de> wrote:

> since the option is quite elaborate on what (sub-)set of regs is
> supposed to be cleared I'm not sure an implementation not involving
> any target hook is possible?

I don't think this follows.  Machine-independent code has a pretty good
notion of what registers are call-saved or call-clobbered, which ones
could be changed in this regard for function-specific calling
conventions, which ones may be used by a function to hold its return
value, which ones are used within a function...

It *should* be possible to introduce this in machine-independent code,
emitting insns to set registers to zero and regarding them as holding
values to be returned from the function.  Machine-specific code could
use more efficient insns to get the same result, but I can't see good
reason to not have a generic fallback implementation with at least a
best-effort attempt to offer the desired feature.
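
A generic fallback of that kind could be as simple as the following sketch. It is a stand-alone toy, not GCC code: the register set is a plain 32-bit mask, the function name emit_zero_sets is hypothetical, and instead of building real insns it only writes the RTL-like text of each set into a buffer, with the machine mode passed in as a string.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for a generic fallback: for every register whose bit is
   set in REGS, append one "(set (reg:MODE N) (const_int 0))" line to
   BUF.  Returns the number of registers written; stops early if BUF
   (of size LEN) is too small.  */
int emit_zero_sets(uint32_t regs, const char *mode, char *buf, size_t len)
{
  int count = 0;
  size_t off = 0;

  if (len > 0)
    buf[0] = '\0';

  for (int regno = 0; regno < 32; regno++) {
    if (!(regs & (UINT32_C(1) << regno)))
      continue;

    int n = snprintf(buf + off, len - off,
                     "(set (reg:%s %d) (const_int 0))\n", mode, regno);
    if (n < 0 || (size_t) n >= len - off) {
      /* Out of space: drop the partial line and stop.  */
      if (off < len)
        buf[off] = '\0';
      break;
    }
    off += (size_t) n;
    count++;
  }
  return count;
}
```

A target with a cheaper idiom for clearing registers would replace this loop with its own sequence, and the generic version would only run when no such override exists.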


Now, this is for the regular return path.  Is zeroing registers in
exception-propagation paths not relevant?

I thought it is, and I think we could have generic code that identifies
the registers that ought to be zeroed, issues CFI notes to get them
zeroed in the exception path, and requests a target hook to emit the
insns to zero them in the regular return path.
Uros Bizjak Aug. 25, 2020, 6:41 a.m. UTC | #53
On Mon, Aug 24, 2020 at 10:43 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> > On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >
> > Hi!
> >
> > On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
> >>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
> >>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >>>>> Numbers on how expensive this is (for what arch, in code size and in
> >>>>> execution time) would be useful.  If it is so expensive that no one will
> >>>>> use it, it helps security at most none at all :-(
> >>>
> >>> Without numbers on this, no one can determine if it is a good tradeoff
> >>> for them.  And we (the GCC people) cannot know if it will be useful for
> >>> enough users that it will be worth the effort for us.  Which is why I
> >>> keep hammering on this point.
> >> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
> >> For this testing? (Is CPU2017 good enough)?
> >
> > I would use something more real-life, not 12 small pieces of code.
>
> Then, what kind of real-life benchmark you are suggesting?
>
> >
> >>> (The other side of the coin is how much this helps prevent exploitation;
> >>> numbers on that would be good to see, too.)
> >>
> >> This can be well showed from the paper:
> >>
> >> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
> >>
> >> https://ieeexplore.ieee.org/document/8445132
> >>
> >> Please take a look at this paper.
> >
> > As I told you before, that isn't open information, I cannot reply to
> > any of that.
>
> A little confused here, what’s you mean by “open information”? Is the information in a published paper not open information?

No, because it is behind a paywall.

Uros.
Qing Zhao Aug. 25, 2020, 2:05 p.m. UTC | #54
> On Aug 25, 2020, at 1:41 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
>>> 
>>>>> (The other side of the coin is how much this helps prevent exploitation;
>>>>> numbers on that would be good to see, too.)
>>>> 
>>>> This can be well showed from the paper:
>>>> 
>>>> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
>>>> 
>>>> https://ieeexplore.ieee.org/document/8445132
>>>> 
>>>> Please take a look at this paper.
>>> 
>>> As I told you before, that isn't open information, I cannot reply to
>>> any of that.
>> 
>> A little confused here, what’s you mean by “open information”? Is the information in a published paper not open information?
> 
> No, because it is behind a paywall.

I still don’t understand: this paper was published in the proceedings of the “2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP)”.
If you want to read the complete version online, you need to pay for it.

However, it’s still a published paper, and the information inside it should be “open information”. 

So, what’s your definition of “open information”?

I downloaded a PDF copy of this paper through my company’s paid account, but I am not sure whether it’s legal for me to attach it to this mailing list.

Qing


> 
> Uros.
Li, Pan2 via Gcc-patches Aug. 25, 2020, 2:19 p.m. UTC | #55
On Tue, 2020-08-25 at 02:16 -0300, Alexandre Oliva wrote:
> On Aug 24, 2020, Richard Biener <rguenther@suse.de> wrote:
> 
> > since the option is quite elaborate on what (sub-)set of regs is
> > supposed to be cleared I'm not sure an implementation not involving
> > any target hook is possible?
> 
> I don't think this follows.  Machine-independent code has a pretty good
> notion of what registers are call-saved or call-clobbered, which ones
> could be changed in this regard for function-specific calling
> conventions, which ones may be used by a function to hold its return
> value, which ones are used within a function...
> 
> It *should* be possible to introduce this in machine-independent code,
> emitting insns to set registers to zero and regarding them as holding
> values to be returned from the function.  Machine-specific code could
> use more efficient insns to get the same result, but I can't see good
> reason to not have a generic fallback implementation with at least a
> best-effort attempt to offer the desired feature.
I think part of the problem here is you have to worry about stubs which can
change caller-saved registers.  Return path stubs aren't particularly common, but
they do exist -- 32 bit hpux for example :(

Jeff
Qing Zhao Aug. 25, 2020, 9:54 p.m. UTC | #56
> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>> use it, it helps security at most none at all :-(
>>> 
>>> Without numbers on this, no one can determine if it is a good tradeoff
>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>> enough users that it will be worth the effort for us.  Which is why I
>>> keep hammering on this point.
>> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
>> For this testing? (Is CPU2017 good enough)?
> 
> I would use something more real-life, not 12 small pieces of code.

There is some basic information about the CPU2017 benchmarks at the link below:

https://www.spec.org/cpu2017/Docs/overview.html#suites

GCC itself is one of the benchmarks in CPU2017 (502.gcc_r), and 526.blender_r is even larger than 502.gcc_r.
There are several other quite big benchmarks as well (perlbench, xalancbmk, parest, imagick, etc.).

thanks.

Qing
Qing Zhao Aug. 25, 2020, 10:31 p.m. UTC | #57
> On Aug 25, 2020, at 9:05 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> 
>> On Aug 25, 2020, at 1:41 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> 
>>>> 
>>>>>> (The other side of the coin is how much this helps prevent exploitation;
>>>>>> numbers on that would be good to see, too.)
>>>>> 
>>>>> This can be well showed from the paper:
>>>>> 
>>>>> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
>>>>> 
>>>>> https://ieeexplore.ieee.org/document/8445132
>>>>> 
>>>>> Please take a look at this paper.
>>>> 
>>>> As I told you before, that isn't open information, I cannot reply to
>>>> any of that.
>>> 
>>> A little confused here, what’s you mean by “open information”? Is the information in a published paper not open information?
>> 
>> No, because it is behind a paywall.
> 
> Still don’t understand here:  this paper has been published in the proceeding of “ 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP)”.
> If you want to read the complete version online, you need to pay for it.
> 
> However, it’s still a published paper, and the information inside it should be “open information”. 
> 
> So, what’s the definition of “open information” you have?
> 
> I downloaded a PDF copy of this paper through my company’s paid account.  But I am not sure whether it’s legal for me to attach it to this mailing list?

After checking, it turned out that I am not allowed to forward the copy I downloaded through my company’s account to this list.
There is some more information on this paper online, though:

https://www.semanticscholar.org/paper/Clean-the-Scratch-Registers:-A-Way-to-Mitigate-Rong-Xie/6f2ce4fd31baa0f6c02f9eb5c57b90d39fe5fa13

All the figures and tables of the paper are available at this link.

Figure 1 illustrates a typical ROP attack. Please pay special attention to the “gadgets”, which are carefully chosen machine instruction sequences already present in the machine's memory. Each gadget typically ends in a return instruction and is located in a subroutine within the existing program and/or shared library code. Chained together, these gadgets allow an attacker to perform arbitrary operations on a machine employing defenses that thwart simpler attacks.

The paper identifies the important features of a ROP attack as follows:

"First, the destination of using gadget chains in usual is performing system call or system fucntion to perform malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. 

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks using system function such as “read” or “mprotect”, on x64 system, the register would still be used to pass parameters, as mentioned in subsection B and C.”
As a result, the paper proposes zeroing the scratch registers that pass parameters at the “return” insns to mitigate ROP attacks.
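
As a minimal illustration of what this looks like from the source side, the sketch below marks one function with the zero_call_used_regs attribute proposed in this patch. The function name check_pin and the ZERO_USED_GPR macro are hypothetical, and the __has_attribute guard is an assumption added so the file still builds with compilers that lack the attribute.

```c
#include <stddef.h>

/* Fall back to a no-op when the compiler lacks the attribute
   (assumption: __has_attribute is available in recent GCC and Clang). */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(zero_call_used_regs)
#define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
#else
#define ZERO_USED_GPR
#endif

/* Hypothetical routine: with the attribute in effect, the call-used
   general purpose registers it used are zeroed before it returns. */
ZERO_USED_GPR
int check_pin(const unsigned char *pin, const unsigned char *expected,
              size_t len)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (unsigned char)(pin[i] ^ expected[i]);
    return diff == 0;
}
```

The attribute does not change what the function computes; only the register state left behind at the return differs.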

Tables III, IV and V give the results of using scratch-register zeroing to mitigate ROP attacks. According to these tables, zeroing scratch registers successfully mitigates ROP on all of those benchmarks.

Table VI gives the performance overhead of their implementation. It looks very high: 16.2X runtime overhead on average. However, this implementation does not use the compiler to statically generate the zeroing sequence; instead, it uses “dynamic binary instrumentation at runtime” to check every instruction in order to:
1. Set/unset flags to track which scratch registers are used in the routine;
2. Determine whether the instruction is a return instruction; if it is, insert the sequence that zeroes the used scratch registers before the “return” insn.

Given this run-time dynamic instrumentation method, the high runtime overhead is to be expected, I think.

If we use GCC to statically compute the “used” information and add the zeroing sequence before the return insn, the run-time overhead will be much smaller.
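
The difference between zeroing only the “used” scratch registers and zeroing all of them can be sketched with a toy model. The register numbering, the insn struct and the choice of scratch set below are illustrative assumptions (loosely modelled on the x86-64 argument registers); they are not the data structures used by the patch or by the paper.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy register file: one bit per register (hypothetical numbering). */
enum { R_RAX, R_RDI, R_RSI, R_RDX, R_RCX, R_R8, R_R9, R_RBX, R_COUNT };

/* Assumed scratch set: registers that may carry call arguments. */
#define SCRATCH_MASK ((1u << R_RDI) | (1u << R_RSI) | (1u << R_RDX) | \
                      (1u << R_RCX) | (1u << R_R8)  | (1u << R_R9))

struct insn {
    uint32_t touched;  /* bitmask of registers the instruction touches */
    int is_return;     /* nonzero for a return instruction */
};

/* Scan one routine up to its first return and report which scratch
   registers would be zeroed there: only the touched ones, or all. */
uint32_t regs_to_zero(const struct insn *code, size_t n, int zero_all)
{
    uint32_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].is_return)
            break;
        used |= code[i].touched;
    }
    return zero_all ? SCRATCH_MASK : (used & SCRATCH_MASK);
}
```

In this model the scan happens once per routine ahead of time, which is the static analogue of the per-instruction flag tracking described above.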

I will provide run-time overhead information with the 2nd version of the patch, measured on CPU2017 applications.

thanks.

Qing


Alexandre Oliva Aug. 26, 2020, 12:02 p.m. UTC | #58
On Aug 25, 2020, Jeff Law <law@redhat.com> wrote:

> On Tue, 2020-08-25 at 02:16 -0300, Alexandre Oliva wrote:
>> On Aug 24, 2020, Richard Biener <rguenther@suse.de> wrote:
>> 
>> > since the option is quite elaborate on what (sub-)set of regs is
>> > supposed to be cleared I'm not sure an implementation not involving
>> > any target hook is possible?
>> 
>> I don't think this follows.  Machine-independent code has a pretty good
>> notion of what registers are call-saved or call-clobbered, which ones
>> could be changed in this regard for function-specific calling
>> conventions, which ones may be used by a function to hold its return
>> value, which ones are used within a function...
>> 
>> It *should* be possible to introduce this in machine-independent code,
>> emitting insns to set registers to zero and regarding them as holding
>> values to be returned from the function.  Machine-specific code could
>> use more efficient insns to get the same result, but I can't see good
>> reason to not have a generic fallback implementation with at least a
>> best-effort attempt to offer the desired feature.
> I think part of the problem here is you have to worry about stubs which can
> change caller-saved registers.  Return path stubs aren't particularly common, but
> they do exist -- 32 bit hpux for example :(

This suggests that such targets might have to implement the
target-specific hook to deal with this, but does it detract in any way
from the notion of having generic code to fall back to on targets that
do NOT require any special handling?
Qing Zhao Aug. 26, 2020, 5:58 p.m. UTC | #59
> On Aug 26, 2020, at 7:02 AM, Alexandre Oliva <oliva@adacore.com> wrote:
> 
> On Aug 25, 2020, Jeff Law <law@redhat.com <mailto:law@redhat.com>> wrote:
> 
>> On Tue, 2020-08-25 at 02:16 -0300, Alexandre Oliva wrote:
>>> On Aug 24, 2020, Richard Biener <rguenther@suse.de> wrote:
>>> 
>>>> since the option is quite elaborate on what (sub-)set of regs is
>>>> supposed to be cleared I'm not sure an implementation not involving
>>>> any target hook is possible?
>>> 
>>> I don't think this follows.  Machine-independent code has a pretty good
>>> notion of what registers are call-saved or call-clobbered, which ones
>>> could be changed in this regard for function-specific calling
>>> conventions, which ones may be used by a function to hold its return
>>> value, which ones are used within a function...
>>> 
>>> It *should* be possible to introduce this in machine-independent code,
>>> emitting insns to set registers to zero and regarding them as holding
>>> values to be returned from the function.  Machine-specific code could
>>> use more efficient insns to get the same result, but I can't see good
>>> reason to not have a generic fallback implementation with at least a
>>> best-effort attempt to offer the desired feature.
>> I think part of the problem here is you have to worry about stubs which can
>> change caller-saved registers.  Return path stubs aren't particularly common, but
>> they do exist -- 32 bit hpux for example :(
> 
> This suggests that such targets might have to implement the
> target-specific hook to deal with this, but does it detract in any way
> from the notion of having generic code to fall back to on targets that
> do NOT require any special handling?

There are two issues I can see with adding a default generator in the middle end:

1. In order to determine whether a target should not use the generic code to emit the zeroing sequence,
a new target hook has to be added;

2. In order to prevent the generated zeroing insns (which are simply insns that set registers) from being deleted,
we have to define a new insn “pro_epilogue_use” in the target.
So, any target that wants to use the default generator in the middle end must provide such a new target hook.

Based on these two issues, I don’t think that adding the default generator in the middle end is a good idea.

Qing

Li, Pan2 via Gcc-patches Aug. 26, 2020, 6:36 p.m. UTC | #60
On Wed, 2020-08-26 at 09:02 -0300, Alexandre Oliva wrote:
> On Aug 25, 2020, Jeff Law <law@redhat.com> wrote:
> 
> > On Tue, 2020-08-25 at 02:16 -0300, Alexandre Oliva wrote:
> > > On Aug 24, 2020, Richard Biener <rguenther@suse.de> wrote:
> > > 
> > > > since the option is quite elaborate on what (sub-)set of regs is
> > > > supposed to be cleared I'm not sure an implementation not involving
> > > > any target hook is possible?
> > > 
> > > I don't think this follows.  Machine-independent code has a pretty good
> > > notion of what registers are call-saved or call-clobbered, which ones
> > > could be changed in this regard for function-specific calling
> > > conventions, which ones may be used by a function to hold its return
> > > value, which ones are used within a function...
> > > 
> > > It *should* be possible to introduce this in machine-independent code,
> > > emitting insns to set registers to zero and regarding them as holding
> > > values to be returned from the function.  Machine-specific code could
> > > use more efficient insns to get the same result, but I can't see good
> > > reason to not have a generic fallback implementation with at least a
> > > best-effort attempt to offer the desired feature.
> > I think part of the problem here is you have to worry about stubs which can
> > change caller-saved registers.  Return path stubs aren't particularly common, but
> > they do exist -- 32 bit hpux for example :(
> 
> This suggests that such targets might have to implement the
> target-specific hook to deal with this, but does it detract in any way
> from the notion of having generic code to fall back to on targets that
> do NOT require any special handling?
Agreed.  Sorry if I wasn't clear that generic code + a hook should be sufficient.

jeff
Alexandre Oliva Aug. 28, 2020, 7:47 a.m. UTC | #61
On Aug 26, 2020, Qing Zhao <qing.zhao@oracle.com> wrote:

> There are two issues I can see with adding a default generator in middle end:

> 1. In order to determine where a target should not use the generic
> code to emit the zeroing sequence,
> a new target hook to determine this has to be added;

Yeah, a target hook whose default is the generic code, and that targets
that need it, or that benefit from it, can override.  That's the point
of hooks, to enable overriding.  Why should this be an issue?
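
The arrangement described here can be sketched in plain C. Everything in this sketch (the target_hooks struct, default_zero_regs, custom_zero_regs, emit_zeroing) is a hypothetical stand-in chosen for illustration; it is not GCC's actual target hook machinery.

```c
#include <stdint.h>

/* Hypothetical hook table: a target fills in only what it overrides. */
struct target_hooks {
    /* Takes the mask of registers requested for zeroing and
       returns the mask of registers actually zeroed. */
    uint32_t (*zero_call_used_regs)(uint32_t requested);
};

/* Generic fallback: zero every requested register. */
uint32_t default_zero_regs(uint32_t requested)
{
    return requested;
}

/* Example override for a target with a special constraint: it never
   touches register 0 (say, because a return stub still reads it). */
uint32_t custom_zero_regs(uint32_t requested)
{
    return requested & ~1u;
}

/* Machine-independent side: use the override when a target provides
   one, otherwise fall back to the generic code. */
uint32_t emit_zeroing(const struct target_hooks *t, uint32_t requested)
{
    if (t && t->zero_call_used_regs)
        return t->zero_call_used_regs(requested);
    return default_zero_regs(requested);
}
```

A target that needs nothing special leaves the pointer unset and gets the generic behaviour; only targets with constraints have to write any code.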

> 2. In order to avoid the generated zeroing insns (which are simply
> insns that set registers) being deleted,
> We have to define a new insn “pro_epilogue_use” in the target. 

Why won't a naked USE pattern do?  We already issue those in generic
code, for regs holding return values.  If we were to pretend that other
registers are also holding zeros as values to be returned, why shouldn't
that work for them as well?
Qing Zhao Aug. 28, 2020, 3:21 p.m. UTC | #62
> On Aug 28, 2020, at 2:47 AM, Alexandre Oliva <oliva@adacore.com> wrote:
> 
> On Aug 26, 2020, Qing Zhao <qing.zhao@oracle.com> wrote:
> 
>> There are two issues I can see with adding a default generator in middle end:
> 
>> 1. In order to determine where a target should not use the generic
>> code to emit the zeroing sequence,
>> a new target hook to determine this has to be added;
> 
> Yeah, a target hook whose default is the generic code, and that targets
> that need it, or that benefit from it, can override.  That's the point
> of hooks, to enable overriding.  Why should this be an issue?

A default handler will be invoked for all targets. So, if a target does not provide a
target-specific handler to override it, the default handler must be correct on that target.

So, the default handler should be correct on all targets, assuming no override happens.

Correct me if my understanding above is wrong.

Then, for example, on 32-bit hpux, is a default handler without any special target handling
correct? My understanding from the previous discussion is that we need some special handling
on 32-bit hpux to make it correct. So, in order to make the default handler correct on 32-bit hpux,
we need to add another target hook, for example targetm.has_return_stubs(), to check whether
a target has such a feature; then, in the default handler, we can call this new target hook and
make sure the default handler is correct on 32-bit hpux.

There might be other targets that need other special handling that we currently don’t know
about. Do we need to identify all those targets and all those special features, and then add a new
target hook for each identified special feature?

Yes, theoretically it’s doable to run tests on all targets to see which ones need special
handling and what kind of special handling they need; however, is doing this really necessary?


> 
>> 2. In order to avoid the generated zeroing insns (which are simply
>> insns that set registers) being deleted,
>> We have to define a new insn “pro_epilogue_use” in the target. 
> 
> Why won't a naked USE pattern do?  We already issue those in generic
> code, for regs holding return values.  If we were to pretend that other
> registers are also holding zeros as values to be returned, why shouldn't
> that work for them as well?

From the current implementation based on X86, I see the following comments:

;; As USE insns aren't meaningful after reload, this is used instead
;; to prevent deleting instructions setting registers for PIC code
(define_insn "pro_epilogue_use"
  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]

My understanding is that the “USE” will not be useful after reload, so a new “pro_epilogue_use” should
be added.

HongJiu, could you please provide more information on this?

Thanks.

Qing

H.J. Lu Aug. 28, 2020, 3:33 p.m. UTC | #63
On Fri, Aug 28, 2020 at 8:22 AM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> > On Aug 28, 2020, at 2:47 AM, Alexandre Oliva <oliva@adacore.com> wrote:
> >
> > On Aug 26, 2020, Qing Zhao <qing.zhao@oracle.com> wrote:
> >
> >> There are two issues I can see with adding a default generator in middle end:
> >
> >> 1. In order to determine where a target should not use the generic
> >> code to emit the zeroing sequence,
> >> a new target hook to determine this has to be added;
> >
> > Yeah, a target hook whose default is the generic code, and that targets
> > that need it, or that benefit from it, can override.  That's the point
> > of hooks, to enable overriding.  Why should this be an issue?
>
> A default handler will be invoked for all the targets. So, if the target does not provide any
> target-specific handler to override it. The default handler should be correct on this target.
>
> So, the default handler should be correct on all the targets assuming no override happening.
>
> Correct me if I am wrong with the above understanding.
>
> Then, for example, for the 32 bit hpux, is a default handler without any special target handling
> correct on it? My understanding from the previous discussion is, we need some special handling
> On 32 bit hpux to make it correct, So, in order to make the default handler correct on 32 bit hpux,
> We need to add another target hook, for example, targetm.has_return_stubs() to check whether
> A target has such feature, then in the default handler, we can call this new target hook to check and
> Then make sure the default handler is correct on 32 bit hpux.
>
> There might be other targets that might need other special handlings which we currently don’t know
> Yet. Do we need to identify all those targets and all those special features, and then add new
> Target hook for each of the identified special feature?
>
> Yes, theoretically, it’s doable to run testing on all the targets and to see which targets need special
> Handling and what kind of special handling we need, however, is doing this really necessary?
>
>
> >
> >> 2. In order to avoid the generated zeroing insns (which are simply
> >> insns that set registers) being deleted,
> >> We have to define a new insn “pro_epilogue_use” in the target.
> >
> > Why won't a naked USE pattern do?  We already issue those in generic
> > code, for regs holding return values.  If we were to pretend that other
> > registers are also holding zeros as values to be returned, why shouldn't
> > that work for them as well?
>
> From the current implementation based on X86, I see the following comments:
>
> ;; As USE insns aren't meaningful after reload, this is used instead
> ;; to prevent deleting instructions setting registers for PIC code
> (define_insn "pro_epilogue_use"
>   [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
>
> My understanding is, the “USE” will not be useful after reload. So a new “pro_eplogue_use” should
> be added.
>
> HongJiu, could you please provide more information on this?

pro_epilogue_use is needed.  Otherwise, these zeroing instructions
will be removed after reload.
Qing Zhao Sept. 3, 2020, 2:29 p.m. UTC | #64
Hi,

Per request, I collected runtime performance data and code size data with CPU2017 on an x86 platform.

*** Machine info:
model name         : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
$ lscpu | grep NUMA
NUMA node(s):          2
NUMA node0 CPU(s):     0-21,44-65
NUMA node1 CPU(s):     22-43,66-87

***CPU2017 benchmarks: 
all the benchmarks with C/C++, 9 Integer benchmarks, 10 FP benchmarks. 

***Configuration:
Intrate and fprate, 22 copies. 

***Compiler options:
no:            -g -O2 -march=native
used_gpr_arg:  no + -fzero-call-used-regs=used-gpr-arg
used_arg:      no + -fzero-call-used-regs=used-arg
all_arg:       no + -fzero-call-used-regs=all-arg
used_gpr:      no + -fzero-call-used-regs=used-gpr
all_gpr:       no + -fzero-call-used-regs=all-gpr
used:          no + -fzero-call-used-regs=used
all:           no + -fzero-call-used-regs=all

***each benchmark runs 3 times. 

***runtime performance data:
Please see the attached csv file


From the data, we can see that:
On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overhead: at most 1.72% for the integer benchmarks and 1.17% for the FP benchmarks.
If all the registers are zeroed, the runtime overhead is bigger: on average for the integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56%.
It looks like the overhead of zeroing vector registers is much bigger.
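
For reference, a percentage like those above is typically derived as (t_option - t_base) / t_base. The sketch below assumes run times are compared and that the median of the three runs is taken for each configuration; both are assumptions about the method, the function names are hypothetical, and no measured values from this thread are reproduced.

```c
/* Median of three run times (in seconds). */
double median3(double a, double b, double c)
{
    if ((a <= b && b <= c) || (c <= b && b <= a))
        return b;
    if ((b <= a && a <= c) || (c <= a && a <= b))
        return a;
    return c;
}

/* Relative overhead, in percent, of an option's run time over the
   baseline ("no") run time. */
double overhead_percent(double base_seconds, double option_seconds)
{
    return (option_seconds - base_seconds) / base_seconds * 100.0;
}
```

For example, a baseline of 200 seconds and an option run of 203 seconds would correspond to an overhead of 1.5%.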

For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, and its runtime overhead is very small.

***code size increase data:

Please see the attached file 


From the data, we can see that:
The code size impact is in general very small; the biggest is “all_arg”, at 1.06% for the integer benchmarks and 1.13% for the FP benchmarks.

So, from the data collected, I think that the run-time overhead and code size increase from this option are very reasonable.

Let me know your comments and opinions.

thanks.

Qing

Qing Zhao Sept. 3, 2020, 3:08 p.m. UTC | #65
Hi,

It looks like both attached .csv files were dropped during email delivery. I am not sure why.

So I am copying the text here for your reference:

****benchmarks:
C       500.perlbench_r  
C       502.gcc_r     
C       505.mcf_r       
C++     520.omnetpp_r    
C++     523.xalancbmk_r  
C       525.x264_r        
C++     531.deepsjeng_r    
C++     541.leela_r        
C       557.xz_r       
                      

C++/C/Fortran   507.cactuBSSN_r      
C++     508.namd_r    
C++     510.parest_r     
C++/C   511.povray_r   
C       519.lbm_r     
Fortran/C       521.wrf_r 
C++/C   526.blender_r   
Fortran/C       527.cam4_r  
C       538.imagick_r  
C       544.nab_r    

***runtime overhead data and code size overhead data: I converted them to PDF files; hopefully this time I can attach them to the email:

thanks.

Qing






Qing Zhao Sept. 3, 2020, 4:19 p.m. UTC | #66
It looks like PDF attachments do not work with this list either.
H.J. Lu helped me upload the performance data and code size data to the following wiki page:

https://gitlab.com/x86-gcc/gcc/-/wikis/Zero-call-used-registers-data

Please refer to this link for the data.

thanks.

Qing

> On Sep 3, 2020, at 10:08 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> Hi,
> 
> Looks like both attached .csv files were deleted during the email delivery procedure. Not sure what’s the reason for this.
> 
> Then I have to copy the text file here for you reference:
> 
> ****benchmarks:
> C       500.perlbench_r  
> C       502.gcc_r     
> C       505.mcf_r       
> C++     520.omnetpp_r    
> C++     523.xalancbmk_r  
> C       525.x264_r        
> C++     531.deepsjeng_r    
> C++     541.leela_r        
> C       557.xz_r       
> 
> 
> C++/C/Fortran   507.cactuBSSN_r      
> C++     508.namd_r    
> C++     510.parest_r     
> C++/C   511.povray_r   
> C       519.lbm_r     
> Fortran/C       521.wrf_r 
> C++/C   526.blender_r   
> Fortran/C       527.cam4_r  
> C       538.imagick_r  
> C       544.nab_r    
> 
> ***runtime overhead data and code size overhead data, I converted then to PDF files, hopefully this time I can attach it with the email:
> 
> thanks.
> 
> Qing
> 
> 
> 
> 
> 
> 
>> On Sep 3, 2020, at 9:29 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> 
>> Hi,
>> 
>> Per request, I collected runtime performance data and code size data with CPU2017 on an x86 platform. 
>> 
>> *** Machine info:
>> model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>> $ lscpu | grep NUMA
>> NUMA node(s):          2
>> NUMA node0 CPU(s):     0-21,44-65
>> NUMA node1 CPU(s):     22-43,66-87
>> 
>> ***CPU2017 benchmarks: 
>> all the benchmarks with C/C++, 9 Integer benchmarks, 10 FP benchmarks. 
>> 
>> ***Configures:
>> Intrate and fprate, 22 copies. 
>> 
>> ***Compiler options:
>> no:            -g -O2 -march=native
>> used_gpr_arg:  no + -fzero-call-used-regs=used-gpr-arg
>> used_arg:      no + -fzero-call-used-regs=used-arg
>> all_arg:       no + -fzero-call-used-regs=all-arg
>> used_gpr:      no + -fzero-call-used-regs=used-gpr
>> all_gpr:       no + -fzero-call-used-regs=all-gpr
>> used:          no + -fzero-call-used-regs=used
>> all:           no + -fzero-call-used-regs=all
>> 
>> ***each benchmark runs 3 times. 
>> 
>> ***runtime performance data:
>> Please see the attached csv file
>> 
>> 
>> From the data, we can see that:
>> On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overhead: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks. 
>> If all the registers are zeroed, the runtime overhead is bigger: on average for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56%. 
>> It looks like the overhead of zeroing vector registers is much bigger. 
>> 
>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>> 
>> ***code size increase data:
>> 
>> Please see the attached file 
>> 
>> 
>> From the data, we can see that:
>> The code size impact is very small in general. The biggest is “all_arg”, at 1.06% for integer benchmarks and 1.13% for FP benchmarks.
>> 
>> So, from the data collected, I think that the run-time overhead and code size increase from this option are very reasonable. 
>> 
>> Let me know your comments and opinions.
>> 
>> thanks.
>> 
>> Qing
>> 
>>> On Aug 25, 2020, at 4:54 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>> 
>>> 
>>> 
>>>> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>> 
>>>> Hi!
>>>> 
>>>> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>>>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>>>>> use it, it helps security at most none at all :-(
>>>>>> 
>>>>>> Without numbers on this, no one can determine if it is a good tradeoff
>>>>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>>>>> enough users that it will be worth the effort for us.  Which is why I
>>>>>> keep hammering on this point.
>>>>> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
>>>>> For this testing? (Is CPU2017 good enough)?
>>>> 
>>>> I would use something more real-life, not 12 small pieces of code.
>>> 
>>> There is some basic information about the CPU2017 benchmarks at the link below:
>>> 
>>> https://www.spec.org/cpu2017/Docs/overview.html#suites
>>> 
>>> GCC itself is one of the benchmarks in CPU2017 (502.gcc_r). And 526.blender_r is even larger than 502.gcc_r. 
>>> And there are several other quite big benchmarks as well (perlbench, xalancbmk, parest, imagick, etc).
>>> 
>>> thanks.
>>> 
>>> Qing
>
Kees Cook Sept. 3, 2020, 5:13 p.m. UTC | #67
On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
> On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks. 
> If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average. 
> Looks like the overhead of zeroing vector registers is much bigger. 
> 
> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.

That looks great; thanks for doing those tests!

(And it seems like these benchmarks are kind of a "worst case" scenario
with regard to performance, yes? As in it's mostly tight call loops?)
Qing Zhao Sept. 3, 2020, 5:43 p.m. UTC | #68
> On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
> 
> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>> On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks. 
>> If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average. 
>> Looks like the overhead of zeroing vector registers is much bigger. 
>> 
>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
> 
> That looks great; thanks for doing those tests!
> 
> (And it seems like these benchmarks are kind of a "worst case" scenario
> with regard to performance, yes? As in it's mostly tight call loops?)

The top 3 benchmarks with the most overhead from this option are 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
All of them are C++ benchmarks. 
I guess the main reason is that their routines are smaller in general (especially on the hot execution paths or in loops).
As a result, the overhead of the additional zeroing instructions in each routine is relatively higher.  
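
As a rough illustration of that reasoning (a back-of-the-envelope sketch, not measured data): if a fixed number of zeroing instructions is added at each return, their share of the executed instructions grows as the routine gets smaller. The instruction counts below are made-up assumptions, and the model ignores pipelining, caching, and the different cost of different instructions.

```python
def relative_overhead(body_insns: int, zeroing_insns: int) -> float:
    """Fraction of extra instructions executed per call (crude model)."""
    return zeroing_insns / body_insns


# Hypothetical routines: the same 4 zeroing instructions are added to each.
small_routine = relative_overhead(body_insns=20, zeroing_insns=4)
large_routine = relative_overhead(body_insns=2000, zeroing_insns=4)

print(f"small routine: {small_routine:.1%}")
print(f"large routine: {large_routine:.1%}")
```

Under these assumptions the same zeroing sequence costs a hundred times more, relatively, in the small routine than in the large one.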

Qing

> 
> -- 
> Kees Cook
Ramana Radhakrishnan Sept. 3, 2020, 5:48 p.m. UTC | #69
On Thu, Sep 3, 2020 at 6:13 PM Kees Cook via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
> > On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks.
> > If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average.
> > Looks like the overhead of zeroing vector registers is much bigger.
> >
> > For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>
> That looks great; thanks for doing those tests!
>
> (And it seems like these benchmarks are kind of a "worst case" scenario
> with regard to performance, yes? As in it's mostly tight call loops?)


That's true of some of them but definitely not all - the GCC benchmark
springs to mind in SPEC as having quite a flat profile, so I'd take a
look there and probe a bit more in that one to see what happens. Don't
ask me what else, that's all I have in my cache this evening :)

I'd also query the "average" slowdown metric in those numbers as
something that's being measured in a different way here. IIRC the SPEC
scores for int and FP are computed with a geometric mean of the
individual ratios of each of the benchmarks. Thus I don't think the
average of the slowdowns is enough to talk about slowdowns for the
benchmark suite. A quick calculation of the arithmetic mean of column
B in my head suggests that it's the arithmetic mean of all the
slowdowns?

i.e. Slowdown (Geometric Mean (x, y, z, ...)) != Arithmetic Mean (
Slowdown (x), Slowdown (y), ...)
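
A small sketch of the difference between the two metrics. The ratios are hypothetical numbers chosen for illustration, not data from this thread, and "slowdown" is assumed here to mean 1 - (ratio with the option) / (ratio without it), with higher ratios being better.

```python
from math import prod


def geometric_mean(values):
    """Geometric mean, the way SPEC aggregates per-benchmark ratios."""
    return prod(values) ** (1.0 / len(values))


# Hypothetical per-benchmark ratios (higher is better).
base_ratios = [10.0, 20.0, 40.0]  # built with "no"
opt_ratios = [9.0, 19.0, 40.0]    # built with some zeroing option

# Metric 1: arithmetic mean of the per-benchmark slowdowns.
per_benchmark = [1.0 - o / b for o, b in zip(opt_ratios, base_ratios)]
mean_of_slowdowns = sum(per_benchmark) / len(per_benchmark)

# Metric 2: slowdown of the suite score (geometric mean of the ratios).
suite_slowdown = 1.0 - geometric_mean(opt_ratios) / geometric_mean(base_ratios)

print(f"arithmetic mean of slowdowns: {mean_of_slowdowns:.2%}")
print(f"slowdown of geomean score:    {suite_slowdown:.2%}")
```

The two values are close but not equal. Because a geometric mean never exceeds the arithmetic mean of the same values, the slowdown of the suite score is never smaller than the arithmetic mean of the individual slowdowns.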

So another metric to look at would be the slowdown of your
estimated (probably non-reportable) SPEC scores, to get a more
"spec like" metric.

regards
Ramana
>
> --
> Kees Cook
Qing Zhao Sept. 3, 2020, 7:20 p.m. UTC | #70
> On Sep 3, 2020, at 12:48 PM, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> 
> On Thu, Sep 3, 2020 at 6:13 PM Kees Cook via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> 
>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>> On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks.
>>> If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average.
>>> Looks like the overhead of zeroing vector registers is much bigger.
>>> 
>>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>> 
>> That looks great; thanks for doing those tests!
>> 
>> (And it seems like these benchmarks are kind of a "worst case" scenario
>> with regard to performance, yes? As in it's mostly tight call loops?)
> 
> 
> That's true of some of them but definitely not all - the GCC benchmark
> springs to mind in SPEC as having quite a flat profile, so I'd take a
> look there and probe a bit more in that one to see what happens. Don't
> ask me what else , that's all I have in my cache this evening :)
> 
> I'd also query the "average" slowdown metric in those numbers as
> something that's being measured in a different way here. IIRC the SPEC
> scores for int and FP are computed with a geometric mean of the
> individual ratios of each of the benchmark. Thus I don't think the
> average of the slowdowns is enough to talk about slowdowns for the
> benchmark suite. A quick calculation of the arithmetic mean of column
> B in my head suggests that it's the arithmetic mean of all the
> slowdowns ?
> 
> i.e. Slowdown (Geometric Mean (x, y, z, ....))  != Arithmetic mean (
> Slowdown (x), Slowdown (y) .....)
> 
> So another metric to look at would be to look at the Slowdown of your
> estimated (probably non-reportable) SPEC scores as well to get a more
> "spec like" metric.

Please take a look at the new csv file at:

https://gitlab.com/x86-gcc/gcc/-/wikis/Zero-call-used-registers-data

I just uploaded the slowdown data computed based on Est.SPECrate(R)2017_int_base and Est.SPECrate(R)2017_fp_base. All data are computed against “no”. 

Compared with the slowdown data I computed previously as Arithmetic mean (Slowdown(x), Slowdown(y), …), the numbers change a little, but the overall picture stays the same as before. 

Let me know if you have further comments.

thanks.

Qing


> 
> regards
> Ramana
>> 
>> --
>> Kees Cook
Li, Pan2 via Gcc-patches Sept. 4, 2020, 1:23 a.m. UTC | #71
-----Original Message-----
From: Qing Zhao <QING.ZHAO@oracle.com>
Date: Thursday, September 3, 2020 at 12:55 PM
To: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]



    > On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
    > 
    > On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
    >> On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks. 
    >> If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average. 
    >> Looks like the overhead of zeroing vector registers is much bigger. 
    >> 
    >> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
    > 
    > That looks great; thanks for doing those tests!
    > 
    > (And it seems like these benchmarks are kind of a "worst case" scenario
    > with regard to performance, yes? As in it's mostly tight call loops?)

    The top 3 benchmarks that have the most overhead from this option are: 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
    All of them are C++ benchmarks. 
    I guess that the most important reason is  the smaller routine size in general (especially at the hot execution path or loops).
    As a result, the overhead of these additional zeroing instructions in each routine will be relatively higher.  

    Qing

I think that overhead is expected in benchmarks like 541.leela_r, which according to https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html is a benchmark for Artificial Intelligence (Monte Carlo simulation, game tree search & pattern recognition). Adding -fzero-call-used-regs incurs an overhead each time a function is called, and that overhead is high in areas like game tree search. 

Qing, thanks a lot for the measurements. I am not sure if this is the limit of overhead the community is willing to accept for extra security (as a GCC user, I would be willing to accept it). 

Regards

Victor 


    > 
    > -- 
    > Kees Cook
Qing Zhao Sept. 4, 2020, 2:18 p.m. UTC | #72
> On Sep 3, 2020, at 8:23 PM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
> 
> 
> 
> -----Original Message-----
> From: Qing Zhao <QING.ZHAO@oracle.com>
> Date: Thursday, September 3, 2020 at 12:55 PM
> To: Kees Cook <keescook@chromium.org>
> Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> 
> 
> 
>> On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
>> 
>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>> On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks. 
>>> If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average. 
>>> Looks like the overhead of zeroing vector registers is much bigger. 
>>> 
>>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>> 
>> That looks great; thanks for doing those tests!
>> 
>> (And it seems like these benchmarks are kind of a "worst case" scenario
>> with regard to performance, yes? As in it's mostly tight call loops?)
> 
>    The top 3 benchmarks that have the most overhead from this option are: 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
>    All of them are C++ benchmarks. 
>    I guess that the most important reason is  the smaller routine size in general (especially at the hot execution path or loops).
>    As a result, the overhead of these additional zeroing instructions in each routine will be relatively higher.  
> 
>    Qing
> 
> I think that overhead is expected in benchmarks like 541.leela_r, which according to https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html is a benchmark for Artificial Intelligence (Monte Carlo simulation, game tree search & pattern recognition). Adding -fzero-call-used-regs incurs an overhead each time a function is called, and that overhead is high in areas like game tree search. 
> 
> Qing, thanks a lot for the measurements. I am not sure if this is the limit of overhead the community is willing to accept for extra security (as a GCC user, I would be willing to accept it). 

From the performance data, we can see that the runtime overhead of clearing only the used registers is very reasonable, even for 541.leela_r, 531.deepsjeng_r, and 511.povray_r. If we try to clear all registers, whether or not they are used in the current routine, the overhead increases dramatically. 

So, my question is:

From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?  
From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit; correct me if I am wrong here.

Thanks.

Qing


> 
> Regards
> 
> Victor 
> 
> 
>> 
>> -- 
>> Kees Cook
Segher Boessenkool Sept. 4, 2020, 3:18 p.m. UTC | #73
Hi!

On Mon, Aug 24, 2020 at 03:49:50PM -0500, Qing Zhao wrote:
> > Do you want to do this before or after the epilogue code is generated?
> 
> static rtx_insn *
> make_epilogue_seq (void)
> {
>   if (!targetm.have_epilogue ())
>     return NULL;
> 
>   start_sequence ();
>   emit_note (NOTE_INSN_EPILOGUE_BEG);
> 
>  +++++ gen_call_used_regs_seq ();                     // this is the place to emit the zeroing insn sequence
> 
>   rtx_insn *seq = targetm.gen_epilogue ();
> …
> }
> 
> Any comment on this?

So, before.  This is problematic if the epilogue uses any of those
registers: if the epilogue expects some value there, you just destroyed
it; and, conversely, if the epilogue writes such a reg, your zeroing is
useless.


You probably have to do this for every target separately?  But it is not
enough to handle it in the epilogue, you also need to make sure it is
done on every path that returns *without* epilogue.
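
The ordering hazard described above can be sketched with a toy register model. Everything here is hypothetical: the register names, the initial values, and what the epilogue reads and writes are invented for illustration and do not describe any real GCC target.

```python
# Toy model only: register names and epilogue behaviour are hypothetical.
CALL_USED = ("r10", "r11")


def zero_call_used(regs):
    """Stand-in for the zeroing sequence: clear every call-used register."""
    for name in CALL_USED:
        regs[name] = 0


def epilogue(regs):
    """Hypothetical epilogue: expects the frame size in r10, scribbles on r11."""
    regs["sp"] += regs["r10"]
    regs["r11"] = regs["sp"]


def simulate(order):
    regs = {"sp": 1000, "r10": 64, "r11": 7}
    for step in order:
        step(regs)
    return regs


zero_first = simulate([zero_call_used, epilogue])
zero_last = simulate([epilogue, zero_call_used])

print("zeroing before epilogue:", zero_first)
print("zeroing after epilogue: ", zero_last)
```

In this model, zeroing first both destroys the value the epilogue expected in r10 (the stack pointer is not restored) and leaves a non-zero value in r11 at return. Zeroing last avoids both problems, but only on paths that actually run the epilogue.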


Segher
Segher Boessenkool Sept. 4, 2020, 3:26 p.m. UTC | #74
On Mon, Aug 24, 2020 at 03:43:11PM -0500, Qing Zhao wrote:
> > On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> For this testing? (Is CPU2017 good enough)?
> > 
> > I would use something more real-life, not 12 small pieces of code.
> 
> Then, what kind of real-life benchmark are you suggesting? 

Picking benchmark code is Hard (and that is your job, not mine, sorry).
Maybe firefox or openoffice or whatever.  Some *bigger* code.  Real-life
code.

> >> Please take a look at this paper. 
> > 
> > As I told you before, that isn't open information, I cannot reply to
> > any of that.
> 
> I am a little confused here. What do you mean by “open information”? Is the information in a published paper not open information?

I am not allowed to quote it here.


Segher
Segher Boessenkool Sept. 4, 2020, 3:43 p.m. UTC | #75
On Thu, Sep 03, 2020 at 10:13:35AM -0700, Kees Cook wrote:
> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
> > On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks. 
> > If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average. 
> > Looks like the overhead of zeroing vector registers is much bigger. 
> > 
> > For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
> 
> That looks great; thanks for doing those tests!
> 
> (And it seems like these benchmarks are kind of a "worst case" scenario
> with regard to performance, yes? As in it's mostly tight call loops?)

I call this very expensive, already, and it is benchmarked on a target
where this should be very cheap (it has few registers) :-/


Segher
Qing Zhao Sept. 4, 2020, 5:18 p.m. UTC | #76
> On Sep 4, 2020, at 10:43 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Thu, Sep 03, 2020 at 10:13:35AM -0700, Kees Cook wrote:
>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>> On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks. 
>>> If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average. 
>>> Looks like the overhead of zeroing vector registers is much bigger. 
>>> 
>>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>> 
>> That looks great; thanks for doing those tests!
>> 
>> (And it seems like these benchmarks are kind of a "worst case" scenario
>> with regard to performance, yes? As in it's mostly tight call loops?)
> 
> I call this very expensive, already,

Yes, I think that 17.56% on average is quite expensive. That’s the data for -fzero-call-used-regs=all, the worst case, i.e., clearing all the call-used registers at return.

However, if we only clear USED registers, the worst case is 1.72% on average.  This overhead is very reasonable. 
Furthermore, if we only clear used_gpr_arg, i.e., the used general-purpose registers that pass parameters, which should be enough to mitigate ROP, the overhead is even smaller: 0.84% on average. 


> and it is benchmarked on a target
> where this should be very cheap (it has few registers) :-/

It’s a tradeoff: improved software security at the cost of some runtime overhead. 

In the compiler, we should provide such an option so that users can satisfy their security needs despite the runtime overhead. Of course, in the compiler implementation we will do our best to minimize that overhead.

Qing



> 
> 
> Segher
H.J. Lu Sept. 4, 2020, 5:34 p.m. UTC | #77
On Fri, Sep 4, 2020 at 8:18 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Mon, Aug 24, 2020 at 03:49:50PM -0500, Qing Zhao wrote:
> > > Do you want to do this before or after the epilogue code is generated?
> >
> > static rtx_insn *
> > make_epilogue_seq (void)
> > {
> >   if (!targetm.have_epilogue ())
> >     return NULL;
> >
> >   start_sequence ();
> >   emit_note (NOTE_INSN_EPILOGUE_BEG);
> >
> >  +++++ gen_call_used_regs_seq ();                     // this is the place to emit the zeroing insn sequence
> >
> >   rtx_insn *seq = targetm.gen_epilogue ();
> > …
> > }
> >
> > Any comment on this?
>
> So, before.  This is problematic if the epilogue uses any of those
> registers: if the epilogue expects some value there, you just destroyed
> it; and, conversely, if the epilogue writes such a reg, your zeroing is
> useless.
>
>
> You probably have to do this for every target separately?  But it is not
> enough to handle it in the epilogue, you also need to make sure it is
> done on every path that returns *without* epilogue.

This feature is designed for normal return with epilogue.
Segher Boessenkool Sept. 4, 2020, 6:04 p.m. UTC | #78
On Fri, Sep 04, 2020 at 12:18:12PM -0500, Qing Zhao wrote:
> > I call this very expensive, already,
> 
> Yes, I think that 17.56% on average is quite expensive. That’s the data for -fzero-call-used-regs=all, the worst case i.e, clearing all the call-used registers at the return.
> 
> However, if we only clear USED registers, the worst case is 1.72% on average.  This overhead is very reasonable. 

No, that is the number I meant.  2% overhead is extremely much, unless
this is magically super effective, and actually protects many things
from exploitation (that aren't already protected some other way, SSP for
example).

> > and it is benchmarked on a target
> > where this should be very cheap (it has few registers) :-/
> 
> It’s a tradeoff to improve the software security with some runtime overhead. 

Yes.  Which is why I asked for numbers of both sides of the equation:
how much it costs, vs. how much value it brings.

> For compiler, we should provide such option to the users to satisfy their security need even though the runtime overhead.  Of course, during compiler implementation, we will do our best to minimize the runtime overhead.

There also is a real cost to the compiler *developers*.  Which is my
prime worry here.  If this gives users at most marginal value, then it
is real cost to us, but nothing to hold up to that.


Segher
Segher Boessenkool Sept. 4, 2020, 6:09 p.m. UTC | #79
On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
> > You probably have to do this for every target separately?  But it is not
> > enough to handle it in the epilogue, you also need to make sure it is
> > done on every path that returns *without* epilogue.
> 
> This feature is designed for normal return with epilogue.

Very many normal returns do *not* pass through an epilogue, but are
simple_return.  Disabling that is *much* more expensive than that 2%.


Segher
H.J. Lu Sept. 4, 2020, 6:52 p.m. UTC | #80
On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
> > > You probably have to do this for every target separately?  But it is not
> > > enough to handle it in the epilogue, you also need to make sure it is
> > > done on every path that returns *without* epilogue.
> >
> > This feature is designed for normal return with epilogue.
>
> Very many normal returns do *not* pass through an epilogue, but are
> simple_return.  Disabling that is *much* more expensive than that 2%.

Sibcall isn't covered.  What other cases don't have an epilogue?
Qing Zhao Sept. 4, 2020, 7 p.m. UTC | #81
> On Sep 4, 2020, at 1:04 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 04, 2020 at 12:18:12PM -0500, Qing Zhao wrote:
>>> I call this very expensive, already,
>> 
>> Yes, I think that 17.56% on average is quite expensive. That’s the data for -fzero-call-used-regs=all, the worst case i.e, clearing all the call-used registers at the return.
>> 
>> However, if we only clear USED registers, the worst case is 1.72% on average.  This overhead is very reasonable. 
> 
> No, that is the number I meant.  2% overhead is extremely much, unless
> this is magically super effective, and actually protects many things
> from exploitation (that aren't already protected some other way, SSP for
> example).

Then how about the 0.81% overhead on average for -fzero-call-used-regs=used-gpr-arg? 

This option can be used to effectively mitigate ROP attacks. 

Also, the Clear Linux project has been using a similar option since GCC 8 (similar to -fzero-call-used-regs=used-gpr). 


>>> and it is benchmarked on a target
>>> where this should be very cheap (it has few registers) :-/
>> 
>> It’s a tradeoff to improve the software security with some runtime overhead. 
> 
> Yes.  Which is why I asked for numbers of both sides of the equation:
> how much it costs, vs. how much value it brings.

Reasonable. 

> 
>> For compiler, we should provide such option to the users to satisfy their security need even though the runtime overhead.  Of course, during compiler implementation, we will do our best to minimize the runtime overhead.
> 
> There also is a real cost to the compiler *developers*.  Which is my
> prime worry here.  If this gives users at most marginal value, then it
> is real cost to us, but nothing to hold up to that.

Here, do you mean the future maintenance cost for this part of the code?

Qing
> 
> 
> Segher
Li, Pan2 via Gcc-patches Sept. 7, 2020, 1:06 p.m. UTC | #82
From: Qing Zhao <QING.ZHAO@ORACLE.COM>
Date: Friday, September 4, 2020 at 9:19 AM
To: "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]




On Sep 3, 2020, at 8:23 PM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:



-----Original Message-----
From: Qing Zhao <QING.ZHAO@oracle.com>
Date: Thursday, September 3, 2020 at 12:55 PM
To: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]




On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:

On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:

On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks.
If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average.
Looks like the overhead of zeroing vector registers is much bigger.

For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.

That looks great; thanks for doing those tests!

(And it seems like these benchmarks are kind of a "worst case" scenario
with regard to performance, yes? As in it's mostly tight call loops?)

   The top 3 benchmarks that have the most overhead from this option are: 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
   All of them are C++ benchmarks.
   I guess that the most important reason is  the smaller routine size in general (especially at the hot execution path or loops).
   As a result, the overhead of these additional zeroing instructions in each routine will be relatively higher.

   Qing

I think that overhead is expected in benchmarks like 541.leela_r, which according to https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html is a benchmark for Artificial Intelligence (Monte Carlo simulation, game tree search & pattern recognition). Adding -fzero-call-used-regs incurs an overhead each time a function is called, and that overhead is high in areas like game tree search.

Qing, thanks a lot for the measurements. I am not sure if this is the limit of overhead the community is willing to accept for extra security (as a GCC user, I would be willing to accept it).

From the performance data, we can see that the runtime overhead of clearing only the used registers is very reasonable, even for 541.leela_r, 531.deepsjeng_r, and 511.povray_r. If we try to clear all registers, whether or not they are used in the current routine, the overhead increases dramatically.

So, my question is:

From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?
From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit, correct me if I am wrong here.

You are right, it does not provide additional security


Thanks.

Qing



Regards

Victor




--
Kees Cook
Segher Boessenkool Sept. 7, 2020, 2:06 p.m. UTC | #83
On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
> On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
> > > > You probably have to do this for every target separately?  But it is not
> > > > enough to handle it in the epilogue, you also need to make sure it is
> > > > done on every path that returns *without* epilogue.
> > >
> > > This feature is designed for normal return with epilogue.
> >
> > Very many normal returns do *not* pass through an epilogue, but are
> > simple_return.  Disabling that is *much* more expensive than that 2%.
> 
> Sibcall isn't covered.  What other cases don't have an epilogue?

Shrink-wrapped stuff.  Quite important for performance.  Not something
you can throw away.


Segher
Segher Boessenkool Sept. 7, 2020, 2:36 p.m. UTC | #84
On Fri, Sep 04, 2020 at 02:00:41PM -0500, Qing Zhao wrote:
> >> However, if we only clear USED registers, the worst case is 1.72% on average.  This overhead is very reasonable. 
> > 
> > No, that is the number I meant.  2% overhead is extremely much, unless
> > this is magically super effective, and actually protects many things
> > from exploitation (that aren't already protected some other way, SSP for
> > example).
> 
> Then how about the 0.81% overhead on average for -fzero-call-used-regs=used_gpr_arg? 

That is still quite a lot.

> This option can be used to effectively mitigate ROP attack. 

Nice assertion.  Show it!

> > Yes.  Which is why I asked for numbers of both sides of the equation:
> > how much it costs, vs. how much value it brings.
> 
> Reasonable. 

I'm glad you agree :-)

> >> For compiler, we should provide such option to the users to satisfy their security need even though the runtime overhead.  Of course, during compiler implementation, we will do our best to minimize the runtime overhead.
> > 
> > There also is a real cost to the compiler *developers*.  Which is my
> > prime worry here.  If this gives users at most marginal value, then it
> > is real cost to us, but nothing to hold up to that.
> 
> Here, you mean the future maintenance  cost  for this part of the code?

Not just that.  *All* support costs, and consider all other
optimisations it will interfere with, etc.


Segher
Segher Boessenkool Sept. 7, 2020, 2:44 p.m. UTC | #85
On Fri, Sep 04, 2020 at 01:23:14AM +0000, Rodriguez Bahena, Victor wrote:
> Qing, thanks a lot for the measurement, I am not sure if this is the limit of overhead the community is willing to accept by adding extra security (me as gcc user will be willing to accept). 

The overhead is of course bearable for most programs / users, but what
is the return?  For what percentage of programs are ROP attacks no
longer possible, for example.


Segher
H.J. Lu Sept. 7, 2020, 3:58 p.m. UTC | #86
On Mon, Sep 7, 2020 at 7:06 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
> > On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
> > <segher@kernel.crashing.org> wrote:
> > > On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
> > > > > You probably have to do this for every target separately?  But it is not
> > > > > enough to handle it in the epilogue, you also need to make sure it is
> > > > > done on every path that returns *without* epilogue.
> > > >
> > > > This feature is designed for normal return with epilogue.
> > >
> > > Very many normal returns do *not* pass through an epilogue, but are
> > > simple_return.  Disabling that is *much* more expensive than that 2%.
> >
> > Sibcall isn't covered.  What other cases don't have an epilogue?
>
> Shrink-wrapped stuff.  Quite important for performance.  Not something
> you can throw away.
>

Qing, can you check how it interacts with shrink-wrap?
Qing Zhao Sept. 8, 2020, 2:55 p.m. UTC | #87
> On Sep 7, 2020, at 9:36 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 04, 2020 at 02:00:41PM -0500, Qing Zhao wrote:
>>>> However, if we only clear USED registers, the worst case is 1.72% on average.  This overhead is very reasonable. 
>>> 
>>> No, that is the number I meant.  2% overhead is extremely much, unless
>>> this is magically super effective, and actually protects many things
>>> from exploitation (that aren't already protected some other way, SSP for
>>> example).
>> 
>> Then how about the 0.81% overhead on average for -fzero-call-used-regs=used_gpr_arg? 
> 
> That is still quite a lot.
> 
>> This option can be used to effectively mitigate ROP attack. 
> 
> Nice assertion.  Show it!

As I mentioned multiple times, one important piece of background for this patch is this paper, which was published at the 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP):

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

Downloading this paper from IEEE requires a fee. I downloaded it through my company’s account; however, after consulting, it turned out that I am not allowed to forward the copy I downloaded through my company’s account to this alias.

However, there is some more information on this paper online:

https://www.semanticscholar.org/paper/Clean-the-Scratch-Registers:-A-Way-to-Mitigate-Rong-Xie/6f2ce4fd31baa0f6c02f9eb5c57b90d39fe5fa13

All the figures and tables in this paper are available at this link.

Among them, Table III, Table IV and Table V give the results of “zeroing scratch register mitigate ROP attack”. According to the tables, zeroing scratch registers successfully mitigates ROP on all those benchmarks.

What other information do you need to show the effectiveness of this ROP mitigation?

> 
>>> Yes.  Which is why I asked for numbers of both sides of the equation:
>>> how much it costs, vs. how much value it brings.
> On Aug 25, 2020, at 9:05 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> 
>> On Aug 25, 2020, at 1:41 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> 
>>>> 
>>>>>> (The other side of the coin is how much this helps prevent exploitation;
>>>>>> numbers on that would be good to see, too.)
>>>>> 
>>>>> This can be well showed from the paper:
>>>>> 
>>>>> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
>>>>> 
>>>>> https://urldefense.com/v3/__https://ieeexplore.ieee.org/document/8445132__;!!GqivPVa7Brio!JbdLvo54xB3ORTeZqpy_PwZsL9drNLaKjbg14bTKMOwxt8LWnjZ8gJWlqtlrFKPh$ <https://urldefense.com/v3/__https://ieeexplore.ieee.org/document/8445132__;!!GqivPVa7Brio!JbdLvo54xB3ORTeZqpy_PwZsL9drNLaKjbg14bTKMOwxt8LWnjZ8gJWlqtlrFKPh$ >
>>>>> 
>>>>> Please take a look at this paper.
>>>> 
>>>> As I told you before, that isn't open information, I cannot reply to
>>>> any of that.
>>> 
>>> A little confused here, what’s you mean by “open information”? Is the information in a published paper not open information?
>> 
>> No, because it is behind a paywall.
> 
> Still don’t understand here:  this paper has been published in the proceeding of “ 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP)”.
> If you want to read the complete version online, you need to pay for it.
> 
> However, it’s still a published paper, and the information inside it should be “open information”. 
> 
> So, what’s the definition of “open information” you have?
> 
> I downloaded a PDF copy of this paper through my company’s paid account.  But I am not sure whether it’s legal for me to attach it to this mailing list?

After consulting, it turned out that I was not allowed to further forward the copy I downloaded through my company’s account to this alias. 
There is some more information on this paper online though:

https://www.semanticscholar.org/paper/Clean-the-Scratch-Registers:-A-Way-to-Mitigate-Rong-Xie/6f2ce4fd31baa0f6c02f9eb5c57b90d39fe5fa13

All the figures and tables in this paper are available in this link. 

Among them, Figure 1 is an illustration of a typical ROP attack. Please pay special attention to the “gadgets”, which are carefully chosen machine instruction sequences that are already present in the machine's memory. Each gadget typically ends in a return instruction and is located in a subroutine within the existing program and/or shared library code. Chained together, these gadgets allow an attacker to perform arbitrary operations on a machine employing defenses that thwart simpler attacks.

The paper identifies the important features of a ROP attack as follows:

"First, the destination of using gadget chains in usual is performing system call or system fucntion to perform malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. 

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks using system function such as “read” or “mprotect”, on x64 system, the register would still be used to pass parameters, as mentioned in subsection B and C.”
As a result, the paper proposes zeroing the scratch registers that pass parameters at the “return” insns to mitigate ROP attacks.

Table III, Table IV and Table V give the results of “zeroing scratch register mitigate ROP attack”. According to the tables, zeroing scratch registers successfully mitigates ROP on all those benchmarks.

Table VI shows the performance overhead of their implementation. It looks very high: 16.2X runtime overhead on average.  However, this implementation does not use the compiler to statically generate the zeroing sequence; instead, it uses “dynamic binary instrumentation at runtime” to check every instruction in order to:
1. Set/unset flags that track which scratch registers are used in the routine;
2. Check whether the instruction is a return instruction; if it is, insert the sequence that zeroes the used scratch registers before the “return” insn.

Given this run-time dynamic instrumentation method, the high runtime overhead is to be expected, I think.

If we use GCC to statically compute the “used” information and add the zeroing sequence before the return insn, the run-time overhead will be much smaller.

I will provide run-time overhead information with the 2nd version of the patch by using CPU2017 applications.

thanks.

Qing


> Qing
> 
> 
>> 
>> Uros.
>> 
>> Reasonable. 
> 
> I'm glad you agree :-)
> 
>>>> For compiler, we should provide such option to the users to satisfy their security need even though the runtime overhead.  Of course, during compiler implementation, we will do our best to minimize the runtime overhead.
>>> 
>>> There also is a real cost to the compiler *developers*.  Which is my
>>> prime worry here.  If this gives users at most marginal value, then it
>>> is real cost to us, but nothing to hold up to that.
>> 
>> Here, you mean the future maintenance  cost  for this part of the code?
> 
> Not just that.  *All* support costs, and consider all other
> optimisations it will interfere with, etc.

Many new features come with these kinds of costs; they are acceptable as long as the new feature provides something important to the users.

From my understanding, this is a feature requested by kernel security people to improve the kernel's security. And this feature has been in Clear Linux since 2018 to improve kernel security on x86.

thanks.

Qing
> 
> 
> Segher
Qing Zhao Sept. 8, 2020, 3 p.m. UTC | #88
> On Sep 7, 2020, at 8:06 AM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
> 
>  
>  
> From: Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>>
> Date: Friday, September 4, 2020 at 9:19 AM
> To: "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com <mailto:victor.rodriguez.bahena@intel.com>>, Kees Cook <keescook@chromium.org <mailto:keescook@chromium.org>>
> Cc: Segher Boessenkool <segher@kernel.crashing.org <mailto:segher@kernel.crashing.org>>, Jakub Jelinek <jakub@redhat.com <mailto:jakub@redhat.com>>, Uros Bizjak <ubizjak@gmail.com <mailto:ubizjak@gmail.com>>, GCC Patches <gcc-patches@gcc.gnu.org <mailto:gcc-patches@gcc.gnu.org>>
> Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>  
>  
> 
> 
>> On Sep 3, 2020, at 8:23 PM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com <mailto:victor.rodriguez.bahena@intel.com>> wrote:
>>  
>> 
>> 
>> -----Original Message-----
>> From: Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>>
>> Date: Thursday, September 3, 2020 at 12:55 PM
>> To: Kees Cook <keescook@chromium.org <mailto:keescook@chromium.org>>
>> Cc: Segher Boessenkool <segher@kernel.crashing.org <mailto:segher@kernel.crashing.org>>, Jakub Jelinek <jakub@redhat.com <mailto:jakub@redhat.com>>, Uros Bizjak <ubizjak@gmail.com <mailto:ubizjak@gmail.com>>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com <mailto:victor.rodriguez.bahena@intel.com>>, GCC Patches <gcc-patches@gcc.gnu.org <mailto:gcc-patches@gcc.gnu.org>>
>> Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>> 
>> 
>> 
>> 
>>> On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org <mailto:keescook@chromium.org>> wrote:
>>> 
>>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>> 
>>>> On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks. 
>>>> If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average. 
>>>> Looks like the overhead of zeroing vector registers is much bigger. 
>>>> 
>>>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>>> 
>>> That looks great; thanks for doing those tests!
>>> 
>>> (And it seems like these benchmarks are kind of a "worst case" scenario
>>> with regard to performance, yes? As in it's mostly tight call loops?)
>> 
>>    The top 3 benchmarks that have the most overhead from this option are: 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
>>    All of them are C++ benchmarks. 
>>    I guess that the most important reason is  the smaller routine size in general (especially at the hot execution path or loops).
>>    As a result, the overhead of these additional zeroing instructions in each routine will be relatively higher.  
>> 
>>    Qing
>> 
>> I think that overhead is expected in benchmarks like 541.leela_r, according to https://urldefense.com/v3/__https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html__;!!GqivPVa7Brio!I4c2wyzrNGbeOTsX7BSD-4C9Cv3ypQ4N1qfRzSK__STxRGa5M4VarBKof2ak8-dT$ <https://urldefense.com/v3/__https:/www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html__;!!GqivPVa7Brio!I4c2wyzrNGbeOTsX7BSD-4C9Cv3ypQ4N1qfRzSK__STxRGa5M4VarBKof2ak8-dT$>  is a benchmark for Artificial Intelligence (Monte Carlo simulation, game tree search & pattern recognition). The addition of fzero-call-used-regs will represent an overhead each time the functions are being call and in areas like game tree search is high. 
>> 
>> Qing, thanks a lot for the measurement, I am not sure if this is the limit of overhead the community is willing to accept by adding extra security (me as gcc user will be willing to accept). 
>  
> From the performance data, we can see that the runtime overhead of clearing only_used registers is very reasonable, even for 541.leela_r, 531.deepsjent_r, and 511.povray.   If try to clear all registers whatever used or not in the current routine, the overhead will be increased dramatically. 
>  
> So, my question is:
>  
> From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?  
> From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit, correct me if I am wrong here.
>  
> You are right, it does not provide additional security

Then, is it necessary to provide 

-fzero-call-used-regs=all-arg|all-gpr|all   to the user?

Can we just delete these 3 sub options?


Qing


>  
>  
> Thanks.
>  
> Qing
>  
>  
>> 
>> Regards
>> 
>> Victor 
>> 
>> 
>> 
>>> 
>>> -- 
>>> Kees Cook
> 
> 
>
Patrick McGehearty Sept. 8, 2020, 3:05 p.m. UTC | #89
My understanding is this feature/flag is not intended to be "default on".
It is intended to be used in security sensitive environments such
as the Linux kernel where it was requested by kernel security experts.
I'm not understanding the objection here if the feature is requested
by security teams and the average cost is modest.

My background is in performance and application optimization. I agree
that for typical computation oriented, non-secure applications, I would
not use the feature, but for system applications that have the ability
to cross protection boundaries, it seems to be clearly a worthwhile
feature.

- patrick


On 9/7/2020 9:44 AM, Segher Boessenkool wrote:
> On Fri, Sep 04, 2020 at 01:23:14AM +0000, Rodriguez Bahena, Victor wrote:
>> Qing, thanks a lot for the measurement, I am not sure if this is the limit of overhead the community is willing to accept by adding extra security (me as gcc user will be willing to accept).
> The overhead is of course bearable for most programs / users, but what
> is the return?  For what percentage of programs are ROP attacks no
> longer possible, for example.
>
>
> Segher
Qing Zhao Sept. 8, 2020, 4:43 p.m. UTC | #90
> On Sep 7, 2020, at 10:58 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> On Mon, Sep 7, 2020 at 7:06 AM Segher Boessenkool
> <segher@kernel.crashing.org <mailto:segher@kernel.crashing.org>> wrote:
>> 
>> On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
>>> On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
>>> <segher@kernel.crashing.org> wrote:
>>>> On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
>>>>>> You probably have to do this for every target separately?  But it is not
>>>>>> enough to handle it in the epilogue, you also need to make sure it is
>>>>>> done on every path that returns *without* epilogue.
>>>>> 
>>>>> This feature is designed for normal return with epilogue.
>>>> 
>>>> Very many normal returns do *not* pass through an epilogue, but are
>>>> simple_return.  Disabling that is *much* more expensive than that 2%.
>>> 
>>> Sibcall isn't covered.  What other cases don't have an epilogue?
>> 
>> Shrink-wrapped stuff.  Quite important for performance.  Not something
>> you can throw away.
>> 
> 
> Qing, can you check how it interacts with shrink-wrap?

We had some discussion on shrink-wrapping previously, and we agreed on the following at that time:

"Shrink-wrapping often deals with the non-volatile registers, so that
doesn't matter much for this patch series."

On the other hand, this patch deals with volatile registers, so from the registers point of view there is NO overlap between this
patch and shrink-wrapping.

So, what other possible issues are there when this patch interacts with shrink-wrapping?

I checked the GCC source code for shrink-wrapping, as follows (gcc/function.c):


…….
  rtx_insn *epilogue_seq = make_epilogue_seq ();

  /* Try to perform a kind of shrink-wrapping, making sure the
     prologue/epilogue is emitted only around those parts of the
     function that require it.  */
  try_shrink_wrapping (&entry_edge, prologue_seq);

  /* If the target can handle splitting the prologue/epilogue into separate
     components, try to shrink-wrap these components separately.  */
  try_shrink_wrapping_separate (entry_edge->dest);

  /* If that did anything for any component we now need the generate the
     "main" prologue again.  Because some targets require some of these
     to be called in a specific order (i386 requires the split prologue
     to be first, for example), we create all three sequences again here.
     If this does not work for some target, that target should not enable
     separate shrink-wrapping.  */
  if (crtl->shrink_wrapped_separate)
    {
      split_prologue_seq = make_split_prologue_seq ();
      prologue_seq = make_prologue_seq ();
      epilogue_seq = make_epilogue_seq ();
    }
…….

My understanding from the above is:

1. “try_shrink_wrapping” should NOT interact with make_epilogue_seq, since only “prologue_seq” is touched.
2. “try_shrink_wrapping_separate” might interact with the epilogue; however, if “try_shrink_wrapping_separate” changes anything,
    make_epilogue_seq() will be called again, and the zeroing sequence will still be generated at the end of the routine.

So, from the above, I didn’t see any obvious issues.

But I might be missing some important issues here; please let me know what I am missing.

Thanks a lot for any help.

Qing



> 
> -- 
> H.J.
Richard Sandiford Sept. 10, 2020, 12:11 p.m. UTC | #91
Patrick McGehearty via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> My understanding is this feature/flag is not intended to be "default on".
> It is intended to be used in security sensitive environments such
> as the Linux kernel where it was requested by kernel security experts.
> I'm not understanding the objection here if the feature is requested
> by security teams and the average cost is modest.

Agreed.  And of course, “is modest” here means “is modest in the eyes
of the people who want to use it”.

IMO it's been established at this point that the feature is useful
enough to some people.  It might be too expensive for others,
but that's OK.

I've kind-of lost track of where we stand given all the subthreads.
If we've now decided which suboptions we want to support, would it
make sense to start a new thread with the current patch, and then
just concentrate on code review for that subthread?

Thanks,
Richard
Qing Zhao Sept. 10, 2020, 2:34 p.m. UTC | #92
Richard,

Thank you!

> On Sep 10, 2020, at 7:11 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Patrick McGehearty via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> My understanding is this feature/flag is not intended to be "default on".
>> It is intended to be used in security sensitive environments such
>> as the Linux kernel where it was requested by kernel security experts.
>> I'm not understanding the objection here if the feature is requested
>> by security teams and the average cost is modest.
> 
> Agreed.  And of course, “is modest” here means “is modest in the eyes
> of the people who want to use it”.
> 
> IMO it's been established at this point that the feature is useful
> enough to some people.  It might be too expensive for others,
> but that's OK.
> 
> I've kind-of lost track of where we stand given all the subthreads.
> If we've now decided which suboptions we want to support,

From the performance data, we saw that clearing ALL registers costs much more without any additional benefit, so I’d like to delete all the sub-options that clear “ALL” registers, i.e., all-arg, all-gpr, all.

Now, the option will be:

-fzero-call-used-regs=skip|gpr-arg|all-arg|gpr|all

Add -fzero-call-used-regs=[skip|gpr-arg|all-arg|gpr|all] command-line option
and
zero_call_used_regs("skip|gpr-arg|all-arg|gpr|all") function attribute:

    1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")

    Don't zero call-used registers upon function return. This is the default behavior.

    2. -fzero-call-used-regs=gpr-arg and zero_call_used_regs("gpr-arg")

    Upon function return, zero call-used general purpose registers that are used in the routine and might pass parameters.

    3. -fzero-call-used-regs=all-arg and zero_call_used_regs("all-arg")

    Upon function return, zero call-used registers that are used in the routine and might pass parameters.

    4. -fzero-call-used-regs=gpr and zero_call_used_regs("gpr")

    Upon function return, zero call-used general purpose registers that are used in the routine.

    5. -fzero-call-used-regs=all and zero_call_used_regs("all")

    Upon function return, zero call-used registers that are used in the routine.
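
To make the five sub-options above concrete, here is a minimal sketch of how each one could narrow the set of registers to zero. This is only a model under stated assumptions: the `regset` bitmask, `struct target_regs` and `regs_to_zero` are hypothetical names made up for illustration, and they are not the data structures or hooks used by the patch (GCC itself works with HARD_REG_SET).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one bit per hard register.  This is an illustration
   only, not a GCC data structure.  */
typedef uint32_t regset;

/* Hypothetical per-target register classification.  */
struct target_regs {
  regset call_used;  /* registers a callee is allowed to clobber */
  regset gpr;        /* general purpose registers */
  regset arg;        /* registers that might pass parameters */
};

/* Return the registers to zero at function return for SUBOPTION, given
   the registers USED in the current routine.  Returns the empty set for
   "skip" and for any unknown sub-option.  */
regset
regs_to_zero (const struct target_regs *t, regset used, const char *suboption)
{
  /* Every sub-option in the list starts from the call-used registers
     that are actually used in the routine.  */
  regset candidates = t->call_used & used;

  if (strcmp (suboption, "gpr-arg") == 0)
    return candidates & t->gpr & t->arg;
  if (strcmp (suboption, "all-arg") == 0)
    return candidates & t->arg;
  if (strcmp (suboption, "gpr") == 0)
    return candidates & t->gpr;
  if (strcmp (suboption, "all") == 0)
    return candidates;
  return 0;
}
```

For example, with call_used = 0x3F, gpr = 0x0F, arg = 0x33 and used = 0x16 in this toy numbering, "gpr-arg" selects 0x02, "gpr" selects 0x06, "all-arg" selects 0x12 and "all" selects 0x16.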

Let me know if you have any objections or comments.

> would it
> make sense to start a new thread with the current patch, and then
> just concentrate on code review for that subthread?

I will start the new thread after my new patch is ready.

Thanks again.

Qing
> 
> Thanks,
> Richard
Li, Pan2 via Gcc-patches Sept. 10, 2020, 2:59 p.m. UTC | #93
-----Original Message-----
From: Qing Zhao <QING.ZHAO@ORACLE.COM>
Date: Thursday, September 10, 2020 at 9:34 AM
To: Richard Sandiford <richard.sandiford@arm.com>, kees Cook <keescook@chromium.org>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>
Cc: Patrick McGehearty via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

    Richard,

    Thank you!

    > On Sep 10, 2020, at 7:11 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
    > 
    > Patrick McGehearty via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
    >> My understanding is this feature/flag is not intended to be "default on".
    >> It is intended to be used in security sensitive environments such
    >> as the Linux kernel where it was requested by kernel security experts.
    >> I'm not understanding the objection here if the feature is requested
    >> by security teams and the average cost is modest.
    > 
    > Agreed.  And of course, “is modest” here means “is modest in the eyes
    > of the people who want to use it”.
    > 
    > IMO it's been established at this point that the feature is useful
    > enough to some people.  It might be too expensive for others,
    > but that's OK.
    > 
    > I've kind-of lost track of where we stand given all the subthreads.
    > If we've now decided which suboptions we want to support,

    From the performance data, we saw that clearing ALL registers cost too much more without any additional benefit, so, I’d like to delete all those sub-options including “ALL”, i.e, all-arg, all-gpr, all.

    Now, the option will be:

    -fzero-call-used-regs=skip|gpr-arg|all-arg|gpr|all

    Add -fzero-call-used-regs=[skip|gpr-arg|all-arg|gpr|all] command-line option
    and
    zero_call_used_regs("skip|gpr-arg|all-arg|gpr|all") function attribues:

        1. -mzero-call-used-regs=skip and zero_call_used_regs("skip")

        Don't zero call-used registers upon function return. This is the default behavior.

        2. -mzero-call-used-regs=gpr-arg and zero_call_used_regs("gpr-arg")

        Upon function return,  zero call-used general purpose registers that are used in the routine and might pass parameters.

        3. -mzero-call-used-regs=used-arg and zero_call_used_regs(“all-arg")

        Upon function return, zero call-used registers that are used in the routine and might pass parameters.
        4. -mzero-call-used-regs=used-gpr and zero_call_used_regs("gpr")

        Upon function return, zero call-used general purpose registers that are used in the routine.

        5. -mzero-call-used-regs=used and zero_call_used_regs(“all")

        Upon function return, zero call-used registers that are used in the routine.

    Let me know any objection or comment. 

+1

    > would it
    > make sense to start a new thread with the current patch, and then
    > just concentrate on code review for that subthread?

    I will start the new thread after my new patch is ready.

    Thanks again.

    Qing
    > 
    > Thanks,
    > Richard
Kees Cook Sept. 10, 2020, 7:07 p.m. UTC | #94
[tried to clean up quoting...]

On Tue, Sep 08, 2020 at 10:00:09AM -0500, Qing Zhao wrote:
> 
> > On Sep 7, 2020, at 8:06 AM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
> > 
> >>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
> >>> So, my question is:
> >>>
> >>> From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?  
> >>> From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit, correct me if I am wrong here.
> >  
> > You are right, it does not provide additional security
> 
> Then, is it necessary to provide 
> 
> -fzero-call-used-regs=all-arg|all-gpr|all   to the user?
> 
> Can we just delete these 3 sub options?

Well... I'd say there is some benefit (remember that ROP gadgets are
built from function trailers, so there is rarely a concern over what the
rest of the function is doing). Generally, they are chained together
based on just the last couple instructions:

 *useful action*
 *ret*

So with ...=used this turns into:

 *useful action*
 *clear some registers*
 *ret*

Which may still be helpful (if, for example, the state being built by
the attacker is using registers _not_ in the cleared list). However:

 *useful action*
 *clear all registers*
 *ret*

Means that suddenly the ROP chain cannot use *any* of the caller-saved
registers to hold state.

So, while ...=used is likely going to block a lot, ...=all will block
even more. I'd prefer to have both available, if for no other reason
than to compare the ROP gadget availability for any given binary (e.g.
if some future attack is found that bypasses ...=used, does it also
bypass ...=all?)
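
For comparing the two flavours on the same binary, a per-function attribute is probably the easiest route. The sketch below is an assumption-laden illustration: it uses the attribute name and the "used-gpr" / "all-gpr" values from the subject line of this thread, guarded by `__has_attribute` so that it still builds on a compiler that does not know the attribute. The macro `ZERO_CALL_USED_REGS` and the two function names are made up for this example, and the spelling accepted by a given compiler may differ from what is discussed here.

```c
/* Use the attribute only where the compiler reports support for it;
   otherwise the macro expands to nothing and the code builds unchanged.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_CALL_USED_REGS(mode) __attribute__((zero_call_used_regs(mode)))
# endif
#endif
#ifndef ZERO_CALL_USED_REGS
# define ZERO_CALL_USED_REGS(mode)
#endif

/* Trailer clears only the call-used GPRs this function actually used.  */
ZERO_CALL_USED_REGS("used-gpr")
int scale_used_gpr(int value, int factor)
{
  return value * factor;
}

/* Trailer clears every call-used GPR, whether used here or not.  */
ZERO_CALL_USED_REGS("all-gpr")
int scale_all_gpr(int value, int factor)
{
  return value * factor;
}
```

Both functions should return the same results as their unannotated equivalents; only the instructions just before the return differ, which can be checked by disassembling the two and comparing the trailers.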
Segher Boessenkool Sept. 10, 2020, 9:56 p.m. UTC | #95
On Tue, Sep 08, 2020 at 09:55:19AM -0500, Qing Zhao wrote:
> Downloading this paper form IEEE needs a fee.

Yes, and we cannot discuss it here.

> What other information you need to show the effective of mitigation ROP attack?

Anything that we *can* talk about.  Stuff we cannot talk about does not
let us progress in one way or the other.


Segher
Segher Boessenkool Sept. 10, 2020, 10:05 p.m. UTC | #96
On Tue, Sep 08, 2020 at 11:43:30AM -0500, Qing Zhao wrote:
> > On Sep 7, 2020, at 10:58 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Mon, Sep 7, 2020 at 7:06 AM Segher Boessenkool
> > <segher@kernel.crashing.org <mailto:segher@kernel.crashing.org>> wrote:
> >> On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
> >>> On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
> >>> <segher@kernel.crashing.org> wrote:
> >>>> Very many normal returns do *not* pass through an epilogue, but are
> >>>> simple_return.  Disabling that is *much* more expensive than that 2%.
> >>> 
> >>> Sibcall isn't covered.  What other cases don't have an epilogue?
> >> 
> >> Shrink-wrapped stuff.  Quite important for performance.  Not something
> >> you can throw away.
> > 
> > Qing, can you check how it interacts with shrink-wrap?
> >> 

<snip>

> But I might miss some important  issues here, please let me know what I am missing here?

Start looking at handle_simple_exit()?  targetm.gen_simple_return()...


Segher
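
As a rough illustration of the kind of exit being pointed at here, consider a function with a cheap early return and a more expensive path that makes calls. This is only a sketch: `transform` and `sum_transformed` are hypothetical names, and whether GCC actually shrink-wraps the function (so that the early exit becomes a simple_return that never reaches the epilogue) depends on the target, the optimization level and inlining decisions.

```c
#include <stddef.h>

/* Hypothetical helper.  Assume it is not inlined, so the loop below
   contains a real call and needs values kept live across it.  */
long transform(long x)
{
  return x * 2;
}

long sum_transformed(const long *values, size_t count)
{
  /* Early exit: nothing has been saved yet, so with shrink-wrapping this
     path may return without passing through the prologue or epilogue.
     A zeroing sequence emitted only in the epilogue would miss it.  */
  if (values == NULL)
    return -1;

  /* Slow path: the call in the loop is what typically forces a prologue
     and epilogue for this part of the function.  */
  long total = 0;
  for (size_t i = 0; i < count; i++)
    total += transform(values[i]);
  return total;
}
```

Compiling something like this with optimization and inspecting both return paths is one way to see whether the zeroing sequence is present on the early exit as well.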
Qing Zhao Sept. 10, 2020, 10:40 p.m. UTC | #97
> On Sep 10, 2020, at 2:07 PM, Kees Cook <keescook@chromium.org> wrote:
> 
> [tried to clean up quoting...]
> 
> On Tue, Sep 08, 2020 at 10:00:09AM -0500, Qing Zhao wrote:
>> 
>>> On Sep 7, 2020, at 8:06 AM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
>>> 
>>>>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>>>> So, my question is:
>>>>> 
>>>>> From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?  
>>>>> From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit, correct me if I am wrong here.
>>> 
>>> You are right, it does not provide additional security
>> 
>> Then, is it necessary to provide 
>> 
>> -fzero-call-used-regs=all-arg|all-gpr|all   to the user?
>> 
>> Can we just delete these 3 sub options?
> 
> Well... I'd say there is some benefit (remember that ROP gadgets are
> built from function trailers, so there is rarely a concern over what the
> rest of the function is doing). Generally, they are chained together
> based on just the last couple instructions:
> 
> *useful action*
> *ret*
> 
> So with ...=used this turns into:
> 
> *useful action*
> *clear some registers*
> *ret*
> 
> Which may still be helpful (if, for example, the state being built by
> the attacker is using registers _not_ in the cleared list). However:
> 
> *useful action*
> *clear all registers*
> *ret*
> 
> Means that suddenly the ROP chain cannot use *any* of the caller-saved
> registers to hold state.
> 
> So, while ...=used is likely going to block a lot, ...=all will block
> even more. I'd prefer to have both available,

Okay. I am fine with this. 

My biggest concern is the much larger run-time overhead of zeroing those unused registers.
We might need to mention this overhead in the user’s manual.

Qing
> if for no other reason
> than to compare the ROP gadget availability for any given binary (e.g.
> if some future attack is found that bypasses ...=used, does it also
> bypass ...=all?)
> 
> -- 
> Kees Cook
Qing Zhao Sept. 10, 2020, 10:50 p.m. UTC | #98
> On Sep 10, 2020, at 5:05 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Tue, Sep 08, 2020 at 11:43:30AM -0500, Qing Zhao wrote:
>>> On Sep 7, 2020, at 10:58 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Sep 7, 2020 at 7:06 AM Segher Boessenkool
>>> <segher@kernel.crashing.org <mailto:segher@kernel.crashing.org>> wrote:
>>>> On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
>>>>> On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
>>>>> <segher@kernel.crashing.org> wrote:
>>>>>> Very many normal returns do *not* pass through an epilogue, but are
>>>>>> simple_return.  Disabling that is *much* more expensive than that 2%.
>>>>> 
>>>>> Sibcall isn't covered.  What other cases don't have an epilogue?
>>>> 
>>>> Shrink-wrapped stuff.  Quite important for performance.  Not something
>>>> you can throw away.
>>> 
>>> Qing, can you check how it interacts with shrink-wrap?
>>>> 
> 
> <snip>
> 
>> But I might miss some important  issues here, please let me know what I am missing here?
> 
> Start looking at handle_simple_exit()?  targetm.gen_simple_return()…

Yes, I have been looking at this since this morning. 
You are right, we also need to insert the zeroing sequence before this simple_return, which the current patch misses.

I am currently trying to resolve this issue with the following idea:

In the routine “thread_prologue_and_epilogue_insns”, after both “make_epilogue_seq” and “try_shrink_wrapping” have finished,
scan every exit block to see whether its last insn satisfies ANY_RETURN_P(insn).
If so, generate the zeroing sequence before this return insn.

That should take care of every exit path that returns.

Do you see any issue with this idea?

Thanks a lot for your help.

Qing

> 
> 
> Segher
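
The scan described in this message can be sketched on a toy instruction stream. This is a minimal sketch, not GCC code: real insns are RTL rather than an enum, and `insn_kind`, `any_return_p`, and `insert_zeroing_before_returns` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical insn kinds; real GCC insns are RTL, not an enum.  */
enum insn_kind { INSN_OTHER, INSN_ZERO_REGS, INSN_RETURN, INSN_SIMPLE_RETURN };

/* Toy stand-in for ANY_RETURN_P.  */
int
any_return_p (enum insn_kind kind)
{
  return kind == INSN_RETURN || kind == INSN_SIMPLE_RETURN;
}

/* Copy IN[0..N) to OUT, placing an INSN_ZERO_REGS before every return.
   OUT must have room for 2 * N entries.  Returns the new length.  */
size_t
insert_zeroing_before_returns (const enum insn_kind *in, size_t n,
                               enum insn_kind *out)
{
  size_t len = 0;
  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (any_return_p (in[i]))
        out[len++] = INSN_ZERO_REGS;
      out[len++] = in[i];
    }
  return len;
}
```

The point of treating both return kinds alike is that a shrink-wrapped simple_return gets the same zeroing sequence as a return that passes through the epilogue.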
Richard Sandiford Sept. 11, 2020, 10:06 a.m. UTC | #99
Kees Cook via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> [tried to clean up quoting...]
>
> On Tue, Sep 08, 2020 at 10:00:09AM -0500, Qing Zhao wrote:
>> 
>> > On Sep 7, 2020, at 8:06 AM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
>> > 
>> >>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>> >>> So, my question is:
>> >>>
>> >>> From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?  
>> >>> From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit, correct me if I am wrong here.
>> >  
>> > You are right, it does not provide additional security
>> 
>> Then, is it necessary to provide 
>> 
>> -fzero-call-used-regs=all-arg|all-gpr|all   to the user?
>> 
>> Can we just delete these 3 sub options?
>
> Well... I'd say there is some benefit (remember that ROP gadgets are
> built from function trailers, so there is rarely a concern over what the
> rest of the function is doing). Generally, they are chained together
> based on just the last couple instructions:
>
>  *useful action*
>  *ret*
>
> So with ...=used this turns into:
>
>  *useful action*
>  *clear some registers*
>  *ret*
>
> Which may still be helpful (if, for example, the state being built by
> the attacker is using registers _not_ in the cleared list). However:
>
>  *useful action*
>  *clear all registers*
>  *ret*
>
> Means that suddenly the ROP chain cannot use *any* of the caller-saved
> registers to hold state.
>
> So, while ...=used is likely going to block a lot, ...=all will block
> even more. I'd prefer to have both available, if for no other reason
> than to compare the ROP gadget availability for any given binary (e.g.
> if some future attack is found that bypasses ...=used, does it also
> bypass ...=all?)

This might have already been discussed/answered, sorry, but:
when there's a choice, is there an obvious winner between:

(1) clearing call-clobbered registers and then restoring call-preserved ones
(2) restoring call-preserved registers and then clearing call-clobbered ones
    
Is one option more likely to be useful to attackers than the other?

(For some frames, it might be necessary to use a small number of
call-clobbered registers to perform the restore sequence, so (1)
wouldn't be fully achievable in all cases.)

Thanks,
Richard
Segher Boessenkool Sept. 11, 2020, 4:14 p.m. UTC | #100
On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote:
> This might have already been discussed/answered, sorry, but:
> when there's a choice, is there an obvious winner between:
> 
> (1) clearing call-clobbered registers and then restoring call-preserved ones
> (2) restoring call-preserved registers and then clearing call-clobbered ones
>     
> Is one option more likely to be useful to attackers than the other?
> 
> (For some frames, it might be necessary to use a small number of
> call-clobbered registers to perform the restore sequence, so (1)
> wouldn't be fully achievable in all cases.)

The same is true for what you have to do *after* restoring registers, as
I said before.  Clearing all is not correct in all cases, and also it is
not useful in all cases (code right after it might write the registers
again).

This really is very (sub-)target-specific, it cannot be done by generic
code on its own *at all*.


Segher
Qing Zhao Sept. 11, 2020, 4:52 p.m. UTC | #101
> On Sep 11, 2020, at 11:14 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote:
>> This might have already been discussed/answered, sorry, but:
>> when there's a choice, is there an obvious winner between:
>> 
>> (1) clearing call-clobbered registers and then restoring call-preserved ones
>> (2) restoring call-preserved registers and then clearing call-clobbered ones
>> 
>> Is one option more likely to be useful to attackers than the other?

For the purpose of mitigating ROP, I think that (2) is better than (1), i.e., the sequence that clears
the call-clobbered registers will come immediately before the “ret” instruction.
This will prevent the gadget from doing anything useful.

>> 
>> (For some frames, it might be necessary to use a small number of
>> call-clobbered registers to perform the restore sequence, so (1)
>> wouldn't be fully achievable in all cases.)
> 

Yes, it looks like (1) is also not correct.

> The same is true for what you have to do *after* restoring registers, as
> I said before.  Clearing all is not correct in all cases, and also it is
> not useful in all cases (code right after it might write the registers
> again.

I don’t understand why it is not correct to clear the call-clobbered registers
AFTER restoring the call-preserved registers.

Even though we might need to use some call-clobbered registers to restore 
the call-preserved registers, once the restore is done we can use data flow
to make sure those call-clobbered registers are no longer live at that point, and then
clear the call-clobbered registers that are not live immediately before “ret”.

To me, this should be correct.

Let me know if I am missing anything here.

Thanks.

Qing



> 
> This really is very (sub-)target-specific, it cannot be done by generic
> code on its own *at all*.
> 
> 
> Segher
Segher Boessenkool Sept. 11, 2020, 5:13 p.m. UTC | #102
On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
> I don’t understand why it’s not correct if we clearing call-clobbered registers 
> AFTER restoring call-preserved registers?

Because the compiler backend (or the linker!  Or the dynamic linker!
Etc.) can use volatile registers for their own purposes.

Like, on Power, r11 and r12 are used for various calling convention
purposes; they are also used for other purposes; and r0 is used as some
all-purpose volatile (it typically holds the return address near the
end of a function).

"Call-clobbered" is pretty meaningless.  It only holds meaning for a
function calling another, and then only to know which registers lose
their value then.  It has no semantics for other cases, like a function
that will return soonish, as here.


Segher
Segher Boessenkool Sept. 11, 2020, 5:18 p.m. UTC | #103
On Thu, Sep 10, 2020 at 05:50:40PM -0500, Qing Zhao wrote:
> >>>> Shrink-wrapped stuff.  Quite important for performance.  Not something
> >>>> you can throw away.

^^^ !!! ^^^

> > Start looking at handle_simple_exit()?  targetm.gen_simple_return()…
> 
> Yes, I have been looking at this since this morning. 
> You are right, we also need to insert zeroing sequence before  this simple_return which the current patch missed.

Please run the performance loss numbers again after you have something
more realistic :-(

> I am currently try to resolve this issue with the following idea:
> 
> In the routine “thread_prologue_and_epilogue_insns”,  After both “make_epilogue_seq” and “try_shrink_wrapping” finished, 
> 
> Scan every exit block to see whether the last insn is a ANY_RETURN_P(insn), 
> If YES, generate the zero sequence before this RETURN insn. 
> 
> Then we should take care all the exit path that returns.
> 
> Do you see any issue from this idea? 

You need to let the backend decide what to do, for this as well as for
all other cases.  I do not know how often I will have to repeat that.

There also is separate shrink-wrapping, which you haven't touched on at
all yet.  Joy.


Segher
Richard Sandiford Sept. 11, 2020, 5:32 p.m. UTC | #104
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 11, 2020, at 11:14 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> 
>> On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote:
>>> This might have already been discussed/answered, sorry, but:
>>> when there's a choice, is there an obvious winner between:
>>> 
>>> (1) clearing call-clobbered registers and then restoring call-preserved ones
>>> (2) restoring call-preserved registers and then clearing call-clobbered ones
>>> 
>>> Is one option more likely to be useful to attackers than the other?
>
> for mitigating ROP purpose, I think that (2) is better than (1). i.e, the clearing
> call-clobbered register sequence will be immediately before “ret” instruction. 
> This will prevent the gadget from doing any useful things.

OK.  The reason I was asking was that (from the naive perspective of
someone not well versed in this stuff): if the effect of one of the
register restores is itself a useful gadget, the clearing wouldn't
protect against it.  But if the register restores are not part of the
intended effect, it seemed that having them immediately before the
ret might make the gadget harder to use than clearing registers would,
because the side-effects of restores would be harder to control than the
(predictable) effect of clearing registers.

But like I say, this is very much not my area of expertise, so that's
probably missing the point in a major way. ;-)

I think the original patch plugged into pass_thread_prologue_and_epilogue,
is that right?  If we go for (2), then I think it would be better to do
it at the start of pass_late_compilation instead.  (Some targets wouldn't
cope with doing it later.)  The reason for doing it so late is that the
set of used “volatile”/caller-saved registers is not fixed at prologue
and epilogue generation: later optimisation passes can introduce uses
of volatile registers that weren't used previously.  (Sorry if this
has already been suggested.)

Unlike Segher, I think this can/should be done in target-independent
code as far as possible (like the patch seemed to do).

Thanks,
Richard
Qing Zhao Sept. 11, 2020, 7:40 p.m. UTC | #105
> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
>> I don’t understand why it’s not correct if we clearing call-clobbered registers 
>> AFTER restoring call-preserved registers?
> 
> Because the compiler backend (or the linker!  Or the dynamic linker!
> Etc.) can use volatile registers for their own purposes.

For the following sequence at the end of a routine:

*...*
*restore call-preserved registers*
*clear call-clobbered registers*
*ret*

“Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine.
If a call-clobbered register is live at the end of the routine, for example because it holds the return value,
it will NOT be cleared at all.

If a call-clobbered register has some other use after the routine returns, then the backend should know this and will not
clear it. That should resolve this issue, right?


> 
> Like, on Power, r11 and r12 are used for various calling convention
> purposes; they are also used for other purposes; and r0 is used as some
> all-purpose volatile (it typically holds the return address near the
> end of a function).

In the new version of the patch, the clearing of call-clobbered registers is implemented in the backend. The middle end only
computes a hard register set based on the user option, source attribute, data flow information, and function ABI information, and
then passes this hard register set to the target hook to generate the clearing sequence. The backend will have all the details
of the special situations you mentioned.

Let me know if you have any more concerns here.

thanks.

Qing

> 
> "Call-clobbered" is pretty meaningless.  It only holds meaning for a
> function calling another, and then only to know which registers lose
> their value then.  It has no semantics for other cases, like a function
> that will return soonish, as here.
> 
> 
> Segher
Qing Zhao Sept. 11, 2020, 7:53 p.m. UTC | #106
> On Sep 11, 2020, at 12:18 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Thu, Sep 10, 2020 at 05:50:40PM -0500, Qing Zhao wrote:
>>>>>> Shrink-wrapped stuff.  Quite important for performance.  Not something
>>>>>> you can throw away.
> 
> ^^^ !!! ^^^
> 
>>> Start looking at handle_simple_exit()?  targetm.gen_simple_return()…
>> 
>> Yes, I have been looking at this since this morning. 
>> You are right, we also need to insert zeroing sequence before  this simple_return which the current patch missed.
> 
> Please run the performance loss numbers again after you have something
> more realistic :-(

Yes, I will collect the performance data with the new patch. 

> 
>> I am currently try to resolve this issue with the following idea:
>> 
>> In the routine “thread_prologue_and_epilogue_insns”,  After both “make_epilogue_seq” and “try_shrink_wrapping” finished, 
>> 
>> Scan every exit block to see whether the last insn is a ANY_RETURN_P(insn), 
>> If YES, generate the zero sequence before this RETURN insn. 
>> 
>> Then we should take care all the exit path that returns.
>> 
>> Do you see any issue from this idea? 
> 
> You need to let the backend decide what to do, for this as well as for
> all other cases.  I do not know how often I will have to repeat that.

Yes, the new patch will separate the whole task into two parts:

A. Compute the hard register set based on the user option, source code attribute, data flow information, and function ABI information.
     The result will be “need_zeroed_register_set”, which is then passed to the target hook.
B. Each target will have its own implementation that emits the zeroing sequence based on “need_zeroed_register_set”.


> 
> There also is separate shrink-wrapping, which you haven't touched on at
> all yet.  Joy.

Yes, in addition to shrink-wrapping, I also noticed that there are other places that generate “simple_return” or “return” outside
the epilogue, for example in the “dbr” phase (delay slots) and in the “mach” phase (machine reorg).

So generating the zeroing sequence only in the epilogue is not enough.

Hongjiu and I discussed this more and came up with a new implementation. I will describe it in another email later.

Thanks.

Qing
> 
> 
> Segher
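
Part A above can be sketched with plain bit masks. This is a minimal sketch under stated assumptions: `regset` stands in for HARD_REG_SET, and the `zero_mode` enum, the `fn_info` struct, and `compute_need_zeroed` are hypothetical names that do not come from the patch. Registers that are live at the return, such as the one holding the return value, are always excluded.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t regset;   /* hypothetical: one bit per hard register */

enum zero_mode { ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL };

/* Inputs the generic code is assumed to know for one function.  */
struct fn_info
{
  regset call_used;       /* from the function ABI */
  regset gprs;            /* general purpose registers */
  regset used;            /* registers the function touched */
  regset live_at_return;  /* e.g. the return value register */
};

/* Part A: compute the set of registers that need zeroing.  */
regset
compute_need_zeroed (enum zero_mode mode, const struct fn_info *fn)
{
  regset set;
  assert (fn != 0);
  if (mode == ZERO_SKIP)
    return 0;
  /* Never clear a register that is still live at the return.  */
  set = fn->call_used & ~fn->live_at_return;
  if (mode == ZERO_USED_GPR || mode == ZERO_USED)
    set &= fn->used;
  if (mode == ZERO_USED_GPR || mode == ZERO_ALL_GPR)
    set &= fn->gprs;
  return set;
}
```

The resulting set is what part B would hand to the target, which remains free to refuse or reorder individual registers.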
Segher Boessenkool Sept. 11, 2020, 8:01 p.m. UTC | #107
On Fri, Sep 11, 2020 at 06:32:56PM +0100, Richard Sandiford wrote:
> Unlike Segher, I think this can/should be done in target-independent
> code as far as possible (like the patch seemed to do).

My problem with that is that it is both incorrect *and* inefficient.  It
writes registers it should not touch; and some of those will be written
with other values later again anyway; and if the goal is to clear as
many parameter-passing registers as possible, why does it touch
others at all?  This makes no sense.

Only the backend knows which registers it can write when.


Segher
Segher Boessenkool Sept. 11, 2020, 8:05 p.m. UTC | #108
On Fri, Sep 11, 2020 at 02:40:06PM -0500, Qing Zhao wrote:
> > On Sep 11, 2020, at 12:13 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
> >> I don’t understand why it’s not correct if we clearing call-clobbered registers 
> >> AFTER restoring call-preserved registers?
> > 
> > Because the compiler backend (or the linker!  Or the dynamic linker!
> > Etc.) can use volatile registers for their own purposes.
> 
> For the following sequence at the end of a routine:
> 
> *...*
> “restore call-preserved registers”
> *clear call-clobbered registers"*
> *ret*
> 
> “Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine.

And they can be written again right after the routine, by linker-
generated code for example.  This is a waste.

> In the new version of the patch,  the implementation of clearing call-clobbered registers is done in backend, middle end only 
> computes a hard register set based on user option, source attribute, data flow information, and function abi information, and
> Then pass this hard register set to the target hook to generate the clearing sequence.  The backend will have all the details
> on the special situations you mentioned. 
> 
> Let me know any more concerns here.

I cannot find that patch?


Segher
Qing Zhao Sept. 11, 2020, 8:14 p.m. UTC | #109
> On Sep 11, 2020, at 12:32 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 11, 2020, at 11:14 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> 
>>> On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote:
>>>> This might have already been discussed/answered, sorry, but:
>>>> when there's a choice, is there an obvious winner between:
>>>> 
>>>> (1) clearing call-clobbered registers and then restoring call-preserved ones
>>>> (2) restoring call-preserved registers and then clearing call-clobbered ones
>>>> 
>>>> Is one option more likely to be useful to attackers than the other?
>> 
>> for mitigating ROP purpose, I think that (2) is better than (1). i.e, the clearing
>> call-clobbered register sequence will be immediately before “ret” instruction. 
>> This will prevent the gadget from doing any useful things.
> 
> OK.  The reason I was asking was that (from the naive perspective of
> someone not well versed in this stuff): if the effect of one of the
> register restores is itself a useful gadget, the clearing wouldn't
> protect against it.  But if the register restores are not part of the
> intended effect, it seemed that having them immediately before the
> ret might make the gadget harder to use than clearing registers would,
> because the side-effects of restores would be harder to control than the
> (predictable) effect of clearing registers.
> 
> But like I say, this is very much not my area of expertise, so that's
> probably missing the point in a major way. ;-)

I am not an expert on the security area either. :-)

My understanding of how this scheme helps ROP is: the attacker usually uses scratch registers to pass
parameters to the system call in the gadget. If the scratch registers are cleared immediately before “ret”, then
the parameters that are passed to the system call will be destroyed, and therefore the attack will likely fail.

So, clearing the scratch registers immediately before “ret” will be very helpful to mitigate ROP.

> 
> I think the original patch plugged into pass_thread_prologue_and_epilogue,
> is that right?

Yes.

>  If we go for (2), then I think it would be better to do
> it at the start of pass_late_compilation instead.  (Some targets wouldn't
> cope with doing it later.)  The reason for doing it so late is that the
> set of used “volatile”/caller-saved registers is not fixed at prologue
> and epilogue generation: later optimisation passes can introduce uses
> of volatile registers that weren't used previously.  (Sorry if this
> has already been suggested.)

Yes, I agree.

I thought that it might be better to move this task to the very end of the RTL stage, i.e., before the “final” phase.

Another solution is (discussed with Hongjiu):

1. Define a new target hook:

targetm.return_with_zeroing (bool simple_return_p, HARD_REG_SET need_zeroed_hardregs, bool gpr_only)

2. Add the following routine in the middle end:

rtx_insn *
generate_return_rtx (bool simple_return_p)
{
  if (targetm.return_with_zeroing)
    {
      /* Compute the set of hard registers to clear into
         need_zeroed_hardregs.  */
      return targetm.return_with_zeroing (simple_return_p, need_zeroed_hardregs, gpr_only);
    }
  else
    {
      if (simple_return_p)
        return targetm.gen_simple_return ();
      else
        return targetm.gen_return ();
    }
}

Then replace all calls to “targetm.gen_simple_return” and “targetm.gen_return” with calls to “generate_return_rtx ()”.

3. In the target, implement “return_with_zeroing”.


Let me know your comments on this.

Thanks a lot.

Qing
> 
> Unlike Segher, I think this can/should be done in target-independent
> code as far as possible (like the patch seemed to do).
> 
> Thanks,
> Richard
Qing Zhao Sept. 11, 2020, 8:17 p.m. UTC | #110
> On Sep 11, 2020, at 3:05 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 02:40:06PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
>>>> I don’t understand why it’s not correct if we clearing call-clobbered registers 
>>>> AFTER restoring call-preserved registers?
>>> 
>>> Because the compiler backend (or the linker!  Or the dynamic linker!
>>> Etc.) can use volatile registers for their own purposes.
>> 
>> For the following sequence at the end of a routine:
>> 
>> *...*
>> “restore call-preserved registers”
>> *clear call-clobbered registers"*
>> *ret*
>> 
>> “Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine.
> 
> And they can be written again right after the routine, by linker-
> generated code for example.  This is a waste.
> 
>> In the new version of the patch,  the implementation of clearing call-clobbered registers is done in backend, middle end only 
>> computes a hard register set based on user option, source attribute, data flow information, and function abi information, and
>> Then pass this hard register set to the target hook to generate the clearing sequence.  The backend will have all the details
>> on the special situations you mentioned. 
>> 
>> Let me know any more concerns here.
> 
> I cannot find that patch?

Haven’t finished yet. -:).

Qing
> 
> 
> Segher
Segher Boessenkool Sept. 11, 2020, 8:36 p.m. UTC | #111
On Fri, Sep 11, 2020 at 03:17:19PM -0500, Qing Zhao wrote:
> > On Sep 11, 2020, at 3:05 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > On Fri, Sep 11, 2020 at 02:40:06PM -0500, Qing Zhao wrote:
> >>> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >>> On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
> >>>> I don’t understand why it’s not correct if we clearing call-clobbered registers 
> >>>> AFTER restoring call-preserved registers?
> >>> 
> >>> Because the compiler backend (or the linker!  Or the dynamic linker!
> >>> Etc.) can use volatile registers for their own purposes.
> >> 
> >> For the following sequence at the end of a routine:
> >> 
> >> *...*
> >> “restore call-preserved registers”
> >> *clear call-clobbered registers"*
> >> *ret*
> >> 
> >> “Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine.
> > 
> > And they can be written again right after the routine, by linker-
> > generated code for example.  This is a waste.
> > 
> >> In the new version of the patch,  the implementation of clearing call-clobbered registers is done in backend, middle end only 
> >> computes a hard register set based on user option, source attribute, data flow information, and function abi information, and
> >> Then pass this hard register set to the target hook to generate the clearing sequence.  The backend will have all the details
> >> on the special situations you mentioned. 
> >> 
> >> Let me know any more concerns here.
> > 
> > I cannot find that patch?
> 
> Haven’t finished yet. -:).

Ah okay :-)

If you have, please send it in a new thread (not as a reply)?  So that
it will be much easier to handle :-)


Segher
Segher Boessenkool Sept. 11, 2020, 9:03 p.m. UTC | #112
Hi!

On Fri, Sep 11, 2020 at 03:14:57PM -0500, Qing Zhao wrote:
> My understanding of how this scheme helps ROP is:  the attacker usually uses scratch register to pass

Help obstruct ROP ;-)

> parameters to the sys call in the gadget, if clearing the scratch registers immediately before “ret”, then 
> The parameters that are passed to sys call will be destroyed, therefore, the attack will likely failed.

But you do not need more than one non-zero argument for execv*, and that
is usually the same register as the normal return value register; all
other registers *should* be zero for a simple execv*("/bin/sh", ...)!

(There is also the system call number register, rax on x86-64, but if
overwriting that would be at all effective, you could just do that one
always and everywhere.  This is only an effective defence if there are
no gadgets that do the system call an attacker wants, and he has to
construct that sequence himself; but it is very effective and cheap then).


Segher
Qing Zhao Sept. 11, 2020, 9:12 p.m. UTC | #113
> On Sep 11, 2020, at 3:36 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 03:17:19PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 3:05 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> On Fri, Sep 11, 2020 at 02:40:06PM -0500, Qing Zhao wrote:
>>>>> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>> On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
>>>>>> I don’t understand why it’s not correct if we clearing call-clobbered registers 
>>>>>> AFTER restoring call-preserved registers?
>>>>> 
>>>>> Because the compiler backend (or the linker!  Or the dynamic linker!
>>>>> Etc.) can use volatile registers for their own purposes.
>>>> 
>>>> For the following sequence at the end of a routine:
>>>> 
>>>> *...*
>>>> “restore call-preserved registers”
>>>> *clear call-clobbered registers"*
>>>> *ret*
>>>> 
>>>> “Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine.
>>> 
>>> And they can be written again right after the routine, by linker-
>>> generated code for example.  This is a waste.
>>> 
>>>> In the new version of the patch,  the implementation of clearing call-clobbered registers is done in backend, middle end only 
>>>> computes a hard register set based on user option, source attribute, data flow information, and function abi information, and
>>>> Then pass this hard register set to the target hook to generate the clearing sequence.  The backend will have all the details
>>>> on the special situations you mentioned. 
>>>> 
>>>> Let me know any more concerns here.
>>> 
>>> I cannot find that patch?
>> 
>> Haven’t finished yet. -:).
> 
> Ah okay :-)
> 
> If you have, please send it in a new thread (not as a reply)?  So that
> it will be much easirer to handle :-)

Okay. Will do.

Qing
> 
> 
> Segher
Qing Zhao Sept. 11, 2020, 9:29 p.m. UTC | #114
> On Sep 11, 2020, at 4:03 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Fri, Sep 11, 2020 at 03:14:57PM -0500, Qing Zhao wrote:
>> My understanding of how this scheme helps ROP is:  the attacker usually uses scratch register to pass
> 
> Help obstruct ROP ;-)
Thanks for catching my mistake.
> 
>> parameters to the sys call in the gadget, if clearing the scratch registers immediately before “ret”, then 
>> The parameters that are passed to sys call will be destroyed, therefore, the attack will likely failed.
> 
> But you do not need more than one non-zero argument for execv*, and that
> is usually the same register as the normal return value register; all
> other registers *should* be zero for a simple execv*("/bin/sh", ...)!
> 
> (There is also the system call number register, rax on x86-64, but if
> overwriting that would be any effective, you could just do that one
> always and everywhere.  This is only an effective defence if there are
> no gadgets that do the system call an attacker wants, and he has to
> construct that sequence himself; but it very effective and cheap then).

In the above, do you mean that clearing only “rax” on x86-64 should be effective enough?

Qing
> 
> 
> Segher
Richard Sandiford Sept. 11, 2020, 9:44 p.m. UTC | #115
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>  If we go for (2), then I think it would be better to do
>> it at the start of pass_late_compilation instead.  (Some targets wouldn't
>> cope with doing it later.)  The reason for doing it so late is that the
>> set of used “volatile”/caller-saved registers is not fixed at prologue
>> and epilogue generation: later optimisation passes can introduce uses
>> of volatile registers that weren't used previously.  (Sorry if this
>> has already been suggested.)
>
> Yes, I agree.
>
> I thought that it might be better to move this task to the very end of the RTL stage, i.e., before the “final” phase.
>
> Another solution is (discussed with Hongjiu):
>
> 1. Define a new target hook:
>
> targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET need_zeroed_hardregs, bool gpr_only)
>
> 2. Add the following routine in middle end:
>
> rtx_insn *
> generate_return_rtx (bool simple_return_p)
> {
>   if (targetm.return_with_zeroing)
>     {
>       /* Pseudocode: compute the hard register set for clearing
>          into “need_zeroed_hardregs”.  */
>       return targetm.return_with_zeroing (simple_return_p,
>                                           need_zeroed_hardregs, gpr_only);
>     }
>   else
>     {
>       if (simple_return_p)
>         return targetm.gen_simple_return ();
>       else
>         return targetm.gen_return ();
>     }
> }
>
> Then replace all calls to “targetm.gen_simple_return” and “targetm.gen_return” with “generate_return_rtx()”.
>
> 3. In the target, 
> Implement “return_with_zeroing”.
>
>
> Let me know your comments on this.

I think having a separate pass is better.  We don't normally know
at the point of generating the return which registers will need
to be cleared.  So IMO the pass should just search for all the
returns in a function and insert the zeroing instructions before
each one.

Having a target hook sounds good, but I think it should have a
default definition that just uses the move patterns to zero each
selected register.  I expect the default will be good enough for
most targets.
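
A minimal sketch of the "search for all the returns and insert the zeroing before each one" idea, written as a self-contained toy model rather than real GCC code. The `Insn` type and the function name are hypothetical stand-ins for RTL instructions and for the proposed pass; the zeroing instruction plays the role of the default "move zero into the register" pattern.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an RTL instruction; this is not a GCC type.
struct Insn {
  enum Kind { Other, ZeroReg, Return } kind;
  int regno;  // only meaningful when kind == ZeroReg
};

// Walk the instruction stream and emit one zeroing instruction per
// selected register immediately before every return instruction.
std::vector<Insn> insert_zeroing_before_returns(
    const std::vector<Insn>& insns, const std::vector<int>& regs_to_zero) {
  std::vector<Insn> out;
  for (const Insn& insn : insns) {
    if (insn.kind == Insn::Return) {
      for (int regno : regs_to_zero) {
        assert(regno >= 0);  // hard register numbers are non-negative
        out.push_back({Insn::ZeroReg, regno});
      }
    }
    out.push_back(insn);
  }
  return out;
}
```

In this model every return gets the same register list; a real pass would presumably compute the list per return, since liveness can differ between return sites.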

Thanks,
Richard
Segher Boessenkool Sept. 11, 2020, 9:51 p.m. UTC | #116
On Fri, Sep 11, 2020 at 04:29:16PM -0500, Qing Zhao wrote:
> > On Sep 11, 2020, at 4:03 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> The parameters that are passed to the sys call will be destroyed, and therefore the attack will likely fail.
> > 
> > But you do not need more than one non-zero argument for execv*, and that
> > is usually the same register as the normal return value register; all
> > other registers *should* be zero for a simple execv*("/bin/sh", ...)!
> > 
> > (There is also the system call number register, rax on x86-64, but if
> > overwriting that would be at all effective, you could just do that one
> > always and everywhere.  This is only an effective defence if there are
> > no gadgets that do the system call an attacker wants, and he has to
> > construct that sequence himself; but it is very effective and cheap then).
> 
> In the above, do you mean only clearing “rax” on x86-64 should be effective enough? 

(rax=0 is "read", you might want to do another value, but that's just
details.)

"This is only an effective defence if there are
no gadgets that do the system call an attacker wants, and he has to
construct that sequence himself; but it is very effective and cheap then)."

It is definitely *not* effective if there are gadgets that set rax to
a value the attacker wants and then do a syscall.  It of course is quite
effective in breaking a ROP chain of (set rax) -> (syscall).  How
effective it is in practice, I have no idea.

My point was that your proposed scheme does not protect the other
syscall parameters very much either.

And, hrm, rax is actually the first return value.  On most ABIs the
same registers are used for arguments and for return values, I got
confused.  Sorry.  So this cannot be very effective for x86-64 no
matter what.


Segher
Qing Zhao Sept. 11, 2020, 10:24 p.m. UTC | #117
> On Sep 11, 2020, at 4:44 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> If we go for (2), then I think it would be better to do
>>> it at the start of pass_late_compilation instead.  (Some targets wouldn't
>>> cope with doing it later.)  The reason for doing it so late is that the
>>> set of used “volatile”/caller-saved registers is not fixed at prologue
>>> and epilogue generation: later optimisation passes can introduce uses
>>> of volatile registers that weren't used previously.  (Sorry if this
>>> has already been suggested.)
>> 
>> Yes, I agree.
>> 
>> I thought that it might be better to move this task to the very end of the RTL stage, i.e., before the “final” phase.
>> 
>> Another solution is (discussed with Hongjiu):
>> 
>> 1. Define a new target hook:
>> 
>> targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET need_zeroed_hardregs, bool gpr_only)
>> 
>> 2. Add the following routine in middle end:
>> 
>> rtx_insn *
>> generate_return_rtx (bool simple_return_p)
>> {
>>   if (targetm.return_with_zeroing)
>>     {
>>       /* Pseudocode: compute the hard register set for clearing
>>          into “need_zeroed_hardregs”.  */
>>       return targetm.return_with_zeroing (simple_return_p,
>>                                           need_zeroed_hardregs, gpr_only);
>>     }
>>   else
>>     {
>>       if (simple_return_p)
>>         return targetm.gen_simple_return ();
>>       else
>>         return targetm.gen_return ();
>>     }
>> }
>>
>> Then replace all calls to “targetm.gen_simple_return” and “targetm.gen_return” with “generate_return_rtx()”.
>> 
>> 3. In the target, 
>> Implement “return_with_zeroing”.
>> 
>> 
>> Let me know your comments on this.
> 
> I think having a separate pass is better.  We don't normally know
> at the point of generating the return which registers will need
> to be cleared.  

At the point of generating the return, we can compute the “need_zeroed_hardregs” HARD_REG_SET 
by using data flow information, the function ABI of the routine, and also the user option and source code 
attribute information together. This information should be available at every point where a return is generated.


> So IMO the pass should just search for all the
> returns in a function and insert the zeroing instructions before
> each one.

I considered this approach for some time too. However, as Segher mentioned, there is one major issue 
with it: the middle end does not know some details about the registers, and lacking such 
detailed information might result in incorrect code generation in the middle end.

For example, on the x86_64 target, a “return” with pop uses the scratch register “ecx” for 
returning, so it is incorrect to zero “ecx” before the return. Since the middle end
doesn't have such information, it cannot avoid zeroing “ecx”. Therefore incorrect code might be 
generated. 

Segher also mentioned that on Power, some scratch registers are also used for 
other purposes, so clearing them before the return is not correct. 


> 
> Having a target hook sounds good, but I think it should have a
> default definition that just uses the move patterns to zero each
> selected register.  I expect the default will be good enough for
> most targets.

Based on the above, I think that generating the zeroing instructions in the middle end is not correct. 

Thanks.

Qing
> 
> Thanks,
> Richard
Qing Zhao Sept. 11, 2020, 10:41 p.m. UTC | #118
> On Sep 11, 2020, at 4:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 04:29:16PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 4:03 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>> The parameters that are passed to the sys call will be destroyed, and therefore the attack will likely fail.
>>> 
>>> But you do not need more than one non-zero argument for execv*, and that
>>> is usually the same register as the normal return value register; all
>>> other registers *should* be zero for a simple execv*("/bin/sh", ...)!
>>> 
>>> (There is also the system call number register, rax on x86-64, but if
>>> overwriting that would be at all effective, you could just do that one
>>> always and everywhere.  This is only an effective defence if there are
>>> no gadgets that do the system call an attacker wants, and he has to
>>> construct that sequence himself; but it is very effective and cheap then).
>> 
>> In the above, do you mean only clearing “rax” on x86-64 should be effective enough? 
> 
> (rax=0 is "read", you might want to do another value, but that's just
> details.)
> 
> "This is only an effective defence if there are
> no gadgets that do the system call an attacker wants, and he has to
> construct that sequence himself; but it is very effective and cheap then)."
> 
> It is definitely *not* effective if there are gadgets that set rax to
> a value the attacker wants and then do a syscall.

You mean the following gadget:


Gadget 1:

mov  rax,  value
syscall
ret

Qing

> It of course is quite
> effective in breaking a ROP chain of (set rax) -> (syscall).  How
> effective it is in practice, I have no idea.
> 
> My point was that your proposed scheme does not protect the other
> syscall parameters very much either.
> 
> And, hrm, rax is actually the first return value.  On most ABIs the
> same registers are used for arguments and for return values, I got
> confused.  Sorry.  So this cannot be very effective for x86-64 no
> matter what.
> 
> 
> Segher
Richard Sandiford Sept. 11, 2020, 10:56 p.m. UTC | #119
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 11, 2020, at 4:44 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> If we go for (2), then I think it would be better to do
>>>> it at the start of pass_late_compilation instead.  (Some targets wouldn't
>>>> cope with doing it later.)  The reason for doing it so late is that the
>>>> set of used “volatile”/caller-saved registers is not fixed at prologue
>>>> and epilogue generation: later optimisation passes can introduce uses
>>>> of volatile registers that weren't used previously.  (Sorry if this
>>>> has already been suggested.)
>>> 
>>> Yes, I agree.
>>> 
>>> I thought that it might be better to move this task to the very end of the RTL stage, i.e., before the “final” phase.
>>> 
>>> Another solution is (discussed with Hongjiu):
>>> 
>>> 1. Define a new target hook:
>>> 
>>> targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET need_zeroed_hardregs, bool gpr_only)
>>> 
>>> 2. Add the following routine in middle end:
>>> 
>>> rtx_insn *
>>> generate_return_rtx (bool simple_return_p)
>>> {
>>>   if (targetm.return_with_zeroing)
>>>     {
>>>       /* Pseudocode: compute the hard register set for clearing
>>>          into “need_zeroed_hardregs”.  */
>>>       return targetm.return_with_zeroing (simple_return_p,
>>>                                           need_zeroed_hardregs, gpr_only);
>>>     }
>>>   else
>>>     {
>>>       if (simple_return_p)
>>>         return targetm.gen_simple_return ();
>>>       else
>>>         return targetm.gen_return ();
>>>     }
>>> }
>>>
>>> Then replace all calls to “targetm.gen_simple_return” and “targetm.gen_return” with “generate_return_rtx()”.
>>> 
>>> 3. In the target, 
>>> Implement “return_with_zeroing”.
>>> 
>>> 
>>> Let me know your comments on this.
>> 
>> I think having a separate pass is better.  We don't normally know
>> at the point of generating the return which registers will need
>> to be cleared.  
>
> At the point of generating the return, we can compute the “need_zeroed_hardregs” HARD_REG_SET 
> by using data flow information, the function abi of the routine, and also the user option and source code 
> attribute information together. These information should be available at each point when generating the return.

Like I mentioned earlier though, passes that run after
pass_thread_prologue_and_epilogue can use call-clobbered registers that
weren't previously used.  For example, on x86_64, the function might
not use %r8 when the prologue, epilogue and returns are generated,
but pass_regrename might later introduce a new use of %r8.  AIUI,
the “used” version of the new command-line option is supposed to clear
%r8 in these circumstances, but it wouldn't do so if the data was
collected at the point that the return is generated.

That's why I think it's more robust to do this later (at the beginning
of pass_late_compilation) and insert the zeroing before returns that
already exist.

>> So IMO the pass should just search for all the
>> returns in a function and insert the zeroing instructions before
>> each one.
>
> I was considering this approach too for some time, however, there is one major issue with this as 
> Segher mentioned, The middle end does not know some details on the registers, lacking such 
> detailed information might result incorrect code generation at middle end.
>
> For example, on x86_64 target, when “return” with pop, the scratch register “ECX” will be 
> used for returning, then it’s incorrect to zero “ecx” before generating the return. Since middle end
> doesn't have such information, it cannot avoid to zero “ecx”. Therefore incorrect code might be 
> generated. 
>
> Segher also mentioned that on Power, there are some scratch registers also are used for 
> Other purpose, clearing them before return is not correct. 

But the dataflow information has to be correct between
pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
any pass in that region could clobber the registers in the same way.

To get the registers that are live before the return, you can start with
the registers that are live out from the block that contains the return,
then “simulate” the return instruction backwards to get the set of
registers that are live before the return instruction
(see df_simulate_one_insn_backwards).

In the x86_64 case you mention, the pattern is:

(define_insn "*simple_return_indirect_internal<mode>"
  [(simple_return)
   (use (match_operand:W 0 "register_operand" "r"))]
  "reload_completed"
  …)

This (use …) tells the df machinery that the instruction needs
operand 0 (= ecx).  The process above would therefore realise
that ecx can't be clobbered.
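
The backward step described above can be illustrated with a small self-contained model. This is only a sketch of the idea behind df_simulate_one_insn_backwards, not its implementation: `RegSet`, `InsnEffect`, `make_regset` and `live_before` are hypothetical names, and the register numbers in the usage note (0 for ax, 2 for cx) are chosen for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical register file size for this toy model; not a GCC constant.
constexpr std::size_t kNumRegs = 16;
using RegSet = std::bitset<kNumRegs>;

// What one instruction does to registers in this toy model.
struct InsnEffect {
  RegSet defs;  // registers the instruction writes
  RegSet uses;  // registers the instruction reads, e.g. via a (use ...)
};

// Build a register set from a list of register numbers.
RegSet make_regset(std::initializer_list<std::size_t> regs) {
  RegSet s;
  for (std::size_t r : regs) {
    assert(r < kNumRegs);
    s.set(r);
  }
  return s;
}

// One backward liveness step: registers written by the instruction stop
// being live above it, and registers read by it become live above it.
RegSet live_before(const RegSet& live_after, const InsnEffect& insn) {
  return (live_after & ~insn.defs) | insn.uses;
}
```

With a return that carries a use of register 2 and a block whose live-out set is {0}, `live_before` yields {0, 2}, so a pass consulting that set would leave both registers alone.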

Thanks,
Richard
Qing Zhao Sept. 14, 2020, 2:56 p.m. UTC | #120
Hi, Richard,

> On Sep 11, 2020, at 5:56 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>> On Sep 11, 2020, at 4:44 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> If we go for (2), then I think it would be better to do
>>>>> it at the start of pass_late_compilation instead.  (Some targets wouldn't
>>>>> cope with doing it later.)  The reason for doing it so late is that the
>>>>> set of used “volatile”/caller-saved registers is not fixed at prologue
>>>>> and epilogue generation: later optimisation passes can introduce uses
>>>>> of volatile registers that weren't used previously.  (Sorry if this
>>>>> has already been suggested.)
>>>> 
>>>> Yes, I agree.
>>>> 
>>>> I thought that it might be better to move this task to the very end of the RTL stage, i.e., before the “final” phase.
>>>> 
>>>> Another solution is (discussed with Hongjiu):
>>>> 
>>>> 1. Define a new target hook:
>>>> 
>>>> targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET need_zeroed_hardregs, bool gpr_only)
>>>> 
>>>> 2. Add the following routine in middle end:
>>>> 
>>>> rtx_insn *
>>>> generate_return_rtx (bool simple_return_p)
>>>> {
>>>>   if (targetm.return_with_zeroing)
>>>>     {
>>>>       /* Pseudocode: compute the hard register set for clearing
>>>>          into “need_zeroed_hardregs”.  */
>>>>       return targetm.return_with_zeroing (simple_return_p,
>>>>                                           need_zeroed_hardregs, gpr_only);
>>>>     }
>>>>   else
>>>>     {
>>>>       if (simple_return_p)
>>>>         return targetm.gen_simple_return ();
>>>>       else
>>>>         return targetm.gen_return ();
>>>>     }
>>>> }
>>>>
>>>> Then replace all calls to “targetm.gen_simple_return” and “targetm.gen_return” with “generate_return_rtx()”.
>>>> 
>>>> 3. In the target, 
>>>> Implement “return_with_zeroing”.
>>>> 
>>>> 
>>>> Let me know your comments on this.
>>> 
>>> I think having a separate pass is better.  We don't normally know
>>> at the point of generating the return which registers will need
>>> to be cleared.  
>> 
>> At the point of generating the return, we can compute the “need_zeroed_hardregs” HARD_REG_SET 
>> by using data flow information, the function abi of the routine, and also the user option and source code 
>> attribute information together. These information should be available at each point when generating the return.
> 
> Like I mentioned earlier though, passes that run after
> pass_thread_prologue_and_epilogue can use call-clobbered registers that
> weren't previously used.  For example, on x86_64, the function might
> not use %r8 when the prologue, epilogue and returns are generated,
> but pass_regrename might later introduce a new use of %r8.  AIUI,
> the “used” version of the new command-line option is supposed to clear
> %r8 in these circumstances, but it wouldn't do so if the data was
> collected at the point that the return is generated.

Thanks for the information.

> 
> That's why I think it's more robust to do this later (at the beginning
> of pass_late_compilation) and insert the zeroing before returns that
> already exist.

Yes, it looks like it’s not correct to insert the zeroing at the time when the prologue, epilogue and returns are generated.
As I also checked, a “return” might also be generated as late as “pass_delay_slots”.  So, shall we move the
new pass as late as possible?

Can I put it immediately before “pass_final”? What’s the latest place I can put it?


> 
>>> So IMO the pass should just search for all the
>>> returns in a function and insert the zeroing instructions before
>>> each one.
>> 
>> I was considering this approach too for some time, however, there is one major issue with this as 
>> Segher mentioned, The middle end does not know some details on the registers, lacking such 
>> detailed information might result incorrect code generation at middle end.
>> 
>> For example, on x86_64 target, when “return” with pop, the scratch register “ECX” will be 
>> used for returning, then it’s incorrect to zero “ecx” before generating the return. Since middle end
>> doesn't have such information, it cannot avoid to zero “ecx”. Therefore incorrect code might be 
>> generated. 
>> 
>> Segher also mentioned that on Power, there are some scratch registers also are used for 
>> Other purpose, clearing them before return is not correct. 
> 
> But the dataflow information has to be correct between
> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
> any pass in that region could clobber the registers in the same way.

You mean, the data flow information will not be correct after pass_free_cfg? 
“pass_delay_slots” is after “pass_free_cfg”, and new “return”s might be generated in “pass_delay_slots”. 
If we want to generate zeroing for a new “return” that was generated in “pass_delay_slots”, can we do so correctly?

> 
> To get the registers that are live before the return, you can start with
> the registers that are live out from the block that contains the return,
> then “simulate” the return instruction backwards to get the set of
> registers that are live before the return instruction
> (see df_simulate_one_insn_backwards).

Okay. 
Currently, I am using the following to check whether a reg is live out of the block that contains the return:

/* Check whether the hard register REGNO is live at the exit block
 * of the current routine.  */
static bool
is_live_reg_at_exit (unsigned int regno)
{
  edge e;
  edge_iterator ei;

  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
    {
      bitmap live_out = df_get_live_out (e->src);
      if (REGNO_REG_SET_P (live_out, regno))
        return true;
    }

  return false;
}

Is this correct?

> 
> In the x86_64 case you mention, the pattern is:
> 
> (define_insn "*simple_return_indirect_internal<mode>"
>  [(simple_return)
>   (use (match_operand:W 0 "register_operand" "r"))]
>  "reload_completed"
>  …)
> 
> This (use …) tells the df machinery that the instruction needs
> operand 0 (= ecx).  The process above would therefore realise
> that ecx can't be clobbered.

Okay, I see.  The df will reflect this information, so there is no need for special handling here. 

However, for the cases on Power that Segher mentioned, where some scratch registers are also used for
other purposes, I am not sure whether we can correctly generate zeroing in the middle end for Power.

Thanks

Qing
> 
> Thanks,
> Richard
Richard Sandiford Sept. 14, 2020, 4:33 p.m. UTC | #121
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> Like I mentioned earlier though, passes that run after
>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>> weren't previously used.  For example, on x86_64, the function might
>> not use %r8 when the prologue, epilogue and returns are generated,
>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>> the “used” version of the new command-line option is supposed to clear
>> %r8 in these circumstances, but it wouldn't do so if the data was
>> collected at the point that the return is generated.
>
> Thanks for the information.
>
>> 
>> That's why I think it's more robust to do this later (at the beginning
>> of pass_late_compilation) and insert the zeroing before returns that
>> already exist.
>
> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
> New pass as late as possible?

If we insert the zeroing before pass_delay_slots and describe the
result correctly, pass_delay_slots should do the right thing.

Describing the result correctly includes ensuring that the cleared
registers are treated as live on exit from the function, so that the
zeroing doesn't get deleted again, or skipped by pass_delay_slots.
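
To illustrate why the cleared registers need to be treated as live on exit, here is a toy dead-code-elimination model. It is a sketch under simplifying assumptions (a straight-line run of "reg = 0" instructions ending at the function exit), and `surviving_zeroing` is a hypothetical name, not a GCC function.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical register file size for this toy model; not a GCC constant.
constexpr std::size_t kNumRegs = 16;
using RegSet = std::bitset<kNumRegs>;

// Toy dead-code elimination: scan the zeroing instructions backwards from
// the function exit and keep only those whose destination is live below.
std::vector<std::size_t> surviving_zeroing(
    const std::vector<std::size_t>& zeroed_regs, const RegSet& live_on_exit) {
  RegSet live = live_on_exit;
  std::vector<std::size_t> kept;
  for (auto it = zeroed_regs.rbegin(); it != zeroed_regs.rend(); ++it) {
    assert(*it < kNumRegs);
    if (live.test(*it)) {
      kept.push_back(*it);
      live.reset(*it);  // the set kills the register above this point
    }
  }
  std::reverse(kept.begin(), kept.end());
  return kept;
}
```

If `live_on_exit` is empty, every zeroing instruction looks dead and is dropped; once the zeroed registers are included in `live_on_exit`, all of them survive.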

> Can I put it immediately before “pass_final”? What’s the latest place
> I can put it?

Like you say here…

>>>> So IMO the pass should just search for all the
>>>> returns in a function and insert the zeroing instructions before
>>>> each one.
>>> 
>>> I was considering this approach too for some time, however, there is one major issue with this as 
>>> Segher mentioned, The middle end does not know some details on the registers, lacking such 
>>> detailed information might result incorrect code generation at middle end.
>>> 
>>> For example, on x86_64 target, when “return” with pop, the scratch register “ECX” will be 
>>> used for returning, then it’s incorrect to zero “ecx” before generating the return. Since middle end
>>> doesn't have such information, it cannot avoid to zero “ecx”. Therefore incorrect code might be 
>>> generated. 
>>> 
>>> Segher also mentioned that on Power, there are some scratch registers also are used for 
>>> Other purpose, clearing them before return is not correct. 
>> 
>> But the dataflow information has to be correct between
>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>> any pass in that region could clobber the registers in the same way.
>
> You mean, the data flow information will be not correct after pass_free_cfg? 
>  “pass_delay_slots” is after “pass_free_cfg”,  and there might be new “return” generated in “pass_delay_slots”, 
> If we want to generate zeroing for the new “return” which was generated in “pass_delay_slots”, can we correctly to do so?

…the zeroing has to be done before pass_free_cfg, because the information
isn't reliable after that point.  I think it would make sense to do it
before pass_compute_alignments, because inserting the zeros will affect
alignment.

>> To get the registers that are live before the return, you can start with
>> the registers that are live out from the block that contains the return,
>> then “simulate” the return instruction backwards to get the set of
>> registers that are live before the return instruction
>> (see df_simulate_one_insn_backwards).
>
> Okay. 
> Currently, I am using the following to check whether a reg is live out the block that contains the return:
>
> /* Check whether the hard register REGNO is live at the exit block
>  * of the current routine.  */
> static bool
> is_live_reg_at_exit (unsigned int regno)
> {
>   edge e;
>   edge_iterator ei;
>
>   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>     {
>       bitmap live_out = df_get_live_out (e->src);
>       if (REGNO_REG_SET_P (live_out, regno))
>         return true;
>     }
>
>   return false;
> }
>
> Is this correct?

df_get_live_out is the right way to get the set of live registers
on exit from a block.  But if we search for return instructions
and find a return instruction R, we should do:

  basic_block bb = BLOCK_FOR_INSN (R);
  auto_bitmap live_regs;
  bitmap_copy (live_regs, df_get_live_out (bb));
  df_simulate_one_insn_backwards (bb, R, live_regs);

and then use LIVE_REGS as the set of registers that are live before R,
and so can't be clobbered.

For extra safety, you could/should also check targetm.hard_regno_scratch_ok
to see whether there's a target-specific reason why a register can't
be clobbered.
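
Putting these checks together, the set of registers that may be zeroed before a given return could be modelled as below. This is a hedged sketch with plain bitsets standing in for HARD_REG_SET and the df bitmaps; `ZeroingInputs` and `regs_to_zero` are hypothetical names, and `scratch_ok` stands for the per-register answer from a target check such as targetm.hard_regno_scratch_ok.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical register file size for this toy model; not a GCC constant.
constexpr std::size_t kNumRegs = 16;
using RegSet = std::bitset<kNumRegs>;

// Hypothetical inputs gathered by the pass for one return instruction.
struct ZeroingInputs {
  RegSet call_used;           // call-clobbered registers per the ABI
  RegSet used_in_function;    // registers the function actually touches
  RegSet live_before_return;  // result of the backward simulation
  RegSet scratch_ok;          // registers the target allows clobbering
};

// Registers that can be cleared before the return.  With only_used set,
// the result is further restricted to registers the function has used,
// mirroring the "used" variants of the option.
RegSet regs_to_zero(const ZeroingInputs& in, bool only_used) {
  RegSet candidates = in.call_used & in.scratch_ok & ~in.live_before_return;
  if (only_used)
    candidates &= in.used_in_function;
  // A register that is live before the return must never be selected.
  assert((candidates & in.live_before_return).none());
  return candidates;
}
```

The GPR-only variants would add one more mask for the general-purpose register class; that is omitted here to keep the model small.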

>> In the x86_64 case you mention, the pattern is:
>> 
>> (define_insn "*simple_return_indirect_internal<mode>"
>>  [(simple_return)
>>   (use (match_operand:W 0 "register_operand" "r"))]
>>  "reload_completed"
>>  …)
>> 
>> This (use …) tells the df machinery that the instruction needs
>> operand 0 (= ecx).  The process above would therefore realise
>> that ecx can't be clobbered.
>
> Okay, I see.  The df will reflect this information, no need for special handling here. 
>
> However, for the cases on Power as Segher mentioned, there are also some scratch registers used for
> Other purpose, not sure whether we can correctly generate zeroing in middle-end for Power?

Segher would be better placed to answer that, but I think the process
above has to give a conservatively-accurate list of live registers.
If it misses a register, the other late rtl passes could clobber
that same register.

Thanks,
Richard
Qing Zhao Sept. 14, 2020, 6:50 p.m. UTC | #122
> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> Like I mentioned earlier though, passes that run after
>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>> weren't previously used.  For example, on x86_64, the function might
>>> not use %r8 when the prologue, epilogue and returns are generated,
>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>> the “used” version of the new command-line option is supposed to clear
>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>> collected at the point that the return is generated.
>> 
>> Thanks for the information.
>> 
>>> 
>>> That's why I think it's more robust to do this later (at the beginning
>>> of pass_late_compilation) and insert the zeroing before returns that
>>> already exist.
>> 
>> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
>> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
>> New pass as late as possible?
> 
> If we insert the zeroing before pass_delay_slots and describe the
> result correctly, pass_delay_slots should do the right thing.
> 
> Describing the result correctly includes ensuring that the cleared
> registers are treated as live on exit from the function, so that the
> zeroing doesn't get deleted again, or skipped by pass_delay_slots.

In the current implementation for x86, we generate a zeroing insn as follows:

(insn 18 16 19 2 (set (reg:SI 1 dx)
        (const_int 0 [0])) "t10.c":11:1 -1
     (nil))
(insn 19 18 20 2 (unspec_volatile [
            (reg:SI 1 dx)
        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
     (nil))

i.e., after each zeroing insn, the register that is zeroed is marked with “UNSPECV_PRO_EPILOGUE_USE”. 
By doing this, we can keep the zeroing insn from being deleted or skipped. 

Is doing this enough to describe the result correctly?
Is there anything else we need to do in addition to this?

> 
>> Can I put it immediately before “pass_final”? What’s the latest place
>> I can put it?
> 
> Like you say here…
> 
>>>>> So IMO the pass should just search for all the
>>>>> returns in a function and insert the zeroing instructions before
>>>>> each one.
>>>> 
>>>> I was considering this approach too for some time, however, there is one major issue with this as 
>>>> Segher mentioned, The middle end does not know some details on the registers, lacking such 
>>>> detailed information might result incorrect code generation at middle end.
>>>> 
>>>> For example, on x86_64 target, when “return” with pop, the scratch register “ECX” will be 
>>>> used for returning, then it’s incorrect to zero “ecx” before generating the return. Since middle end
>>>> doesn't have such information, it cannot avoid to zero “ecx”. Therefore incorrect code might be 
>>>> generated. 
>>>> 
>>>> Segher also mentioned that on Power, there are some scratch registers also are used for 
>>>> Other purpose, clearing them before return is not correct. 
>>> 
>>> But the dataflow information has to be correct between
>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>> any pass in that region could clobber the registers in the same way.
>> 
>> You mean, the data flow information will be not correct after pass_free_cfg? 
>> “pass_delay_slots” is after “pass_free_cfg”,  and there might be new “return” generated in “pass_delay_slots”, 
>> If we want to generate zeroing for the new “return” which was generated in “pass_delay_slots”, can we correctly to do so?
> 
> …the zeroing has to be done before pass_free_cfg, because the information
> isn't reliable after that point.  I think it would make sense to do it
> before pass_compute_alignments, because inserting the zeros will affect
> alignment.

Okay. 

Then there is another problem:  what about the new “return”s that are generated at pass_delay_slots?

Should we generate the zeroing for these new returns? Since the data flow information might not be correct at this pass,
it looks like there is no correct way to add the zeroing insns for these new “return”s. What should we do about this?

> 
>>> To get the registers that are live before the return, you can start with
>>> the registers that are live out from the block that contains the return,
>>> then “simulate” the return instruction backwards to get the set of
>>> registers that are live before the return instruction
>>> (see df_simulate_one_insn_backwards).
>> 
>> Okay. 
>> Currently, I am using the following to check whether a reg is live out the block that contains the return:
>> 
>> /* Check whether the hard register REGNO is live at the exit block
>> * of the current routine.  */
>> static bool
>> is_live_reg_at_exit (unsigned int regno)
>> {
>>  edge e;
>>  edge_iterator ei;
>> 
>>  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>>    {
>>      bitmap live_out = df_get_live_out (e->src);
>>      if (REGNO_REG_SET_P (live_out, regno))
>>        return true;
>>    }
>> 
>>  return false;
>> }
>> 
>> Is this correct?
> 
> df_get_live_out is the right way to get the set of live registers
> on exit from a block.  But if we search for return instructions
> and find a return instruction R, we should do:
> 
>  basic_block bb = BLOCK_FOR_INSN (R);
>  auto_bitmap live_regs;
>  bitmap_copy (live_regs, df_get_live_out (bb));
>  df_simulate_one_insn_backwards (bb, R, live_regs);

> 
> and then use LIVE_REGS as the set of registers that are live before R,
> and so can't be clobbered.

Okay. Thanks for the info.
> 
> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
> to see whether there's a target-specific reason why a register can't
> be clobbered.

/* Return true if is OK to use a hard register REGNO as scratch register
   in peephole2.  */
DEFHOOK
(hard_regno_scratch_ok,


Is this check only valid for pass_peephole2?

> 
>>> In the x86_64 case you mention, the pattern is:
>>> 
>>> (define_insn "*simple_return_indirect_internal<mode>"
>>> [(simple_return)
>>>  (use (match_operand:W 0 "register_operand" "r"))]
>>> "reload_completed"
>>> …)
>>> 
>>> This (use …) tells the df machinery that the instruction needs
>>> operand 0 (= ecx).  The process above would therefore realise
>>> that ecx can't be clobbered.
>> 
>> Okay, I see.  The df will reflect this information, no need for special handling here. 
>> 
>> However, for the cases on Power as Segher mentioned, there are also some scratch registers used for
>> Other purpose, not sure whether we can correctly generate zeroing in middle-end for Power?
> 
> Segher would be better placed to answer that, but I think the process
> above has to give a conservatively-accurate list of live registers.
> If it misses a register, the other late rtl passes could clobber
> that same register.

Segher, can you comment on this? 

thanks.

Qing
> 
> Thanks,
> Richard
Richard Sandiford Sept. 14, 2020, 7:20 p.m. UTC | #123
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> Like I mentioned earlier though, passes that run after
>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>> weren't previously used.  For example, on x86_64, the function might
>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>> the “used” version of the new command-line option is supposed to clear
>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>> collected at the point that the return is generated.
>>> 
>>> Thanks for the information.
>>> 
>>>> 
>>>> That's why I think it's more robust to do this later (at the beginning
>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>> already exist.
>>> 
>>> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
>>> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
>>> New pass as late as possible?
>> 
>> If we insert the zeroing before pass_delay_slots and describe the
>> result correctly, pass_delay_slots should do the right thing.
>> 
>> Describing the result correctly includes ensuring that the cleared
>> registers are treated as live on exit from the function, so that the
>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>
> In the current implementation for x86, when we generating a zeroing insn as the following:
>
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>         (const_int 0 [0])) "t10.c":11:1 -1
>      (nil))
> (insn 19 18 20 2 (unspec_volatile [
>             (reg:SI 1 dx)
>         ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>      (nil))
>
> i.e, after each zeroing insn, the register that is zeroed is marked as “UNSPECV_PRO_EPILOGUE_USE”, 
> By doing this, we can avoid this zeroing insn from being deleted or skipped. 
>
> Is doing this enough to describe the result correctly?
> Is there other thing we need to do in addition to this?

I guess that works, but I think it would be better to abstract
EPILOGUE_USES into a new target-independent wrapper function that
(a) returns true if EPILOGUE_USES itself returns true and (b) returns
true for registers that need to be zero on return, if the zeroing
instructions have already been inserted.  The places that currently
test EPILOGUE_USES should then test this new wrapper function instead.

After inserting the zeroing instructions, the pass should recompute the
live-out sets based on this.
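
As an illustration only (not code from the patch), the wrapper idea can be
sketched with a small self-contained toy model.  All names here are
hypothetical: `FunctionState` stands in for per-function pass state,
`target_epilogue_uses` for whatever EPILOGUE_USES reports, and
`epilogue_uses_or_zeroed_p` for the proposed wrapper.

```cpp
#include <bitset>
#include <cassert>

// Toy model: none of these names exist in GCC.
constexpr unsigned NUM_HARD_REGS = 16;
using RegSet = std::bitset<NUM_HARD_REGS>;

struct FunctionState {
  RegSet target_epilogue_uses;    // stands in for the target's EPILOGUE_USES
  RegSet need_zeroing;            // registers chosen to be zero on return
  bool zeroing_inserted = false;  // true once the zeroing insns exist
};

// Hypothetical wrapper: (a) true if the target's EPILOGUE_USES is true,
// (b) true for registers that must be zero on return, but only after the
// zeroing instructions have been inserted.
bool epilogue_uses_or_zeroed_p(const FunctionState &fn, unsigned regno) {
  assert(regno < NUM_HARD_REGS);
  if (fn.target_epilogue_uses.test(regno))
    return true;
  return fn.zeroing_inserted && fn.need_zeroing.test(regno);
}
```

In this model a register in `need_zeroing` only starts to count as used by
the epilogue once `zeroing_inserted` is set, which mirrors condition (b).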

>>>> But the dataflow information has to be correct between
>>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>>> any pass in that region could clobber the registers in the same way.
>>> 
>>> You mean, the data flow information will be not correct after pass_free_cfg? 
>>> “pass_delay_slots” is after “pass_free_cfg”,  and there might be new “return” generated in “pass_delay_slots”, 
>>> If we want to generate zeroing for the new “return” which was generated in “pass_delay_slots”, can we correctly to do so?
>> 
>> …the zeroing has to be done before pass_free_cfg, because the information
>> isn't reliable after that point.  I think it would make sense to do it
>> before pass_compute_alignments, because inserting the zeros will affect
>> alignment.
>
> Okay. 
>
> Then there is another problem:  what about the new “return”s that are generated at pass_delay_slots?
>
> Should we generate the zeroing for these new returns? Since the data flow information might not be correct at this pass,
> It looks like that there is no correct way to add the zeroing insn for these new “return”, then, what should we do about this?

pass_delay_slots isn't a problem.  It doesn't change *what* happens
on each return path, it just changes how the instructions to achieve
it are arranged.

So e.g. if every path through the function clears register R before
pass_delay_slots, and if that clearing is represented as being necessary,
then every path through the function will clear register R after the pass
as well.

>> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
>> to see whether there's a target-specific reason why a register can't
>> be clobbered.
>
> /* Return true if is OK to use a hard register REGNO as scratch register
>    in peephole2.  */
> DEFHOOK
> (hard_regno_scratch_ok,
>
>
> Is this checking only valid for pass_peephole2?

No, that comment looks out of date.  The hook is already used in
postreload, for example.

Thanks,
Richard
Qing Zhao Sept. 14, 2020, 8:24 p.m. UTC | #124
> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> Like I mentioned earlier though, passes that run after
>>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>>> weren't previously used.  For example, on x86_64, the function might
>>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>>> the “used” version of the new command-line option is supposed to clear
>>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>>> collected at the point that the return is generated.
>>>> 
>>>> Thanks for the information.
>>>> 
>>>>> 
>>>>> That's why I think it's more robust to do this later (at the beginning
>>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>>> already exist.
>>>> 
>>>> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
>>>> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
>>>> New pass as late as possible?
>>> 
>>> If we insert the zeroing before pass_delay_slots and describe the
>>> result correctly, pass_delay_slots should do the right thing.
>>> 
>>> Describing the result correctly includes ensuring that the cleared
>>> registers are treated as live on exit from the function, so that the
>>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>> 
>> In the current implementation for x86, when we generating a zeroing insn as the following:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>        (const_int 0 [0])) "t10.c":11:1 -1
>>     (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>            (reg:SI 1 dx)
>>        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>     (nil))
>> 
>> i.e, after each zeroing insn, the register that is zeroed is marked as “UNSPECV_PRO_EPILOGUE_USE”, 
>> By doing this, we can avoid this zeroing insn from being deleted or skipped. 
>> 
>> Is doing this enough to describe the result correctly?
>> Is there other thing we need to do in addition to this?
> 
> I guess that works, but I think it would be better to abstract
> EPILOGUE_USES into a new target-independent wrapper function that
> (a) returns true if EPILOGUE_USES itself returns true and (b) returns
> true for registers that need to be zero on return, if the zeroing
> instructions have already been inserted.  The places that currently
> test EPILOGUE_USES should then test this new wrapper function instead.

Okay, I see. 
It looks like EPILOGUE_USES is used in df-scan.c to compute the data flow information. If EPILOGUE_USES returns true
for registers that need to be zeroed on return, those registers will be included in the data flow information; as a result, later
passes will not be able to delete the insns that zero them.

This sounds like a cleaner approach than the current one, which marks the registers with “UNSPECV_PRO_EPILOGUE_USE”.

A more detailed implementation question on this:
Where should I put this new target-independent wrapper function? Which header file would be a proper place for it?

> 
> After inserting the zeroing instructions, the pass should recompute the
> live-out sets based on this.

Is recomputing only the live-out sets of the block that includes the return insn enough? Or should we recompute them for the whole procedure?

Which utility routine should I use to recompute the live-out sets?

> 
>>>>> But the dataflow information has to be correct between
>>>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>>>> any pass in that region could clobber the registers in the same way.
>>>> 
>>>> You mean, the data flow information will be not correct after pass_free_cfg? 
>>>> “pass_delay_slots” is after “pass_free_cfg”,  and there might be new “return” generated in “pass_delay_slots”, 
>>>> If we want to generate zeroing for the new “return” which was generated in “pass_delay_slots”, can we correctly to do so?
>>> 
>>> …the zeroing has to be done before pass_free_cfg, because the information
>>> isn't reliable after that point.  I think it would make sense to do it
>>> before pass_compute_alignments, because inserting the zeros will affect
>>> alignment.
>> 
>> Okay. 
>> 
>> Then there is another problem:  what about the new “return”s that are generated at pass_delay_slots?
>> 
>> Should we generate the zeroing for these new returns? Since the data flow information might not be correct at this pass,
>> It looks like that there is no correct way to add the zeroing insn for these new “return”, then, what should we do about this?
> 
> pass_delay_slots isn't a problem.  It doesn't change *what* happens
> on each return path, it just changes how the instructions to achieve
> it are arranged.
> 
> So e.g. if every path through the function clears register R before
> pass_delay_slots, and if that clearing is represented as being necessary,
> then every path through the function will clear register R after the pass
> as well.

Okay, I might now understand what you mean here.

My understanding is:

Our new pass, i.e. pass_zero_call_used_regs, is put at the beginning of pass_late_compilation:

      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
++++  NEXT_PASS (pass_zero_call_used_regs);
          NEXT_PASS (pass_compute_alignments);
          NEXT_PASS (pass_variable_tracking);
          NEXT_PASS (pass_free_cfg);
          NEXT_PASS (pass_machine_reorg);
          NEXT_PASS (pass_cleanup_barriers);
          NEXT_PASS (pass_delay_slots);

When we scan the EXIT BLOCK of the routine, all the return insns are already there.
The later passes, including “pass_delay_slots”, will not generate additional returns; they might just call “targetm.gen_return ()” or “targetm.gen_simple_return ()” to replace
“ret_rtx” or “simple_return_rtx”?


> 
>>> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
>>> to see whether there's a target-specific reason why a register can't
>>> be clobbered.
>> 
>> /* Return true if is OK to use a hard register REGNO as scratch register
>>   in peephole2.  */
>> DEFHOOK
>> (hard_regno_scratch_ok,
>> 
>> 
>> Is this checking only valid for pass_peephole2?
> 
> No, that comment looks out of date.  The hook is already used in
> postreload, for example.

Okay, I see.

thanks.

Qing
> 
> Thanks,
> Richard
Segher Boessenkool Sept. 14, 2020, 11:09 p.m. UTC | #125
On Fri, Sep 11, 2020 at 05:41:47PM -0500, Qing Zhao wrote:
> > On Sep 11, 2020, at 4:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > It is definitely *not* effective if there are gadgets that set rax to
> > a value the attacker wants and then do a syscall.
> 
> You mean the following gadget:
> 
> 
> Gadget 1:
> 
> mov  rax,  value
> syscall
> ret

No, just

mov rax,59
syscall

(no ret necessary!)

I.e. just anything that already does an execve.


Segher
Segher Boessenkool Sept. 14, 2020, 11:20 p.m. UTC | #126
Hi!

On Fri, Sep 11, 2020 at 05:24:58PM -0500, Qing Zhao wrote:
> > So IMO the pass should just search for all the
> > returns in a function and insert the zeroing instructions before
> > each one.
> 
> I was considering this approach too for some time, however, there is one major issue with this as 
> Segher mentioned, The middle end does not know some details on the registers, lacking such 
> detailed information might result incorrect code generation at middle end.
> 
> For example, on x86_64 target, when “return” with pop, the scratch register “ECX” will be 
> used for returning, then it’s incorrect to zero “ecx” before generating the return. Since middle end
> doesn't have such information, it cannot avoid to zero “ecx”. Therefore incorrect code might be 
> generated. 
> 
> Segher also mentioned that on Power, there are some scratch registers also are used for 
> Other purpose, clearing them before return is not correct. 

Depending where you insert those insns, it can be non-harmful, but in
most places it will not be useful.


What you can do (easy and safe) is change the RTL return instructions to
clear all necessary registers (by outputting extra assembler
instructions).  I still have big doubts how effective that will be, and
esp. compared with how expensive that is, but at least its effect on the
compiler is very local, and it does not get in the way of most things.

(This also works with shrink-wrapping and similar.)

(The effectiveness of this whole scheme depends a *lot* on specifics of
the ABI, btw; in that way it is not generic at all!)


Segher
Segher Boessenkool Sept. 14, 2020, 11:35 p.m. UTC | #127
On Mon, Sep 14, 2020 at 05:33:33PM +0100, Richard Sandiford wrote:
> > However, for the cases on Power as Segher mentioned, there are also some scratch registers used for
> > Other purpose, not sure whether we can correctly generate zeroing in middle-end for Power?
> 
> Segher would be better placed to answer that, but I think the process
> above has to give a conservatively-accurate list of live registers.
> If it misses a register, the other late rtl passes could clobber
> that same register.

It will zero a whole bunch of registers that are overwritten later, that
are not parameter passing registers either.

Doing this with the limited information the middle end has is not the
best idea.


Segher
Qing Zhao Sept. 15, 2020, 3:07 a.m. UTC | #128
> On Sep 14, 2020, at 6:09 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 05:41:47PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 4:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> It is definitely *not* effective if there are gadgets that set rax to
>>> a value the attacker wants and then do a syscall.
>> 
>> You mean the following gadget:
>> 
>> 
>> Gadget 1:
>> 
>> mov  rax,  value
>> syscall
>> ret
> 
> No, just
> 
> mov rax,59
> syscall
> 
> (no ret necessary!)

But for ROP, a typical gadget should end with a “ret” (or an indirect branch), right?

Qing
> 
> I.e. just anything that already does an execve.
> 
> 
> Segher
Richard Sandiford Sept. 15, 2020, 9:11 a.m. UTC | #129
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>>> Like I mentioned earlier though, passes that run after
>>>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>>>> weren't previously used.  For example, on x86_64, the function might
>>>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>>>> the “used” version of the new command-line option is supposed to clear
>>>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>>>> collected at the point that the return is generated.
>>>>> 
>>>>> Thanks for the information.
>>>>> 
>>>>>> 
>>>>>> That's why I think it's more robust to do this later (at the beginning
>>>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>>>> already exist.
>>>>> 
>>>>> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
>>>>> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
>>>>> New pass as late as possible?
>>>> 
>>>> If we insert the zeroing before pass_delay_slots and describe the
>>>> result correctly, pass_delay_slots should do the right thing.
>>>> 
>>>> Describing the result correctly includes ensuring that the cleared
>>>> registers are treated as live on exit from the function, so that the
>>>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>>> 
>>> In the current implementation for x86, when we generating a zeroing insn as the following:
>>> 
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>        (const_int 0 [0])) "t10.c":11:1 -1
>>>     (nil))
>>> (insn 19 18 20 2 (unspec_volatile [
>>>            (reg:SI 1 dx)
>>>        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>     (nil))
>>> 
>>> i.e, after each zeroing insn, the register that is zeroed is marked as “UNSPECV_PRO_EPILOGUE_USE”, 
>>> By doing this, we can avoid this zeroing insn from being deleted or skipped. 
>>> 
>>> Is doing this enough to describe the result correctly?
>>> Is there other thing we need to do in addition to this?
>> 
>> I guess that works, but I think it would be better to abstract
>> EPILOGUE_USES into a new target-independent wrapper function that
>> (a) returns true if EPILOGUE_USES itself returns true and (b) returns
>> true for registers that need to be zero on return, if the zeroing
>> instructions have already been inserted.  The places that currently
>> test EPILOGUE_USES should then test this new wrapper function instead.
>
> Okay, I see. 
> Looks like that EPILOGUE_USES is used in df-scan.c to compute the data flow information. If EPILOGUE_USES return true
> for registers that need to be zeroed on return, those registers will be included in the data flow information, as a result, later
> passes will not be able to delete them. 
>
> This sounds to be a cleaner approach than the current one that marks the registers  “UNSPECV_PRO_EPILOGUE_USE”. 
>
> A more detailed implementation question on this: 
> Where should I put this new target-independent wrapper function in? Which header file will be a proper place to hold this new function?

Not a strong opinion, but: maybe df.h and df-scan.c, since this is
really a DF query.

>> After inserting the zeroing instructions, the pass should recompute the
>> live-out sets based on this.

Sorry, I was wrong here.  It should *cause* the sets to be recomputed
where necessary (rather than recompute them directly), but see below.

> Is only computing the live-out sets of the block that including the return insn enough? Or we should re-compute the whole procedure? 
>
> Which utility routine I should use to recompute the live-out sets?

Inserting the instructions will cause the containing block to be marked
dirty, via df_set_bb_dirty.  I think the pass should also call
df_set_bb_dirty on the exit block itself, to indicate that the
wrapper around EPILOGUE_USES has changed behaviour, but that might
not be necessary.

This gives the df machinery enough information to work out what has changed.
It will then propagate those changes throughout the function.  (I don't
think any propagation would be necessary here, but if I'm wrong about that,
then the df machinery will do whatever propagation is necessary.)

However, the convention is for a pass that uses the df machinery to call
df_analyze first.  This call to df_analyze updates any stale df information.

So unlike what I said yesterday, the pass itself doesn't need to make sure
that the df information is up-to-date.  It just needs to indicate what
has changed, as above.

In the case of pass_delay_slots, pass_free_cfg has:

  /* The resource.c machinery uses DF but the CFG isn't guaranteed to be
     valid at that point so it would be too late to call df_analyze.  */
  if (DELAY_SLOTS && optimize > 0 && flag_delayed_branch)
    {
      df_note_add_problem ();
      df_analyze ();
    }

Any other machine-specific passes that use df already need to call
df_analyze (if they use the df machinery).  So simply marking what
has changed is enough (by design).
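
As an illustration only, the “mark what changed, let the next df_analyze
pick it up” convention can be modelled with a small self-contained toy.
Nothing below is GCC code: `ToyDF`, `set_bb_dirty`, `analyze`,
`EXIT_BLOCK_INDEX` and `zero_regs_before_return` are hypothetical
stand-ins for the df machinery, the exit block and the new pass.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Toy stand-in for the df machinery: it only records which blocks are
// stale and counts how many blocks a later analyze() call rescans.
struct ToyDF {
  std::set<int> dirty_blocks;
  std::size_t blocks_rescanned = 0;

  // Record that a block's information is stale; no recomputation here.
  void set_bb_dirty(int bb_index) {
    assert(bb_index >= 0);
    dirty_blocks.insert(bb_index);
  }

  // Called at the start of a later pass: refresh only what is stale.
  void analyze() {
    blocks_rescanned += dirty_blocks.size();
    dirty_blocks.clear();
  }
};

// Arbitrary index chosen for the toy exit block.
constexpr int EXIT_BLOCK_INDEX = 1;

// Hypothetical pass body: after inserting zeroing insns in RETURN_BB,
// mark that block and the exit block dirty, and do nothing else.
void zero_regs_before_return(ToyDF &df, int return_bb) {
  df.set_bb_dirty(return_bb);
  df.set_bb_dirty(EXIT_BLOCK_INDEX);
}
```

The pass itself never calls `analyze()` after its changes; in this model
the cost of refreshing is paid once, by whichever later pass asks for
up-to-date information.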

> My understanding is:
>
> In our new pass that is put in the beginning of the pass_late_compilation, I,e pass_zero_call_used_regs;
>
>       PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
> ++++  NEXT_PASS (pass_zero_call_used_regs);
>           NEXT_PASS (pass_compute_alignments);
>           NEXT_PASS (pass_variable_tracking);
>           NEXT_PASS (pass_free_cfg);
>           NEXT_PASS (pass_machine_reorg);
>           NEXT_PASS (pass_cleanup_barriers);
>           NEXT_PASS (pass_delay_slots);
>
> When we scan the EXIT BLOCK of the routine, all the return insns have already been there.
> The later passes including “pass_delay_slots” will not generate additional returns anymore,  they might just call “target.gen_return” or “target.gen_simple_return() to replace 
> “ret_rtx” or “simple_ret_rtx” ?

Kind-of.  pass_delay_slots can also duplicate code, so it's not always a
straight replacement.  But the point is that returns don't appear out of
nowhere.  There has to be a semantic reason for them to exist.  The
behaviour of the function after pass_delay_slots has to be the same
as it was before the pass (disregarding undefined behaviour).  Once we've
added clearing of the zero registers to all return paths, that clearing
becomes part of the behaviour of the function, and so will be part of
the behaviour after pass_delay_slots as well.

So I don't think the problem is with passes generating new returns.
It's more whether they could use new registers that then need to be
cleared, which is the main justification for running the new pass
so late in the pipeline.

In principle, there's nothing stopping pass_delay_slots allocating
new registers (like pass_regrename does), and in principle that could
introduce the need to do more clearing.  But I don't think the current
pass does that.  The pass is also very much legacy code at this point,
so the chances of new optimisations being added to it are fairly low.
If that did happen, I think it would be reasonable to expect the pass
to work within the set of registers that have already been allocated,
at least when your new option is in effect.

Thanks,
Richard
Richard Sandiford Sept. 15, 2020, 11:46 a.m. UTC | #130
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Mon, Sep 14, 2020 at 05:33:33PM +0100, Richard Sandiford wrote:
>> > However, for the cases on Power as Segher mentioned, there are also some scratch registers used for
>> > Other purpose, not sure whether we can correctly generate zeroing in middle-end for Power?
>> 
>> Segher would be better placed to answer that, but I think the process
>> above has to give a conservatively-accurate list of live registers.
>> If it misses a register, the other late rtl passes could clobber
>> that same register.
>
> It will zero a whole bunch of registers that are overwritten later, that
> are not parameter passing registers either.

This thread has covered two main issues: correctness and cost.
The question above was about correctness, but your reply seems to be
about cost.  The correctness question was instead: would the process
described in my previous message lead the compiler to think that a
register wasn't live before a Power return instruction when the
register actually was live?  (And if so, how do we get around that
for other post prologue-epilogue passes that use df?)
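
To make the correctness question concrete, here is a toy model of the
process (an illustrative sketch, not GCC code).  `Insn`, `RegSet` and
`zeroable_before_return` are hypothetical, and
`simulate_one_insn_backwards` is only named after
df_simulate_one_insn_backwards; it is not its implementation.

```cpp
#include <bitset>
#include <cassert>

constexpr unsigned NUM_REGS = 16;
using RegSet = std::bitset<NUM_REGS>;

// Toy instruction: the registers it sets and the registers it reads.
struct Insn {
  RegSet defs;
  RegSet uses;
};

// Standard backward liveness step:
//   live_before = (live_after minus defs) union uses
RegSet simulate_one_insn_backwards(const Insn &insn, RegSet live_after) {
  return (live_after & ~insn.defs) | insn.uses;
}

// Registers that may be zeroed immediately before the return RET:
// call-used registers that are not live before it.
RegSet zeroable_before_return(const Insn &ret, RegSet live_out,
                              RegSet call_used) {
  RegSet live_before = simulate_one_insn_backwards(ret, live_out);
  return call_used & ~live_before;
}
```

If the return pattern carries a (use ...) of a register, that register
ends up in `live_before` and is excluded; a register the pattern needs
but does not mention would be missed, which is exactly the question.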

On the cost issue: when you say some registers are “overwritten later”:
which registers do you mean, and who would be doing the overwriting?
We were talking about inserting zeroing instructions immediately before
returns that already exist.  It looks like the main Power return
pattern is:

(define_insn "<return_str>return"
  [(any_return)]
  "<return_pred>"
  "blr"
  [(set_attr "type" "jmpreg")])

Does this overwrite anything other than the PC?  If not, it doesn't
look like anything in the function itself would clobber other registers
later (i.e. later than the inserted zeroing instructions).  And of course,
if an attacker is performing a ROP attack, the attacker controls which
address the BLR returns to.

Thanks,
Richard
Qing Zhao Sept. 15, 2020, 3:05 p.m. UTC | #131
> On Sep 15, 2020, at 4:11 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> 
>>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>>>> Like I mentioned earlier though, passes that run after
>>>>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>>>>> weren't previously used.  For example, on x86_64, the function might
>>>>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>>>>> the “used” version of the new command-line option is supposed to clear
>>>>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>>>>> collected at the point that the return is generated.
>>>>>> 
>>>>>> Thanks for the information.
>>>>>> 
>>>>>>> 
>>>>>>> That's why I think it's more robust to do this later (at the beginning
>>>>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>>>>> already exist.
>>>>>> 
>>>>>> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
>>>>>> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
>>>>>> New pass as late as possible?
>>>>> 
>>>>> If we insert the zeroing before pass_delay_slots and describe the
>>>>> result correctly, pass_delay_slots should do the right thing.
>>>>> 
>>>>> Describing the result correctly includes ensuring that the cleared
>>>>> registers are treated as live on exit from the function, so that the
>>>>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>>>> 
>>>> In the current implementation for x86, when we generating a zeroing insn as the following:
>>>> 
>>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>>       (const_int 0 [0])) "t10.c":11:1 -1
>>>>    (nil))
>>>> (insn 19 18 20 2 (unspec_volatile [
>>>>           (reg:SI 1 dx)
>>>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>>    (nil))
>>>> 
>>>> i.e, after each zeroing insn, the register that is zeroed is marked as “UNSPECV_PRO_EPILOGUE_USE”, 
>>>> By doing this, we can avoid this zeroing insn from being deleted or skipped. 
>>>> 
>>>> Is doing this enough to describe the result correctly?
>>>> Is there other thing we need to do in addition to this?
>>> 
>>> I guess that works, but I think it would be better to abstract
>>> EPILOGUE_USES into a new target-independent wrapper function that
>>> (a) returns true if EPILOGUE_USES itself returns true and (b) returns
>>> true for registers that need to be zero on return, if the zeroing
>>> instructions have already been inserted.  The places that currently
>>> test EPILOGUE_USES should then test this new wrapper function instead.
>> 
>> Okay, I see. 
>> Looks like that EPILOGUE_USES is used in df-scan.c to compute the data flow information. If EPILOGUE_USES return true
>> for registers that need to be zeroed on return, those registers will be included in the data flow information, as a result, later
>> passes will not be able to delete them. 
>> 
>> This sounds to be a cleaner approach than the current one that marks the registers  “UNSPECV_PRO_EPILOGUE_USE”. 
>> 
>> A more detailed implementation question on this: 
>> Where should I put this new target-independent wrapper function in? Which header file will be a proper place to hold this new function?
> 
> Not a strong opinion, but: maybe df.h and df-scan.c, since this is
> really a DF query.

Okay.

> 
>>> After inserting the zeroing instructions, the pass should recompute the
>>> live-out sets based on this.
> 
> Sorry, I was wrong here.  It should *cause* the sets to be recomputed
> where necessary (rather than recompute them directly), but see below.
> 
>> Is only computing the live-out sets of the block that including the return insn enough? Or we should re-compute the whole procedure? 
>> 
>> Which utility routine I should use to recompute the live-out sets?
> 
> Inserting the instructions will cause the containing block to be marked
> dirty, via df_set_bb_dirty.  I think the pass should also call
> df_set_bb_dirty on the exit block itself, to indicate that the
> wrapper around EPILOGUE_USES has changed behaviour, but that might
> not be necessary.
> 
> This gives the df machinery enough information to work out what has changed.
> It will then propagate those changes throughout the function.  (I don't
> think any propagation would be necessary here, but if I'm wrong about that,
> then the df machinery will do whatever propagation is necessary.)
> 
> However, the convention is for a pass that uses the df machinery to call
> df_analyze first.  This call to df_analyze updates any stale df information.
> 
> So unlike what I said yesterday, the pass itself doesn't need to make sure
> that the df information is up-to-date.  It just needs to indicate what
> has changed, as above.
> 
> In the case of pass_delay_slots, pass_free_cfg has:
> 
>  /* The resource.c machinery uses DF but the CFG isn't guaranteed to be
>     valid at that point so it would be too late to call df_analyze.  */
>  if (DELAY_SLOTS && optimize > 0 && flag_delayed_branch)
>    {
>      df_note_add_problem ();
>      df_analyze ();
>    }
> 
> Any other machine-specific passes that use df already need to call
> df_analyze (if they use the df machinery).  So simply marking what
> has changed is enough (by design).

So, in this new pass, I need to:

1. Call “df_analyze” at the beginning to get up-to-date df information;
2. After generating the zeroing insns, mark the containing block with “df_set_bb_dirty”;
3. Mark the exit block with “df_set_bb_dirty” to indicate that the wrapper around EPILOGUE_USES has changed
    behavior. (This might not be needed, since “df_analyze” in the next pass will call EPILOGUE_USES automatically?)

Is the above enough for DF?

(BTW, how expensive is it to call “df_analyze”?)

> 
>> My understanding is:
>> 
>> In our new pass that is put in the beginning of the pass_late_compilation, I,e pass_zero_call_used_regs;
>> 
>>      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
>> ++++  NEXT_PASS (pass_zero_call_used_regs);
>>          NEXT_PASS (pass_compute_alignments);
>>          NEXT_PASS (pass_variable_tracking);
>>          NEXT_PASS (pass_free_cfg);
>>          NEXT_PASS (pass_machine_reorg);
>>          NEXT_PASS (pass_cleanup_barriers);
>>          NEXT_PASS (pass_delay_slots);
>> 
>> When we scan the EXIT BLOCK of the routine, all the return insns are already there.
>> The later passes, including “pass_delay_slots”, will not generate additional returns; they might just call “target.gen_return” or “target.gen_simple_return()” to replace
>> “ret_rtx” or “simple_ret_rtx”?
> 
> Kind-of.  pass_delay_slots can also duplicate code, so it's not always a
> straight replacement.  But the point is that returns don't appear out of
> nowhere.  There has to be a semantic reason for them to exist.  The
> behaviour of the function after pass_delay_slots has to be the same
> as it was before the pass (disregarding undefined behaviour).  Once we've
> added clearing of the zero registers to all return paths, that clearing
> becomes part of the behaviour of the function, and so will be part of
> the behaviour after pass_delay_slots as well.
> 
> So I don't think the problem is with passes generating new returns.
> It's more whether they could use new registers that then need to be
> cleared, which is the main justification for running the new pass
> so late in the pipeline.

agreed.

> 
> In principle, there's nothing stopping pass_delay_slots allocating
> new registers (like pass_regrename does), and in principle that could
> introduce the need to do more clearing.  But I don't think the current
> pass does that.  The pass is also very much legacy code at this point,
> so the chances of new optimisations being added to it are fairly low.
> If that did happen, I think it would be reasonable to expect the pass
> to work within the set of registers that have already been allocated,
> at least when your new option is in effect.

Okay, thanks for the information.

Qing
> 
> Thanks,
> Richard
Segher Boessenkool Sept. 15, 2020, 6:51 p.m. UTC | #132
On Mon, Sep 14, 2020 at 10:07:31PM -0500, Qing Zhao wrote:
> > On Sep 14, 2020, at 6:09 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> Gadget 1:
> >> 
> >> mov  rax,  value
> >> syscall
> >> ret
> > 
> > No, just
> > 
> > mov rax,59
> > syscall
> > 
> > (no ret necessary!)
> 
> But for ROP, a typical gadget should be ended with a “ret” (or indirect branch), right?

Not the last one :-)  (Especially if it is exec!)


Segher
Segher Boessenkool Sept. 15, 2020, 7:22 p.m. UTC | #133
On Tue, Sep 15, 2020 at 12:46:00PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Mon, Sep 14, 2020 at 05:33:33PM +0100, Richard Sandiford wrote:
> >> > However, for the cases on Power that Segher mentioned, there are also some scratch registers used for
> >> > other purposes; not sure whether we can correctly generate zeroing in the middle-end for Power?
> >> 
> >> Segher would be better placed to answer that, but I think the process
> >> above has to give a conservatively-accurate list of live registers.
> >> If it misses a register, the other late rtl passes could clobber
> >> that same register.
> >
> > It will zero a whole bunch of registers that are overwritten later, that
> > are not parameter passing registers either.
> 
> This thread has covered two main issues: correctness and cost.
> The question above was about correctness, but your reply seems to be
> about cost.

The issues are very heavily intertwined.  A much too high execution
cost is unacceptable, just like machine code that does not implement the
source code faithfully.

> On the cost issue: when you say some registers are “overwritten later”:
> which registers do you mean, and who would be doing the overwriting?

(Glue) code that is generated by the linker.

> We were talking about inserting zeroing instructions immediately before
> returns that already exist.  It looks like the main Power return
> pattern is:

It is.

> (define_insn "<return_str>return"
>   [(any_return)]
>   "<return_pred>"
>   "blr"
>   [(set_attr "type" "jmpreg")])
> 
> Does this overwrite anything other than the PC?  If not, it doesn't

(We do not have a "PC" register, but :-) )

Nope.  The blr instruction does not write any register.  (The base
"bclr[l]" insn can write to CTR and LR).

> look like anything in the function itself would clobber other registers
> later (i.e. later than the inserted zeroing instructions).  And of course,
> if an attacker is performing a ROP attack, the attacker controls which
> address the BLR returns to.

That does not matter for the *normal* case.  Making the normal case even
more expensive than this scheme already is is no good.


Anyway, I was concerned about other architectures, too (that may not
even *have* a GCC port (yet)).  The point is that this should follow all
the rules we have for RTL.  Now that it will use DF (thanks!), most of
that will follow automatically (or easily, anyway).


Segher
Segher Boessenkool Sept. 15, 2020, 7:41 p.m. UTC | #134
On Tue, Sep 15, 2020 at 10:11:41AM +0100, Richard Sandiford wrote:
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> >> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
(Putting correct info in DF, inserting the new insns in pro_and_epi).

But, scheduling runs *after* that, and then you need to prevent the
inserted (zeroing) insns from moving -- if you don't, the code after
some zeroing can be used as gadget!  You want to always have all
zeroing insns after *any* computational insn, or it becomes a gadget.
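
The property described here can be stated as a simple check over a straight-line return path. The following is only a sketch under that assumption: toy_kind and toy_zeroing_stays_with_return are hypothetical names, and real insns are of course not classified this coarsely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_kind { TOY_COMPUTE, TOY_ZERO, TOY_RETURN };

/* Return true if, in the straight-line sequence INSNS ending in a
   return, nothing other than zeroing insns follows the first zeroing
   insn.  */
bool toy_zeroing_stays_with_return(const enum toy_kind *insns, size_t n) {
  assert(n > 0 && insns[n - 1] == TOY_RETURN);
  bool seen_zero = false;
  for (size_t i = 0; i + 1 < n; i++) {
    if (insns[i] == TOY_ZERO)
      seen_zero = true;
    else if (seen_zero)
      return false;  /* something was moved after a zeroing insn */
  }
  return true;
}
```

A sequence such as compute, zero, compute, zero, return fails the check: the code after the first zeroing insn still computes something and so remains usable as a gadget.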


Segher
Qing Zhao Sept. 15, 2020, 10:31 p.m. UTC | #135
> On Sep 15, 2020, at 2:41 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Tue, Sep 15, 2020 at 10:11:41AM +0100, Richard Sandiford wrote:
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> (Putting correct info in DF, inserting the new insns in pro_and_epi).
> 
> But, scheduling runs *after* that, and then you need to prevent the
> inserted (zeroing) insns from moving -- if you don't, the code after
> some zeroing can be used as gadget!  You want to always have all
> zeroing insns after *any* computational insn, or it becomes a gadget.

Please see the previous discussion: we agreed to put the new pass (pass_zero_call_used_regs)
at the beginning of pass_late_compilation, as follows:

     PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
++++  NEXT_PASS (pass_zero_call_used_regs);
         NEXT_PASS (pass_compute_alignments);
         NEXT_PASS (pass_variable_tracking);
         NEXT_PASS (pass_free_cfg);
         NEXT_PASS (pass_machine_reorg);
         NEXT_PASS (pass_cleanup_barriers);
         NEXT_PASS (pass_delay_slots);

Scheduling has been done already. 

Qing


> 
> 
> Segher
Segher Boessenkool Sept. 15, 2020, 11:09 p.m. UTC | #136
On Tue, Sep 15, 2020 at 05:31:48PM -0500, Qing Zhao wrote:
> > But, scheduling runs *after* that, and then you need to prevent the
> > inserted (zeroing) insns from moving -- if you don't, the code after
> > some zeroing can be used as gadget!  You want to always have all
> > zeroing insns after *any* computational insn, or it becomes a gadget.
> 
> Please see the previous discussion: we agreed to put the new pass (pass_zero_call_used_regs)
> at the beginning of pass_late_compilation, as follows:

Yes, I know that at some point it was said that seemed like a good place
for it.

>      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
> ++++  NEXT_PASS (pass_zero_call_used_regs);
>          NEXT_PASS (pass_compute_alignments);
>          NEXT_PASS (pass_variable_tracking);
>          NEXT_PASS (pass_free_cfg);
>          NEXT_PASS (pass_machine_reorg);
>          NEXT_PASS (pass_cleanup_barriers);
>          NEXT_PASS (pass_delay_slots);
> 
> Scheduling has been done already. 

But there are many more passes that can reorder things.  Like
machine_reorg (which is a big deal).  I don't think other passes here
are harmful (maybe the shorten stuff)?  But.  Targets can also insert
more passes here.

If you want the zeroing insns to stay with the return, you have to
express that in RTL.  Anything else is extremely fragile.


Segher
Qing Zhao Sept. 16, 2020, 1:51 a.m. UTC | #137
> On Sep 15, 2020, at 6:09 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Tue, Sep 15, 2020 at 05:31:48PM -0500, Qing Zhao wrote:
>>> But, scheduling runs *after* that, and then you need to prevent the
>>> inserted (zeroing) insns from moving -- if you don't, the code after
>>> some zeroing can be used as gadget!  You want to always have all
>>> zeroing insns after *any* computational insn, or it becomes a gadget.
>> 
>> Please see the previous discussion: we agreed to put the new pass (pass_zero_call_used_regs)
>> at the beginning of pass_late_compilation, as follows:
> 
> Yes, I know that at some point it was said that seemed like a good place
> for it.
> 
>>     PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
>> ++++  NEXT_PASS (pass_zero_call_used_regs);
>>         NEXT_PASS (pass_compute_alignments);
>>         NEXT_PASS (pass_variable_tracking);
>>         NEXT_PASS (pass_free_cfg);
>>         NEXT_PASS (pass_machine_reorg);
>>         NEXT_PASS (pass_cleanup_barriers);
>>         NEXT_PASS (pass_delay_slots);
>> 
>> Scheduling has been done already. 
> 
> But there are many more passes that can reorder things.  Like
> machine_reorg (which is a big deal).  I don't think other passes here
> are harmful (maybe the shorten stuff)?  But.  Targets can also insert
> more passes here.
> 
> If you want the zeroing insns to stay with the return, you have to
> express that in RTL.  

What do you mean by “express that in RTL”?
Could you please explain this in more detail?

Do you mean to implement this in “targetm.gen_return” and “targetm.gen_simple_return”?

Qing

> Anything else is extremely fragile.
> 
> 
> Segher
Segher Boessenkool Sept. 16, 2020, 10:35 a.m. UTC | #138
On Tue, Sep 15, 2020 at 08:51:57PM -0500, Qing Zhao wrote:
> > On Sep 15, 2020, at 6:09 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > If you want the zeroing insns to stay with the return, you have to
> > express that in RTL.  
> 
> What do you mean by “express that in RTL”?
> Could you please explain this in more detail?

Exactly as I say: you need to tell in the RTL that the insns should stay
together.

Easiest is to just make it one RTL insn.  There are other ways, but
those do not help anything here afaics.

> Do you mean to implement this in “targetm.gen_return” and “targetm.gen_simple_return”?

That is the easiest way, yes.

> > Anything else is extremely fragile.


Segher
Qing Zhao Sept. 16, 2020, 8:57 p.m. UTC | #139
Segher and Richard, 

Now there are two major concerns from the discussion so far:

1. (From Richard): Inserting zero insns should be done after pass_thread_prologue_and_epilogue, since later passes (for example, pass_regrename) might introduce newly used caller-saved registers.
     So, we should do this at the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later).

2. (From Segher): The inserted zero insns should stay together with the return; no other insns should move in between the zero insns and the return insn. Otherwise, a valid gadget could be formed.

I think that both of the above concerns are important and should be addressed for a correct implementation.

In order to support 1, we cannot implement it in “targetm.gen_return()” and “targetm.gen_simple_return()”, since both are called during pass_thread_prologue_and_epilogue; at that time, the use information is still not correct.

In order to support 2, enhancing EPILOGUE_USES to include the zeroed registers is NOT enough to prevent all the zero insns from moving around.  More restrictions need to be added to these new zero insns.  (I think that marking these new zeroed registers as “unspec_volatile” at the RTL level is necessary to prevent the zero insns from being deleted or moved around.)


So, based on the above, I propose the following approach that will resolve the above 2 concerns:

1. Add 2 new target hooks:
   A. targetm.pro_epilogue_use (reg)
   This hook should return an UNSPEC_VOLATILE rtx that marks a register as in use, to
   prevent register-setting instructions in the prologue and epilogue from being deleted.

   B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
   This hook will generate a sequence of zeroing insns that zero the registers specified in NEED_ZEROED_HARDREGS.

    A default handler of “gen_zero_call_used_regs” could be defined in the middle end; it would use mov insns to zero the registers, and then use “targetm.pro_epilogue_use(reg)” to mark each zeroed register.


2. Add a new pass, pass_zero_call_used_regs, at the beginning of pass_late_compilation.

    This pass will search for all “return”s and compute the hard register set for zeroing, “need_zeroed_hardregs”, based on data flow information, the user request, and the function ABI.
    It will then call targetm.gen_zero_call_used_regs(need_zeroed_hardregs).

3. The x86 backend will implement special versions of “gen_zero_call_used_regs” and “pro_epilogue_use”.
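
As a rough illustration of how “need_zeroed_hardregs” might be derived from the three inputs named in step 2, here is a self-contained sketch. It is not the patch's code: toy_regset, toy_zero_mode and toy_need_zeroed are hypothetical, the bitmask stands in for a real hard register set, and the assumption that registers live at the return (for example, those holding the return value) must be excluded is mine.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_regset;  /* one bit per hard register */

/* The five user-visible modes of the option and attribute.  */
enum toy_zero_mode { TOY_SKIP, TOY_USED_GPR, TOY_ALL_GPR, TOY_USED, TOY_ALL };

/* Hypothetical inputs: CALL_USED comes from the function ABI, GPRS is
   the general-purpose register class, USED is what the data flow
   information says the function touched, and LIVE_AT_RETURN holds
   registers whose value must survive the return.  */
toy_regset toy_need_zeroed(enum toy_zero_mode mode, toy_regset call_used,
                           toy_regset gprs, toy_regset used,
                           toy_regset live_at_return) {
  assert(mode >= TOY_SKIP && mode <= TOY_ALL);
  if (mode == TOY_SKIP)
    return 0;
  toy_regset set = call_used & ~live_at_return;
  if (mode == TOY_USED_GPR || mode == TOY_ALL_GPR)
    set &= gprs;  /* restrict to general-purpose registers */
  if (mode == TOY_USED_GPR || mode == TOY_USED)
    set &= used;  /* restrict to registers the function used */
  return set;
}
```

The result would then be handed to the hook from step 1B, which decides how to emit the actual zeroing sequence for the target.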


Let me know if you have any more comments on this approach.

thanks.

Qing




> On Sep 16, 2020, at 5:35 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Tue, Sep 15, 2020 at 08:51:57PM -0500, Qing Zhao wrote:
>>> On Sep 15, 2020, at 6:09 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> If you want the zeroing insns to stay with the return, you have to
>>> express that in RTL.  
>> 
>> What do you mean by “express that in RTL”?
>> Could you please explain this in more detail?
> 
> Exactly as I say: you need to tell in the RTL that the insns should stay
> together.
> 
> Easiest is to just make it one RTL insn.  There are other ways, but
> those do not help anything here afaics.
> 
>> Do you mean to implement this in “targetm.gen_return” and “targetm.gen_simple_return”?
> 
> That is the easiest way, yes.
> 
>>> Anything else is extremely fragile.
> 
> 
> Segher
Richard Sandiford Sept. 17, 2020, 6:17 a.m. UTC | #140
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> Segher and Richard, 
>
> Now there are two major concerns from the discussion so far:
>
> 1. (From Richard): Inserting zero insns should be done after pass_thread_prologue_and_epilogue, since later passes (for example, pass_regrename) might introduce newly used caller-saved registers.
>      So, we should do this at the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later).
>
> 2. (From Segher): The inserted zero insns should stay together with the return; no other insns should move in between the zero insns and the return insn. Otherwise, a valid gadget could be formed.
>
> I think that both of the above concerns are important and should be addressed for a correct implementation.
>
> In order to support 1, we cannot implement it in “targetm.gen_return()” and “targetm.gen_simple_return()”, since both are called during pass_thread_prologue_and_epilogue; at that time, the use information is still not correct.
>
> In order to support 2, enhancing EPILOGUE_USES to include the zeroed registers is NOT enough to prevent all the zero insns from moving around.

Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
being deleted as dead.

> More restrictions need to be added to these new zero insns.  (I think that marking these new zeroed registers as “unspec_volatile” at the RTL level is necessary to prevent the zero insns from being deleted or moved around.)
>
>
> So, based on the above, I propose the following approach that will resolve the above 2 concerns:
>
> 1. Add 2 new target hooks:
>    A. targetm.pro_epilogue_use (reg)
>    This hook should return an UNSPEC_VOLATILE rtx that marks a register as in use, to
>    prevent register-setting instructions in the prologue and epilogue from being deleted.
>
>    B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>    This hook will generate a sequence of zeroing insns that zero the registers specified in NEED_ZEROED_HARDREGS.
>
>     A default handler of “gen_zero_call_used_regs” could be defined in the middle end; it would use mov insns to zero the registers, and then use “targetm.pro_epilogue_use(reg)” to mark each zeroed register.

This sounds like you're going back to using:

(insn 18 16 19 2 (set (reg:SI 1 dx)
        (const_int 0 [0])) "t10.c":11:1 -1
     (nil))
(insn 19 18 20 2 (unspec_volatile [
            (reg:SI 1 dx)
        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
     (nil))

This also doesn't prevent the zeroing from being moved around.  Like the
EPILOGUE_USES approach, it only prevents the clearing from being removed
as dead.  I still think that the EPILOGUE_USES approach is the better
way of doing that.

In other words: the use insns themselves are volatile and so can't be
moved relative to each other and to other volatile insns.  But the uses
are fake instructions that don't do anything.  The preceding zeroing
instructions are just normal instructions that can be moved around
freely before their respective uses.

I don't think there's a foolproof way of preventing an unknown target
machine_reorg pass from moving the instructions around.  But since we
don't have unknown machine_reorgs (at least not in-tree), I think
instead we should be prepared to patch machine_reorgs where necessary
to ensure that they do the right thing.

If you want to increase the chances that machine_reorgs don't need to be
patched, you could either:

(a) make the zeroing instructions themselves volatile, or
(b) insert a volatile reference to the register before (rather than
    after) the zeroing instruction

IMO (b) is the way to go, because it avoids the need to define special
volatile move patterns for each type of register.  (b) would be needed
on top of (rather than instead of) the EPILOGUE_USES thing.

I don't think we need a new target-specific unspec_volatile code to do (b).
We can just use an automatically-generated volatile asm to clobber the
registers first.  See e.g. how expand_asm_memory_blockage handles memory
scheduling barriers.

Thanks,
Richard
Qing Zhao Sept. 17, 2020, 2:40 p.m. UTC | #141
> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> Segher and Richard, 
>> 
>> Now there are two major concerns from the discussion so far:
>> 
>> 1. (From Richard): Inserting zero insns should be done after pass_thread_prologue_and_epilogue, since later passes (for example, pass_regrename) might introduce newly used caller-saved registers.
>>     So, we should do this at the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later).
>>
>> 2. (From Segher): The inserted zero insns should stay together with the return; no other insns should move in between the zero insns and the return insn. Otherwise, a valid gadget could be formed.
>>
>> I think that both of the above concerns are important and should be addressed for a correct implementation.
>>
>> In order to support 1, we cannot implement it in “targetm.gen_return()” and “targetm.gen_simple_return()”, since both are called during pass_thread_prologue_and_epilogue; at that time, the use information is still not correct.
>>
>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed registers is NOT enough to prevent all the zero insns from moving around.
> 
> Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
> being deleted as dead.
> 
>> More restrictions need to be added to these new zero insns.  (I think that marking these new zeroed registers as “unspec_volatile” at the RTL level is necessary to prevent the zero insns from being deleted or moved around.)
>> 
>> 
>> So, based on the above, I propose the following approach that will resolve the above 2 concerns:
>> 
>> 1. Add 2 new target hooks:
>>   A. targetm.pro_epilogue_use (reg)
>>   This hook should return an UNSPEC_VOLATILE rtx that marks a register as in use, to
>>   prevent register-setting instructions in the prologue and epilogue from being deleted.
>>
>>   B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>>   This hook will generate a sequence of zeroing insns that zero the registers specified in NEED_ZEROED_HARDREGS.
>>
>>    A default handler of “gen_zero_call_used_regs” could be defined in the middle end; it would use mov insns to zero the registers, and then use “targetm.pro_epilogue_use(reg)” to mark each zeroed register.
> 
> This sounds like you're going back to using:
> 
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>        (const_int 0 [0])) "t10.c":11:1 -1
>     (nil))
> (insn 19 18 20 2 (unspec_volatile [
>            (reg:SI 1 dx)
>        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>     (nil))
> 
> This also doesn't prevent the zeroing from being moved around.  Like the
> EPILOGUE_USES approach, it only prevents the clearing from being removed
> as dead.  I still think that the EPILOGUE_USES approach is the better
> way of doing that.

The following is what I see in i386.md (I haven’t looked yet at how “UNSPEC_VOLATILE” is used in GCC’s data flow analysis):

;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
;; all of memory.  This blocks insns from being moved across this point.

I am not very familiar with how unspec_volatile actually works in GCC’s data flow analysis. My understanding from the above is that an RTL insn marked with UNSPEC_VOLATILE serves as a barrier that no other insn can move across. At the same time, since the marked RTL insn is considered to use and clobber all hard registers and memory, it cannot be deleted either.

So, I thought that “UNSPEC_VOLATILE” should be stronger than “EPILOGUE_USES”, and that it can serve the purpose of preventing the zeroing insns from being deleted or moved.



> 
> In other words: the use insns themselves are volatile and so can't be
> moved relative to each other and to other volatile insns.  But the uses
> are fake instructions that don't do anything.  The preceding zeroing
> instructions are just normal instructions that can be moved around
> freely before their respective uses.

But since the UNSPEC_VOLATILE insns are considered barriers that no other insns can move across, the zero insns cannot be moved around either, right?

> 
> I don't think there's a foolproof way of preventing an unknown target
> machine_reorg pass from moving the instructions around.  But since we
> don't have unknown machine_reorgs (at least not in-tree), I think
> instead we should be prepared to patch machine_reorgs where necessary
> to ensure that they do the right thing.
> 
> If you want to increase the chances that machine_reorgs don't need to be
> patched, you could either:
> 
> (a) make the zeroing instructions themselves volatile, or
> (b) insert a volatile reference to the register before (rather than
>    after) the zeroing instruction
> 
> IMO (b) is the way to go, because it avoids the need to define special
> volatile move patterns for each type of register.  (b) would be needed
> on top of (rather than instead of) the EPILOGUE_USES thing.
> 
Okay, I will take approach (b).

But I still don’t quite understand why we still need “EPILOGUE_USES”. What’s the additional benefit of EPILOGUE_USES?

> I don't think we need a new target-specific unspec_volatile code to do (b).
> We can just use an automatically-generated volatile asm to clobber the
> registers first.  See e.g. how expand_asm_memory_blockage handles memory
> scheduling barriers.
/* Generate asm volatile("" : : : "memory") as the memory blockage.  */

static void
expand_asm_memory_blockage (void)
{
  rtx asm_op, clob;

  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
                                 rtvec_alloc (0), rtvec_alloc (0),
                                 rtvec_alloc (0), UNKNOWN_LOCATION);
  MEM_VOLATILE_P (asm_op) = 1;

  clob = gen_rtx_SCRATCH (VOIDmode);
  clob = gen_rtx_MEM (BLKmode, clob);
  clob = gen_rtx_CLOBBER (VOIDmode, clob);

  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
}


Something like the following?

/* Generate asm volatile("" : : : "regno") for REGNO.  */

static void
expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
{
  rtx asm_op, clob;

  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
                                 rtvec_alloc (0), rtvec_alloc (0),
                                 rtvec_alloc (0), UNKNOWN_LOCATION);
  MEM_VOLATILE_P (asm_op) = 1;

  clob = gen_rtx_REG (mode, regno);
  clob = gen_rtx_CLOBBER (VOIDmode, clob);

  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
}

Is the above correct? 

thanks.

Qing

> 
> Thanks,
> Richard
Richard Sandiford Sept. 17, 2020, 4:27 p.m. UTC | #142
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> Segher and Richard, 
>>> 
>>> Now there are two major concerns from the discussion so far:
>>> 
>>> 1. (From Richard): Inserting zero insns should be done after pass_thread_prologue_and_epilogue, since later passes (for example, pass_regrename) might introduce newly used caller-saved registers.
>>>     So, we should do this at the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later).
>>>
>>> 2. (From Segher): The inserted zero insns should stay together with the return; no other insns should move in between the zero insns and the return insn. Otherwise, a valid gadget could be formed.
>>>
>>> I think that both of the above concerns are important and should be addressed for a correct implementation.
>>>
>>> In order to support 1, we cannot implement it in “targetm.gen_return()” and “targetm.gen_simple_return()”, since both are called during pass_thread_prologue_and_epilogue; at that time, the use information is still not correct.
>>>
>>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed registers is NOT enough to prevent all the zero insns from moving around.
>> 
>> Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
>> being deleted as dead.
>> 
>>> More restrictions need to be added to these new zero insns.  (I think that marking these new zeroed registers as “unspec_volatile” at the RTL level is necessary to prevent the zero insns from being deleted or moved around.)
>>> 
>>> 
>>> So, based on the above, I propose the following approach that will resolve the above 2 concerns:
>>> 
>>> 1. Add 2 new target hooks:
>>>   A. targetm.pro_epilogue_use (reg)
>>>   This hook should return an UNSPEC_VOLATILE rtx that marks a register as in use, to
>>>   prevent register-setting instructions in the prologue and epilogue from being deleted.
>>>
>>>   B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>>>   This hook will generate a sequence of zeroing insns that zero the registers specified in NEED_ZEROED_HARDREGS.
>>>
>>>    A default handler of “gen_zero_call_used_regs” could be defined in the middle end; it would use mov insns to zero the registers, and then use “targetm.pro_epilogue_use(reg)” to mark each zeroed register.
>> 
>> This sounds like you're going back to using:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>        (const_int 0 [0])) "t10.c":11:1 -1
>>     (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>            (reg:SI 1 dx)
>>        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>     (nil))
>> 
>> This also doesn't prevent the zeroing from being moved around.  Like the
>> EPILOGUE_USES approach, it only prevents the clearing from being removed
>> as dead.  I still think that the EPILOGUE_USES approach is the better
>> way of doing that.
>
> The following is what I see in i386.md (I haven’t looked yet at how “UNSPEC_VOLATILE” is used in GCC’s data flow analysis):
>
> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
> ;; all of memory.  This blocks insns from being moved across this point.

Heh, it looks like that comment dates back to 1994. :-)

The comment is no longer correct though.  I wasn't around at the time,
but I assume the comment was only locally true even then.

If what the comment said was true, then something like:

(define_insn "cld"
  [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
  ""
  "cld"
  [(set_attr "length" "1")
   (set_attr "length_immediate" "0")
   (set_attr "modrm" "0")])

would invalidate the entire register file and so would require all values
to be spilt to the stack around the CLD.

> I am not very familiar with how unspec_volatile actually works in GCC’s data flow analysis. My understanding from the above is that an RTL insn marked with UNSPEC_VOLATILE serves as a barrier that no other insn can move across. At the same time, since the marked RTL insn is considered to use and clobber all hard registers and memory, it cannot be deleted either.

UNSPEC_VOLATILEs can't be deleted.  And they can't be reordered relative
to other UNSPEC_VOLATILEs.  But the problem with:

(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))

is that the volatile occurs *after* the zeroing instruction.  So at best
it can stop insn 18 moving further down, to be closer to the return
instruction.  There's nothing to stop insn 18 moving further up,
away from the return instruction, which AIUI is what you're trying
to prevent.  E.g. suppose we had:

(insn 17 … pop a register other than dx from the stack …)
(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))

There is nothing to stop an rtl pass reordering that to:

(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 17 … pop a register other than dx from the stack …)
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))

There's also no dataflow reason why this couldn't be reordered to:

(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))
(insn 17 … pop a register other than dx from the stack …)

So…

> So, I thought that “UNSPEC_VOLATILE” should be stronger than “EPILOGUE_USES”, and that it can serve the purpose of preventing the zeroing insns from being deleted or moved.

…both EPILOGUE_USES and UNSPEC_VOLATILE would be effective ways of
stopping insn 18 from being deleted.  But an UNSPEC_VOLATILE after
the instruction would IMO be counterproductive: it would stop the
zeroing instructions that we want to be close to the return instruction
from moving “closer” to the return instruction, but it wouldn't do the
same for unrelated instructions.  So if anything, the unspec_volatile
could increase the chances that something unrelated to the register
zeroing is moved later than the register zeroing.  E.g. this could
happen when filling delayed branch slots.
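
The reorderings discussed above can be reproduced with a deliberately small model. This is a sketch, not how any GCC pass decides legality: toy_insn and toy_can_swap are hypothetical, and the only constraints modelled are the ones stated in this message (volatile insns keep their order relative to each other, and register dependences must be respected). Real passes may well be more conservative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_insn {
  uint32_t reads;    /* bitmask of registers read */
  uint32_t writes;   /* bitmask of registers written or clobbered */
  bool is_volatile;  /* unspec_volatile or volatile asm */
};

/* Toy rule: adjacent insns A (first) and B (second) may be swapped
   unless both are volatile or they conflict on a register.  */
bool toy_can_swap(const struct toy_insn *a, const struct toy_insn *b) {
  assert(a != NULL && b != NULL);
  if (a->is_volatile && b->is_volatile)
    return false;
  if (a->writes & (b->reads | b->writes))
    return false;  /* read-after-write or write-after-write */
  if (a->reads & b->writes)
    return false;  /* write-after-read */
  return true;
}
```

With a pop that touches only cx and sp, a plain "set dx to 0", and a volatile use of dx, the model allows the pop to swap with either neighbour, which matches the two reorderings shown. A volatile asm that clobbers dx and is placed before the zeroing cannot be swapped with it, because both write dx.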

>> I don't think there's a foolproof way of preventing an unknown target
>> machine_reorg pass from moving the instructions around.  But since we
>> don't have unknown machine_reorgs (at least not in-tree), I think
>> instead we should be prepared to patch machine_reorgs where necessary
>> to ensure that they do the right thing.
>> 
>> If you want to increase the chances that machine_reorgs don't need to be
>> patched, you could either:
>> 
>> (a) to make the zeroing instructions themselves volatile or
>> (b) to insert a volatile reference to the register before (rather than
>>    after) the zeroing instruction
>> 
>> IMO (b) is the way to go, because it avoids the need to define special
>> volatile move patterns for each type of register.  (b) would be needed
>> on top of (rather than instead of) the EPILOGUE_USES thing.
>> 
> Okay, will take approach b. 
>
> But I still don’t quite understand why we still need “EPILOGUE_USES”? What’s the additional benefit from EPILOGUE_USES?

The asm for (b) goes before the instruction, so we'd have:

(insn 17 … new asm …)
(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 … return …)

But something has to tell the df machinery that the value of edx
matters on return from the function, otherwise insn 18 could be
deleted as dead.  Adding edx to EPILOGUE_USES provides that information
and stops the instruction from being deleted.
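A small liveness sketch may make the point concrete. It is a toy backward dead-code pass over a straight-line block, not GCC's df machinery; `keep_after_dce`, `example_block`, and the `live_at_exit` parameter (standing in for what EPILOGUE_USES contributes) are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model, not GCC's df machinery. Each register is one bit in a mask.
enum : std::uint32_t { AX = 1u << 0, DX = 1u << 1 };

struct Insn {
  std::uint32_t defs;  // registers written
  std::uint32_t uses;  // registers read
  bool is_volatile;    // volatile insns are never deleted
};

// Backward dead-code elimination over a straight-line block.
// live_at_exit models the registers whose value matters on return.
std::vector<bool> keep_after_dce(const std::vector<Insn> &block,
                                 std::uint32_t live_at_exit) {
  std::vector<bool> keep(block.size(), false);
  std::uint32_t live = live_at_exit;
  for (std::size_t i = block.size(); i-- > 0;) {
    const Insn &insn = block[i];
    if (insn.is_volatile || (insn.defs & live) != 0) {
      keep[i] = true;
      live = (live & ~insn.defs) | insn.uses;
    }
  }
  return keep;
}

// Index 0: volatile asm clobbering dx (insn 17). Index 1: dx = 0 (insn 18).
std::vector<Insn> example_block() {
  return {Insn{DX, 0, true}, Insn{DX, 0, false}};
}

void check_epilogue_uses() {
  // Only ax (the return value) is live at exit: the zeroing is dead.
  assert(!keep_after_dce(example_block(), AX)[1]);
  // dx is also treated as live at exit: the zeroing survives.
  assert(keep_after_dce(example_block(), AX | DX)[1]);
}
```

The volatile asm in front of the zeroing is kept in both cases, but it does nothing to keep the zeroing itself alive; only the exit liveness does.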

>> I don't think we need a new target-specific unspec_volatile code to do (b).
>> We can just use an automatically-generated volatile asm to clobber the
>> registers first.  See e.g. how expand_asm_memory_blockage handles memory
>> scheduling barriers.
> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>
> static void
> expand_asm_memory_blockage (void)
> {
>   rtx asm_op, clob;
>
>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>                                  rtvec_alloc (0), rtvec_alloc (0),
>                                  rtvec_alloc (0), UNKNOWN_LOCATION);
>   MEM_VOLATILE_P (asm_op) = 1;
>
>   clob = gen_rtx_SCRATCH (VOIDmode);
>   clob = gen_rtx_MEM (BLKmode, clob);
>   clob = gen_rtx_CLOBBER (VOIDmode, clob);
>
>   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
> }
>
>
> As the following? 
>
> /* Generate asm volatile("" : : : "regno") for REGNO.  */
>
> static void
> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
> {
>   rtx asm_op, clob;
>
>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>                                  rtvec_alloc (0), rtvec_alloc (0),
>                                  rtvec_alloc (0), UNKNOWN_LOCATION);
>   MEM_VOLATILE_P (asm_op) = 1;
>
>   clob = gen_rtx_REG (mode, regno);
>   clob = gen_rtx_CLOBBER (VOIDmode, clob);
>
>   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
> }
>
> Is the above correct? 

Yeah, looks good.  You should be able to clobber all the registers you
want to clear in one asm.  For extra safety, it might be worth including
a (mem:BLK (scratch)) clobber too, so that memory instructions don't get
moved across the asm.
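As a rough illustration of why a single asm with several clobbers is enough, here is a toy ordering model. It is not GCC code: `Effect`, `make_clobber_asm`, and `must_stay_ordered` are hypothetical names, and the model only tracks which resources an insn writes and reads.

```cpp
#include <cassert>
#include <cstdint>

// Toy model, not GCC code. Each resource is one bit in a mask.
enum : std::uint32_t { AX = 1u << 0, CX = 1u << 1, DX = 1u << 2, MEM = 1u << 31 };

struct Effect {
  std::uint32_t defs;  // resources written or clobbered
  std::uint32_t uses;  // resources read
  bool is_volatile;
};

// One volatile asm that clobbers every register in REGS and, optionally,
// all of memory (the analogue of the (mem:BLK (scratch)) clobber).
Effect make_clobber_asm(std::uint32_t regs, bool clobber_memory) {
  const std::uint32_t mem = clobber_memory ? static_cast<std::uint32_t>(MEM) : 0u;
  return Effect{regs | mem, 0u, true};
}

// True if the two insns cannot be reordered relative to each other.
bool must_stay_ordered(const Effect &a, const Effect &b) {
  return (a.defs & b.uses) != 0 || (a.uses & b.defs) != 0 ||
         (a.defs & b.defs) != 0 || (a.is_volatile && b.is_volatile);
}

void check_barrier() {
  const Effect zero_cx = {CX, 0, false};
  const Effect zero_dx = {DX, 0, false};
  const Effect load_ax = {AX, MEM, false};  // a load from memory into ax
  const Effect barrier = make_clobber_asm(CX | DX, true);
  assert(must_stay_ordered(barrier, zero_cx));  // both zeroing insns are
  assert(must_stay_ordered(barrier, zero_dx));  // pinned by the one asm
  assert(must_stay_ordered(barrier, load_ax));  // memory clobber pins the load
  // Without the memory clobber the load is free to cross the asm.
  assert(!must_stay_ordered(make_clobber_asm(CX | DX, false), load_ax));
}
```

In the real implementation the clobbers would be elements of one PARALLEL next to the ASM_OPERANDS; the sketch only shows the ordering effect.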

Thanks,
Richard
Qing Zhao Sept. 17, 2020, 7:07 p.m. UTC | #143
> On Sep 17, 2020, at 11:27 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> Segher and Richard, 
>>>> 
>>>> Now there are two major concerns from the discussion so far:
>>>> 
>>>> 1. (From Richard):  Inserting zero insns should be done after pass_thread_prologue_and_epilogue since later passes (for example, pass_regrename) might introduce new used caller-saved registers. 
>>>>    So, we should do this in the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later). 
>>>> 
>>>> 2. (From Segher): The inserted zero insns should stay together with the return, no other insns should move in-between zero insns and return insns. Otherwise, a valid gadget could be formed. 
>>>> 
>>>> I think that both of the above 2 concerns are important and should be addressed for the correct implementation. 
>>>> 
>>>> In order to support 1, we cannot implement it in “targetm.gen_return()” and “targetm.gen_simple_return()”, since both are called during pass_thread_prologue_and_epilogue; at that time, the use information is still not correct.
>>>> 
>>>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed registers is NOT enough to prevent all the zero insns from moving around.
>>> 
>>> Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
>>> being deleted as dead.
>>> 
>>>> More restrictions need to be added to these new zero insns.  (I think that marking these new zeroed registers as “unspec_volatile” at RTL level is necessary to prevent them from being deleted or moved around).
>>>> 
>>>> 
>>>> So, based on the above, I propose the following approach that will resolve the above 2 concerns:
>>>> 
>>>> 1. Add 2 new target hooks:
>>>>  A. targetm.pro_epilogue_use (reg)
>>>>  This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
>>>>  prevent deleting register setting instructions in prologue and epilogue.
>>>> 
>>>>  B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>>>>  This hook will generate a sequence of zeroing insns that zero the registers specified in NEED_ZEROED_HARDREGS.
>>>> 
>>>>   A default handler of “gen_zero_call_used_regs” could be defined in the middle end, which uses mov insns to zero registers, and then uses “targetm.pro_epilogue_use(reg)” to mark each zeroed register.
>>> 
>>> This sounds like you're going back to using:
>>> 
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>       (const_int 0 [0])) "t10.c":11:1 -1
>>>    (nil))
>>> (insn 19 18 20 2 (unspec_volatile [
>>>           (reg:SI 1 dx)
>>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>    (nil))
>>> 
>>> This also doesn't prevent the zeroing from being moved around.  Like the
>>> EPILOGUE_USES approach, it only prevents the clearing from being removed
>>> as dead.  I still think that the EPILOGUE_USES approach is the better
>>> way of doing that.
>> 
>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>> 
>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>> ;; all of memory.  This blocks insns from being moved across this point.
> 
> Heh, it looks like that comment dates back to 1994. :-)
> 
> The comment is no longer correct though.  I wasn't around at the time,
> but I assume the comment was only locally true even then.
> 
> If what the comment said was true, then something like:
> 
> (define_insn "cld"
>  [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>  ""
>  "cld"
>  [(set_attr "length" "1")
>   (set_attr "length_immediate" "0")
>   (set_attr "modrm" "0")])
> 
> would invalidate the entire register file and so would require all values
> to be spilt to the stack around the CLD.

Okay, thanks for the info.
Then, what’s the current definition of UNSPEC_VOLATILE?


> 
>> I am not very familiar with how unspec_volatile actually works in GCC’s data flow analysis. My understanding from the above is that an RTL insn marked with UNSPEC_VOLATILE would serve as a barrier that no other insn can move across. At the same time, since the marked RTL insn is considered to use and clobber all hard registers and memory, it cannot be deleted either.
> 
> UNSPEC_VOLATILEs can't be deleted.  And they can't be reordered relative
> to other UNSPEC_VOLATILEs.  But the problem with:
> 
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))
> 
> is that the volatile occurs *after* the zeroing instruction.  So at best
> it can stop insn 18 moving further down, to be closer to the return
> instruction.  There's nothing to stop insn 18 moving further up,
> away from the return instruction, which AIUI is what you're trying
> to prevent.  E.g. suppose we had:
> 
> (insn 17 … pop a register other than dx from the stack …)
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))
> 
> There is nothing to stop an rtl pass reordering that to:
> 
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 17 … pop a register other than dx from the stack …)
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))

Yes, agreed. And then the volatile marking insn should be put BEFORE the zeroing insn. 

> 
> There's also no dataflow reason why this couldn't be reordered to:
> 
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))
> (insn 17 … pop a register other than dx from the stack …)
> 

This is the place where I don’t quite agree at the moment; maybe I still don’t quite understand “UNSPEC_VOLATILE”.

I checked several places in GCC that handle “UNSPEC_VOLATILE”, for example, the routine “can_move_insns_across” in gcc/df-problems.c:

      if (NONDEBUG_INSN_P (insn))
        {
          if (volatile_insn_p (PATTERN (insn)))
            return false;

From my reading of the code, when an insn is an UNSPEC_VOLATILE, other insns will NOT be able to move across it.

Then, for the above example, insn 17 should not be moved across insn 19 either.

Let me know if I missed anything important.


> So…
> 
>> So, I thought that “UNSPEC_VOLATILE” should be stronger than “EPILOGUE_USES”. And it can serve the purpose of preventing zeroing insns from being deleted or moved.
> 
> …both EPILOGUE_USES and UNSPEC_VOLATILE would be effective ways of
> stopping insn 18 from being deleted.  But an UNSPEC_VOLATILE after
> the instruction would IMO be counterproductive: it would stop the
> zeroing instructions that we want to be close to the return instruction
> from moving “closer” to the return instruction, but it wouldn't do the
> same for unrelated instructions.  So if anything, the unspec_volatile
> could increase the chances that something unrelated to the register
> zeroing is moved later than the register zeroing.  E.g. this could
> happen when filling delayed branch slots.
> 
>>> I don't think there's a foolproof way of preventing an unknown target
>>> machine_reorg pass from moving the instructions around.  But since we
>>> don't have unknown machine_reorgs (at least not in-tree), I think
>>> instead we should be prepared to patch machine_reorgs where necessary
>>> to ensure that they do the right thing.
>>> 
>>> If you want to increase the chances that machine_reorgs don't need to be
>>> patched, you could either:
>>> 
>>> (a) to make the zeroing instructions themselves volatile or
>>> (b) to insert a volatile reference to the register before (rather than
>>>   after) the zeroing instruction
>>> 
>>> IMO (b) is the way to go, because it avoids the need to define special
>>> volatile move patterns for each type of register.  (b) would be needed
>>> on top of (rather than instead of) the EPILOGUE_USES thing.
>>> 
>> Okay, will take approach b. 
>> 
>> But I still don’t quite understand why we still need “EPILOGUE_USES”? What’s the additional benefit from EPILOGUE_USES?
> 
> The asm for (b) goes before the instruction, so we'd have:
> 
> (insn 17 … new asm …)
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 … return …)
> 
> But something has to tell the df machinery that the value of edx
> matters on return from the function, otherwise insn 18 could be
> deleted as dead.  Adding edx to EPILOGUE_USES provides that information
> and stops the instruction from being deleted.


In the above, insn 17 will be something like:

(insn 17 ...(unspec_volatile [  (reg:SI 1 dx)
    ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1 
(nil))

So, the reg edx is already marked as “UNSPEC_VOLATILE”, which should mean that the value of edx matters on return from the function. My understanding is that df should automatically pick up the “UNSPEC_VOLATILE” insn and its operands, so the “UNSPEC_VOLATILE” insn should serve the same purpose as adding “edx” to EPILOGUE_USES.

Am I missing anything here?

> 
>>> I don't think we need a new target-specific unspec_volatile code to do (b).
>>> We can just use an automatically-generated volatile asm to clobber the
>>> registers first.  See e.g. how expand_asm_memory_blockage handles memory
>>> scheduling barriers.
>> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>> 
>> static void
>> expand_asm_memory_blockage (void)
>> {
>>  rtx asm_op, clob;
>> 
>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>                                 rtvec_alloc (0), rtvec_alloc (0),
>>                                 rtvec_alloc (0), UNKNOWN_LOCATION);
>>  MEM_VOLATILE_P (asm_op) = 1;
>> 
>>  clob = gen_rtx_SCRATCH (VOIDmode);
>>  clob = gen_rtx_MEM (BLKmode, clob);
>>  clob = gen_rtx_CLOBBER (VOIDmode, clob);
>> 
>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>> }
>> 
>> 
>> As the following? 
>> 
>> /* Generate asm volatile("" : : : "regno") for REGNO.  */
>> 
>> static void
>> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
>> {
>>  rtx asm_op, clob;
>> 
>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>                                 rtvec_alloc (0), rtvec_alloc (0),
>>                                 rtvec_alloc (0), UNKNOWN_LOCATION);
>>  MEM_VOLATILE_P (asm_op) = 1;
>> 
>>  clob = gen_rtx_REG (mode, regno);
>>  clob = gen_rtx_CLOBBER (VOIDmode, clob);
>> 
>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>> }
>> 
>> Is the above correct? 
> 
> Yeah, looks good.  You should be able to clobber all the registers you
> want to clear in one asm.

How do I do this?

thanks.

Qing
>  For extra safety, it might be worth including
> a (mem:BLK (scratch)) clobber too, so that memory instructions don't get
> moved across the asm.
> 
> Thanks,
> Richard
Segher Boessenkool Sept. 17, 2020, 10:26 p.m. UTC | #144
On Thu, Sep 17, 2020 at 05:27:59PM +0100, Richard Sandiford wrote:
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> > The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
> >
> > ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
> > ;; all of memory.  This blocks insns from being moved across this point.
> 
> Heh, it looks like that comment dates back to 1994. :-)
> 
> The comment is no longer correct though.  I wasn't around at the time,
> but I assume the comment was only locally true even then.

I think it was never true at all, even.

An unspec_volatile is just an unspec that is volatile, i.e. it needs to
be executed in the real machine exactly like in the abstract C machine
(wrt sequence points).  It typically does something the compiler does
not model (say, to resources it does not know about), but you can use it
for anything you want executed approximately as written.

> UNSPEC_VOLATILEs can't be deleted.

(If they are executed at all, anyway ;-) )


Segher
Qing Zhao Sept. 18, 2020, 8:31 p.m. UTC | #145
Hi, Richard,

During my implementation of the new version of the patch, I still feel that it’s not practical to add a default definition in the middle end that just uses move patterns to zero each selected register.

The major issues are:

There is some target-specific information on how to define the “general register” set and the “all register” set, so we would have to add a new target hook to get this target-specific information and pass it to the middle end.


For example, on X86, for CALL_USED_REGISTERS, we have:

#define CALL_USED_REGISTERS                                     \
/*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7*/      \
{  1, 1, 1, 0, 4, 4, 0, 1, 1,  1,  1,  1,  1,  1,  1,  1,       \
/*arg,flags,fpsr,frame*/                                        \
    1,   1,    1,    1,                                         \
/*xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7*/                     \
     1,   1,   1,   1,   1,   1,   6,   6,                      \
/* mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7*/   


From the above, we can see that “st0 to st7” are call-used registers for x86; however, we should not zero these registers on x86.

Such details are known only to the x86 backend.

I guess that other platforms might have similar issues.

If we still want a default definition in the middle end to generate the zeroing insns for the selected registers, I have to add another target hook, say, “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, to check whether a register should be zeroed based on gpr_only (general registers only) and a target-specific decision.  I will provide an x86 implementation of this target hook in this patch.

Other targets would have to implement this new target hook to use the default handler.

Let me know your opinion:

A.  Do not provide a default definition in the middle end to generate the zeroing insns for the selected registers.  Move all the generation work to the target; an x86 implementation will be provided.

OR:

B.  Provide a default definition in the middle end to generate the zeroing insns for the selected registers.  This requires adding a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”; as with A, an x86 implementation will be provided in my patch.


thanks.

Qing


> On Sep 11, 2020, at 4:44 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Having a target hook sounds good, but I think it should have a
> default definition that just uses the move patterns to zero each
> selected register.  I expect the default will be good enough for
> most targets.
> 
> Thanks,
> Richard
Segher Boessenkool Sept. 18, 2020, 10:51 p.m. UTC | #146
Hi!

On Fri, Sep 18, 2020 at 03:31:12PM -0500, Qing Zhao wrote:
> Let me know your opinion:
> 
> A.  Will not provide default definition in middle end to generate the zeroing insn for selected registers.  Move the generation work all to target; X86 implementation will be provided;
> 
> OR:
> 
> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 

Is this just to make the xor thing work?  i386 has a peephole to
transform the mov to a xor for this (and the backend could just handle
it in its mov<M> patterns, maybe a peephole was easier for i386, no
idea).


Segher
Richard Sandiford Sept. 21, 2020, 7:23 a.m. UTC | #147
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> Hi, Richard,
>
> During my implementation of the new version of the patch. I still feel that it’s not practical to add a default definition in the middle end to just use move patterns to zero each selected register. 
>
> The major issues are:
>
> There are some target specific information on how to define “general register” set and “all register” set,  we have to add a new specific target hook to get such target specific information and pass to middle-end. 

GENERAL_REGS and ALL_REGS are already concepts that target-independent
code knows about though.  I think the non-fixed subsets of those would
make good starting sets, which the target could whittle down if it wanted
or needed to.

> For example, on X86, for CALL_USED_REGISTERS, we have:
>
> #define CALL_USED_REGISTERS                                     \
> /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7*/      \
> {  1, 1, 1, 0, 4, 4, 0, 1, 1,  1,  1,  1,  1,  1,  1,  1,       \
> /*arg,flags,fpsr,frame*/                                        \
>     1,   1,    1,    1,                                         \
> /*xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7*/                     \
>      1,   1,   1,   1,   1,   1,   6,   6,                      \
> /* mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7*/   
>
>
> From the above, we can see “st0 to st7” are call_used_registers for x86, however, we should not zero these registers on x86. 
>
> Such details is only known by x86 backend. 
>
> I guess that other platforms might have similar issue. 

They might, but that doesn't disprove that there's a sensible default
choice that works for most targets.

FWIW, stack registers themselves are already exposed outside targets
(see reg-stack.c, although since x86 is the only port that uses it,
the main part of it is effectively target-dependent at the moment).
Similarly for register windows.

> If we still want  a default definition in middle end to generate the zeroing insn for selected registers, I have to add another target hook, say, “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)” to check whether a register should be zeroed based on gpr_only (general register only)  and target specific decision.   I will provide a x86 implementation for this target hook in this patch. 
>
> Other targets have to implement this new target hook to utilize the default handler. 
>
> Let me know your opinion:
>
> A.  Will not provide default definition in middle end to generate the zeroing insn for selected registers.  Move the generation work all to target; X86 implementation will be provided;
>
> OR:
>
> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 

The kind of target hook interface I was thinking of was:

  HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)

which:

- emits zeroing instructions for some target-specific subset of REGS

- returns the set of registers that were actually cleared

The default implementation would clear all registers in REGS,
using reg_raw_mode[R] as the mode for register R.  Targets could
then override the hook and:

- drop registers that shouldn't be cleared

- handle some or all of the remaining registers in a more optimal,
  target-specific way

The targets could then use the default implementation of the hook
to handle any residue.  E.g. the default implementation would be
able to handle general registers on x86.
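To make the shape of that interface concrete, here is a toy model in plain C++. It is a sketch under several assumptions, not a GCC implementation: `RegSet` stands in for HARD_REG_SET, `Emitter` records which registers were "zeroed" instead of emitting RTL, and the register layout (0-7 general, 8-15 stack-like) and the special handling of register 0 are made up for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Toy model; real GCC would use HARD_REG_SET and emit RTL insns.
constexpr std::size_t kNumRegs = 16;  // hypothetical register file
using RegSet = std::bitset<kNumRegs>;

// Hypothetical layout: registers 8-15 are x87-style stack registers.
RegSet stack_regs() {
  RegSet s;
  for (std::size_t r = 8; r < kNumRegs; ++r) s.set(r);
  return s;
}

// Records which registers were zeroed, and how, instead of emitting insns.
struct Emitter {
  RegSet moved_zero;  // zeroed with a plain move
  RegSet special;     // zeroed in a target-specific way
};

// Default hook: zero every register in REGS with a move and
// return the set of registers that were actually cleared.
RegSet default_emit_move_zeros(const RegSet &regs, Emitter &out) {
  out.moved_zero |= regs;
  return regs;
}

// Target override: drop registers that must not be cleared, handle
// register 0 specially, and hand the residue to the default hook.
RegSet target_emit_move_zeros(const RegSet &regs, Emitter &out) {
  const RegSet todo = regs & ~stack_regs();
  RegSet special;
  if (todo.test(0)) {
    special.set(0);
    out.special.set(0);
  }
  const RegSet rest = default_emit_move_zeros(todo & ~special, out);
  return special | rest;
}

void check_hooks() {
  Emitter e;
  // Request registers 0, 1 and 9; register 9 is a stack register.
  const RegSet cleared = target_emit_move_zeros(RegSet(0x203), e);
  assert(cleared == RegSet(0x3));
  assert(e.special == RegSet(0x1));
  assert(e.moved_zero == RegSet(0x2));
}
```

Returning the set that was actually cleared lets the caller tell which of the requested registers the target chose to skip.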

Thanks,
Richard
Qing Zhao Sept. 21, 2020, 2:13 p.m. UTC | #148
> On Sep 18, 2020, at 5:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Fri, Sep 18, 2020 at 03:31:12PM -0500, Qing Zhao wrote:
>> Let me know your opinion:
>> 
>> A.  Will not provide default definition in middle end to generate the zeroing insn for selected registers.  Move the generation work all to target; X86 implementation will be provided;
>> 
>> OR:
>> 
>> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 
> 
> Is this just to make the xor thing work?  i386 has a peephole to
> transform the mov to a xor for this (and the backend could just handle
> it in its mov<M> patterns, maybe a peephole was easier for i386, no
> idea).

You mean, what’s the purpose of the new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”?

The purpose of this new target hook is to let the target remove call-used registers that should not be zeroed, for example, the stack registers on x86 (st0-st7).
For other platforms, there might be other call-used registers that should not be zeroed.

Qing

> 
> 
> Segher
Qing Zhao Sept. 21, 2020, 2:29 p.m. UTC | #149
> On Sep 21, 2020, at 2:23 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> Hi, Richard,
>> 
>> During my implementation of the new version of the patch. I still feel that it’s not practical to add a default definition in the middle end to just use move patterns to zero each selected register. 
>> 
>> The major issues are:
>> 
>> There are some target specific information on how to define “general register” set and “all register” set,  we have to add a new specific target hook to get such target specific information and pass to middle-end. 
> 
> GENERAL_REGS and ALL_REGS are already concepts that target-independent
> code knows about though.  I think the non-fixed subsets of those would
> make good starting sets, which the target could whittle down if it wanted
> or needed to.

Yes, this is what I am currently doing:  

First, the middle end computes the initial need_zeroed_hardregs based on the user request, data flow, and function ABI, then passes this “need_zeroed_hardregs” to the target hook.
Then, the target hook removes from “need_zeroed_hardregs” the registers that should not be zeroed on that specific target, for example, stack_regs on x86.
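The two steps can be sketched as a toy computation on register sets. This is only an illustration: the exact conditions used by the patch are not shown in this part of the thread, so the mapping from the five option values to set operations below is an assumption based on the option names, and `RegInfo`, `initial_need_zeroed`, `target_filter`, and the 8-register layout are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumRegs = 8;  // hypothetical register file
using RegSet = std::bitset<kNumRegs>;

// Mirrors the option values skip|used-gpr|all-gpr|used|all.
enum class ZeroMode { Skip, UsedGpr, AllGpr, Used, All };

struct RegInfo {
  RegSet call_used;  // call-used registers under the function's ABI
  RegSet fixed;      // fixed registers, never candidates
  RegSet general;    // general purpose registers
  RegSet ever_live;  // registers used somewhere in the function
};

// Step 1 (middle end): compute the initial set from the user request,
// data flow, and ABI. The conditions are an assumption, see above.
RegSet initial_need_zeroed(ZeroMode mode, const RegInfo &info) {
  if (mode == ZeroMode::Skip) return RegSet();
  RegSet regs = info.call_used & ~info.fixed;
  if (mode == ZeroMode::UsedGpr || mode == ZeroMode::AllGpr) regs &= info.general;
  if (mode == ZeroMode::UsedGpr || mode == ZeroMode::Used) regs &= info.ever_live;
  return regs;
}

// Step 2 (target hook): remove registers that must not be zeroed here.
RegSet target_filter(const RegSet &regs, const RegSet &never_zero) {
  return regs & ~never_zero;
}

// Registers 0-3 are general, 2 is fixed, 0 and 5 are used, 3 and 4 are call-saved.
RegInfo example_info() {
  return RegInfo{RegSet(0xE7), RegSet(0x04), RegSet(0x0F), RegSet(0x21)};
}

void check_need_zeroed() {
  const RegInfo info = example_info();
  assert(initial_need_zeroed(ZeroMode::All, info) == RegSet(0xE3));
  assert(initial_need_zeroed(ZeroMode::UsedGpr, info) == RegSet(0x01));
  // Treat registers 6 and 7 as stack-like registers the target drops.
  assert(target_filter(RegSet(0xE3), RegSet(0xC0)) == RegSet(0x23));
}
```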

> 
>> For example, on X86, for CALL_USED_REGISTERS, we have:
>> 
>> #define CALL_USED_REGISTERS                                     \
>> /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7*/      \
>> {  1, 1, 1, 0, 4, 4, 0, 1, 1,  1,  1,  1,  1,  1,  1,  1,       \
>> /*arg,flags,fpsr,frame*/                                        \
>>    1,   1,    1,    1,                                         \
>> /*xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7*/                     \
>>     1,   1,   1,   1,   1,   1,   6,   6,                      \
>> /* mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7*/   
>> 
>> 
>> From the above, we can see “st0 to st7” are call_used_registers for x86, however, we should not zero these registers on x86. 
>> 
>> Such details is only known by x86 backend. 
>> 
>> I guess that other platforms might have similar issue. 
> 
> They might, but that doesn't disprove that there's a sensible default
> choice that works for most targets.
> 
> FWIW, stack registers themselves are already exposed outside targets
> (see reg-stack.c, although since x86 is the only port that uses it,
> the main part of it is effectively target-dependent at the moment).
> Similarly for register windows.

Yes, the stack_regs can currently be identified with STACK_REG_P in the middle end. So for x86, we might be able to handle this in the middle end.

However, my major concern is other platforms that we are not very familiar with: there might be some special registers on such a platform that should not be zeroed, and currently there is no way to identify them in the middle end.

For such a platform, the default handler will not be correct.
> 
>> If we still want  a default definition in middle end to generate the zeroing insn for selected registers, I have to add another target hook, say, “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)” to check whether a register should be zeroed based on gpr_only (general register only)  and target specific decision.   I will provide a x86 implementation for this target hook in this patch. 
>> 
>> Other targets have to implement this new target hook to utilize the default handler. 
>> 
>> Let me know your opinion:
>> 
>> A.  Will not provide default definition in middle end to generate the zeroing insn for selected registers.  Move the generation work all to target; X86 implementation will be provided;
>> 
>> OR:
>> 
>> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 
> 
> The kind of target hook interface I was thinking of was:
> 
>  HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)
> 
> which:
> 
> - emits zeroing instructions for some target-specific subset of REGS
> 
> - returns the set of registers that were actually cleared
> 
> The default implementation would clear all registers in REGS,
> using reg_raw_mode[R] as the mode for register R.  Targets could
> then override the hook and:
> 
> - drop registers that shouldn't be cleared
> 
> - handle some or all of the remaining registers in a more optimal,
>  target-specific way
> 
> The targets could then use the default implementation of the hook
> to handle any residue.  E.g. the default implementation would be
> able to handle general registers on x86.

Even for the general registers on x86, we need some special optimization for optimal code generation; for example, we might want to optimize
a “mov” to an “xor” on x86.

My major concern with the default implementation of the hook is:

If a target has some special registers that should not be zeroed, and we do not provide an overridden implementation for this target, then the default implementation will generate incorrect code for this target. 

How do we resolve this issue?

thanks.

Qing

> 
> Thanks,
> Richard
Richard Sandiford Sept. 21, 2020, 3:35 p.m. UTC | #150
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> My major concern with the default implementation of the hook is:
>
> If a target has some special registers that should not be zeroed, and we do not provide an overridden implementation for this target, then the default implementation will generate incorrect code for this target. 

That's OK.  The default behaviour of hooks and macros often needs
to be corrected by target code.  For example, going back to some
of the macros and hooks we talked about earlier:

- EPILOGUE_USES by default returns false for all registers.
  This would be the wrong behaviour for any target that currently
  defines EPILOGUE_USES to something else.

- TARGET_HARD_REGNO_SCRATCH_OK by default returns true for all registers.
  This would be the wrong behaviour for any target that currently defines
  the hook to do something else.

And in general, if there's a target-specific reason that something
has to be different from normal, it's better where possible to expose
the underlying concept that makes that different behaviour necessary,
rather than expose the downstream effects of that concept.  For example,
IMO it's a historical mistake that targets that support interrupt
handlers need to change all of:

- TARGET_HARD_REGNO_SCRATCH_OK
- HARD_REGNO_RENAME_OK
- EPILOGUE_USES

to expose what is essentially one concept.  IMO we should instead
just expose the fact that certain functions have extra call-saved
registers.  (This is now possible with the function_abi stuff,
but most interrupt handler support predates that.)

So if there is some concept that prevents your new target hook being
correct for x86, I think we should try if possible to expose that
concept to target-independent code.  And in the case of stack registers,
that has already been done.

The same would apply to any other target for which the default turns out
not to be correct.

But in cases where there is no underlying concept that can sensibly
be extracted out, it's OK if targets need to override the default
to get correct behaviour.

Thanks,
Richard
Qing Zhao Sept. 21, 2020, 4:34 p.m. UTC | #151
> On Sep 21, 2020, at 10:35 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> My major concern with the default implementation of the hook is:
>> 
>> If a target has some special registers that should not be zeroed, and we do not provide an overridden implementation for this target, then the default implementation will generate incorrect code for this target. 
> 
> That's OK.  The default behaviour of hooks and macros often needs
> to be corrected by target code.  For example, going back to some
> of the macros and hooks we talked about earlier:
> 
> - EPILOGUE_USES by default returns false for all registers.
>  This would be the wrong behaviour for any target that currently
>  defines EPILOGUE_USES to something else.
> 
> - TARGET_HARD_REGNO_SCRATCH_OK by default returns true for all registers.
>  This would be the wrong behaviour for any target that currently defines
>  the hook to do something else.
> 
> And in general, if there's a target-specific reason that something
> has to be different from normal, it's better where possible to expose
> the underlying concept that makes that different behaviour necessary,
> rather than expose the downstream effects of that concept.  For example,
> IMO it's a historical mistake that targets that support interrupt
> handlers need to change all of:
> 
> - TARGET_HARD_REGNO_SCRATCH_OK
> - HARD_REGNO_RENAME_OK
> - EPILOGUE_USES
> 
> to expose what is essentially one concept.  IMO we should instead
> just expose the fact that certain functions have extra call-saved
> registers.  (This is now possible with the function_abi stuff,
> but most interrupt handler support predates that.)
> 
> So if there is some concept that prevents your new target hook being
> correct for x86, I think we should try if possible to expose that
> concept to target-independent code.  And in the case of stack registers,
> that has already been done.

I will exclude “stack registers” in the middle end to see whether this can resolve the issue with X86. 
> 
> The same would apply to any other target for which the default turns out
> not to be correct.
> 
> But in cases where there is no underlying concept that can sensibly
> be extracted out, it's OK if targets need to override the default
> to get correct behaviour.

Then, on a target where the default code is not right and we haven’t provided an overriding implementation, what should we tell the end user?
The user might see the documentation for -fzero-call-used-regs in the GCC manual and try it on that target, where the default implementation is not correct. How should we deal with this?

Qing
> 
> Thanks,
> Richard
Richard Sandiford Sept. 21, 2020, 7:11 p.m. UTC | #152
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> But in cases where there is no underlying concept that can sensibly
>> be extracted out, it's OK if targets need to override the default
>> to get correct behaviour.
>
> Then, on a target where the default code is not right and we haven’t provided an overriding implementation, what should we tell the end user?
> The user might see the documentation for -fzero-call-used-regs in the GCC manual and try it on that target, where the default implementation is not correct. How should we deal with this?

The point is that we're trying to implement this in a target-independent
way, like for most compiler features.  If the option doesn't work for a
particular target, then that's a bug like any other.  The most we can
reasonably do is:

(a) try to implement the feature in a way that uses all the appropriate
    pieces of compiler infrastructure (what we've been discussing)

(b) add tests for the feature that run on all targets

It's possible that bugs could slip through even then.  But that's true
of anything.

Targets like x86 support many subtargets, many different compilation
modes, and many different compiler features (register asms, various
fancy function attributes, etc.).  So even after the option is
committed and is supposedly supported on x86, it's possible that
we'll find a bug in the feature on x86 itself.

I don't think anyone would suggest that we should warn the user that the
option might be buggy on x86 (its initial target).  But I also don't
see any reason for believing that a bug on x86 is less likely than
a bug on other targets.

Thanks,
Richard
Qing Zhao Sept. 21, 2020, 7:22 p.m. UTC | #153
> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> But in cases where there is no underlying concept that can sensibly
>>> be extracted out, it's OK if targets need to override the default
>>> to get correct behaviour.
>> 
>> Then, on a target where the default code is not right and we haven’t provided an overriding implementation, what should we tell the end user?
>> The user might see the documentation for -fzero-call-used-regs in the GCC manual and try it on that target, where the default implementation is not correct. How should we deal with this?
> 
> The point is that we're trying to implement this in a target-independent
> way, like for most compiler features.  If the option doesn't work for a
> particular target, then that's a bug like any other.  The most we can
> reasonably do is:
> 
> (a) try to implement the feature in a way that uses all the appropriate
>    pieces of compiler infrastructure (what we've been discussing)
> 
> (b) add tests for the feature that run on all targets
> 
> It's possible that bugs could slip through even then.  But that's true
> of anything.
> 
> Targets like x86 support many subtargets, many different compilation
> modes, and many different compiler features (register asms, various
> fancy function attributes, etc.).  So even after the option is
> committed and is supposedly supported on x86, it's possible that
> we'll find a bug in the feature on x86 itself.
> 
> I don't think anyone would suggest that we should warn the user that the
> option might be buggy on x86 (its initial target).  But I also don't
> see any reason for believing that a bug on x86 is less likely than
> a bug on other targets.

Okay, then I will add the default implementation as you suggested, and also provide the overridden, optimized implementation on X86.

Let me know if you have further suggestions.

Qing
> 
> Thanks,
> Richard
Qing Zhao Sept. 21, 2020, 8:05 p.m. UTC | #154
> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> 
>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> But in cases where there is no underlying concept that can sensibly
>>>> be extracted out, it's OK if targets need to override the default
>>>> to get correct behaviour.
>>> 
>>> Then, on a target where the default code is not right and we haven’t provided an overriding implementation, what should we tell the end user?
>>> The user might see the documentation for -fzero-call-used-regs in the GCC manual and try it on that target, where the default implementation is not correct. How should we deal with this?
>> 
>> The point is that we're trying to implement this in a target-independent
>> way, like for most compiler features.  If the option doesn't work for a
>> particular target, then that's a bug like any other.  The most we can
>> reasonably do is:
>> 
>> (a) try to implement the feature in a way that uses all the appropriate
>>   pieces of compiler infrastructure (what we've been discussing)
>> 
>> (b) add tests for the feature that run on all targets
>> 
>> It's possible that bugs could slip through even then.  But that's true
>> of anything.
>> 
>> Targets like x86 support many subtargets, many different compilation
>> modes, and many different compiler features (register asms, various
>> fancy function attributes, etc.).  So even after the option is
>> committed and is supposedly supported on x86, it's possible that
>> we'll find a bug in the feature on x86 itself.
>> 
>> I don't think anyone would suggest that we should warn the user that the
>> option might be buggy on x86 (its initial target).  But I also don't
>> see any reason for believing that a bug on x86 is less likely than
>> a bug on other targets.
> 
> Okay, then I will add the default implementation as you suggested, and also provide the overridden, optimized implementation on X86.

For X86, it looks like, in addition to the stack registers (st0 to st7), the mask registers (k0 to k7) also do not need to be zeroed, and “mm0 to mm7” should not be zeroed either.

As far as I can tell, MASK_REG_P and MMX_REG_P are x86-specific macros. Can I use them in the middle end in the same way as “STACK_REG_P”?

Qing
> 
> Let me know if you have further suggestions.
> 
> Qing
>> 
>> Thanks,
>> Richard
Segher Boessenkool Sept. 21, 2020, 8:34 p.m. UTC | #155
On Mon, Sep 21, 2020 at 09:13:58AM -0500, Qing Zhao wrote:
> > On Sep 18, 2020, at 5:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 
> > 
> > Is this just to make the xor thing work?  i386 has a peephole to
> > transform the mov to a xor for this (and the backend could just handle
> > it in its mov<M> patterns, maybe a peephole was easier for i386, no
> > idea).
> 
> You mean what’s the purpose of the new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)?
> 
> The purpose of this new target hook is for the target to delete some of the call_used registers that should not be zeroed, for example, the stack registers in X86. (St0-st7). 

Oh, I didn't see the _P.  Maybe give it a better name?  Also, a better
interface altogether, a call per hard register is a bit much (and easily
avoidable).

> For other platforms, there might be other call_used registers that should not be zeroed. 

But you cannot *add* anything with this interface, and it cannot return
different results depending on which return insn this is.  It is not a
good abstraction IMO.
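
A minimal sketch of what a set-based interface might look like, assuming a toy 64-bit register set.  Everything here (`toy_regset`, `toy_zero_call_used_regs`, `TOY_UNZEROABLE`, `TOY_GROUP` and the register numbers) is hypothetical and only meant to illustrate the shape, not an existing GCC interface:

```c
#include <stdint.h>

/* Toy model: one bit per hard register; bit N set means register N.  */
typedef uint64_t toy_regset;

/* Hypothetical registers this toy target refuses to zero (think of a
   register stack).  The numbers are made up for illustration.  */
#define TOY_UNZEROABLE ((toy_regset) 0xff00)

/* Hypothetical group of registers that one instruction clears together.  */
#define TOY_GROUP ((toy_regset) 0x000f)

/* Set-based hook: given every register the middle end would like to
   zero before this return, hand back the set the target really zeroes.
   One call covers all registers, and the target can both drop
   registers and add extra ones.  */
static toy_regset
toy_zero_call_used_regs (toy_regset requested)
{
  /* Drop what the target must not touch.  */
  toy_regset zeroed = requested & ~TOY_UNZEROABLE;

  /* Add what gets cleared as a side effect: touching any register of
     the group clears the whole group.  */
  if (zeroed & TOY_GROUP)
    zeroed |= TOY_GROUP;

  return zeroed;
}
```

Because the result is a set rather than a per-register yes/no, the caller can see exactly what ended up zeroed, and the hook could be called separately for each return insn if the answer needs to differ.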


Segher
Qing Zhao Sept. 21, 2020, 8:58 p.m. UTC | #156
> On Sep 21, 2020, at 3:34 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Mon, Sep 21, 2020 at 09:13:58AM -0500, Qing Zhao wrote:
>>> On Sep 18, 2020, at 5:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 
>>> 
>>> Is this just to make the xor thing work?  i386 has a peephole to
>>> transform the mov to a xor for this (and the backend could just handle
>>> it in its mov<M> patterns, maybe a peephole was easier for i386, no
>>> idea).
>> 
>> You mean what’s the purpose of the new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)?
>> 
>> The purpose of this new target hook is for the target to delete some of the call_used registers that should not be zeroed, for example, the stack registers in X86. (St0-st7). 
> 
> Oh, I didn't see the _P.  Maybe give it a better name?  Also, a better
> interface altogether, a call per hard register is a bit much (and easily
> avoidable).
> 
>> For other platforms, there might be other call_used registers that should not be zeroed. 
> 
> But you cannot *add* anything with this interface, and it cannot return
> different results depending on which return insn this is.  It is not a
> good abstraction IMO.

This hook will not depend on which return insn it is.  It just checks whether the specified register can be zeroed on this target; for example, on X86 it will exclude the stack registers (st0 to st7), the MMX registers (mm0 to mm7) and the mask registers (k0 to k7) from zeroing.

Anything that depends on which return insn it is should be reflected in the data flow information, which we can easily get from the middle end’s data flow analysis.

I added such a target hook in the previous patch:

https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550018.html

However, at the very beginning of the discussion I got several comments that it exposed too many target-specific details unnecessarily.

Still, if we want to add a default implementation in the middle end as Richard suggested, this target hook might be necessary.

Qing

> 
> 
> Segher
Segher Boessenkool Sept. 22, 2020, 12:25 a.m. UTC | #157
On Mon, Sep 21, 2020 at 03:58:25PM -0500, Qing Zhao wrote:
> > On Sep 21, 2020, at 3:34 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > But you cannot *add* anything with this interface, and it cannot return
> > different results depending on which return insn this is.  It is not a
> > good abstraction IMO.
> 
> This hook will not depend on which return insn.

But good code generation very much *does*.


Segher
Richard Sandiford Sept. 22, 2020, 4:31 p.m. UTC | #158
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> 
>> 
>> 
>>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> But in cases where there is no underlying concept that can sensibly
>>>>> be extracted out, it's OK if targets need to override the default
>>>>> to get correct behaviour.
>>>> 
>>>> Then, on a target where the default code is not right and we haven’t provided an overriding implementation, what should we tell the end user?
>>>> The user might see the documentation for -fzero-call-used-regs in the GCC manual and try it on that target, where the default implementation is not correct. How should we deal with this?
>>> 
>>> The point is that we're trying to implement this in a target-independent
>>> way, like for most compiler features.  If the option doesn't work for a
>>> particular target, then that's a bug like any other.  The most we can
>>> reasonably do is:
>>> 
>>> (a) try to implement the feature in a way that uses all the appropriate
>>>   pieces of compiler infrastructure (what we've been discussing)
>>> 
>>> (b) add tests for the feature that run on all targets
>>> 
>>> It's possible that bugs could slip through even then.  But that's true
>>> of anything.
>>> 
>>> Targets like x86 support many subtargets, many different compilation
>>> modes, and many different compiler features (register asms, various
>>> fancy function attributes, etc.).  So even after the option is
>>> committed and is supposedly supported on x86, it's possible that
>>> we'll find a bug in the feature on x86 itself.
>>> 
>>> I don't think anyone would suggest that we should warn the user that the
>>> option might be buggy on x86 (its initial target).  But I also don't
>>> see any reason for believing that a bug on x86 is less likely than
>>> a bug on other targets.
>> 
>> Okay, then I will add the default implementation as you suggested, and also provide the overridden, optimized implementation on X86.
>
> For X86, it looks like, in addition to the stack registers (st0 to st7), the mask registers (k0 to k7) also do not need to be zeroed, and “mm0 to mm7” should not be zeroed either.
>
> As far as I can tell, MASK_REG_P and MMX_REG_P are x86-specific macros. Can I use them in the middle end in the same way as “STACK_REG_P”?

No, those are x86-specific like you say.

Taking each in turn: what is the reason for not clearing mask registers?
And what is the reason for not clearing mm0-7?  In each case, is it a
performance or a correctness issue?

Although the registers themselves are target-specific, the reason
for excluding them might be something that could be exposed to
target-independent code.

As a general comment, with at least three sets of excluded registers,
the “all” in one of the suggested option values is beginning to feel
like a misnomer.  (Maybe that has already been dropped though.)

Thanks,
Richard
Richard Sandiford Sept. 22, 2020, 5:06 p.m. UTC | #159
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 17, 2020, at 11:27 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> Segher and Richard, 
>>>>> 
>>>>> Now there are two major concerns from the discussion so far:
>>>>> 
>>>>> 1. (From Richard):  Inserting zero insns should be done after pass_thread_prologue_and_epilogue since later passes (for example, pass_regrename) might introduce new used caller-saved registers. 
>>>>>    So, we should do this in the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later). 
>>>>> 
>>>>> 2. (From Segher): The inserted zero insns should stay together with the return, no other insns should move in-between zero insns and return insns. Otherwise, a valid gadget could be formed. 
>>>>> 
>>>>> I think that both of the above 2 concerns are important and should be addressed for the correct implementation. 
>>>>> 
>>>>> In order to support 1, we cannot implement it in “targetm.gen_return()” and “targetm.gen_simple_return()”, since both are called during pass_thread_prologue_and_epilogue; at that time, the use information is still not correct.
>>>>> 
>>>>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed registers is NOT enough to prevent all the zero insns from moving around.
>>>> 
>>>> Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
>>>> being deleted as dead.
>>>> 
>>>>> More restrictions need to be added to these new zero insns.  (I think that marking these new zeroed registers as “unspec_volatile” at RTL level is necessary to prevent them from being deleted or moved around.)
>>>>> 
>>>>> 
>>>>> So, based on the above, I propose the following approach that will resolve the above 2 concerns:
>>>>> 
>>>>> 1. Add 2 new target hooks:
>>>>>  A. targetm.pro_epilogue_use (reg)
>>>>>  This hook should return an UNSPEC_VOLATILE rtx to mark a register as in use, to
>>>>>  prevent register-setting instructions in the prologue and epilogue from being deleted.
>>>>> 
>>>>>  B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>>>>>  This hook will generate a sequence of insns that zero the registers specified in NEED_ZEROED_HARDREGS.
>>>>> 
>>>>>   A default handler of “gen_zero_call_used_regs” could be defined in the middle end, which uses mov insns to zero registers and then uses “targetm.pro_epilogue_use(reg)” to mark each zeroed register.
>>>> 
>>>> This sounds like you're going back to using:
>>>> 
>>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>>       (const_int 0 [0])) "t10.c":11:1 -1
>>>>    (nil))
>>>> (insn 19 18 20 2 (unspec_volatile [
>>>>           (reg:SI 1 dx)
>>>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>>    (nil))
>>>> 
>>>> This also doesn't prevent the zeroing from being moved around.  Like the
>>>> EPILOGUE_USES approach, it only prevents the clearing from being removed
>>>> as dead.  I still think that the EPILOGUE_USES approach is the better
>>>> way of doing that.
>>> 
>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>> 
>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>> ;; all of memory.  This blocks insns from being moved across this point.
>> 
>> Heh, it looks like that comment dates back to 1994. :-)
>> 
>> The comment is no longer correct though.  I wasn't around at the time,
>> but I assume the comment was only locally true even then.
>> 
>> If what the comment said was true, then something like:
>> 
>> (define_insn "cld"
>>  [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>  ""
>>  "cld"
>>  [(set_attr "length" "1")
>>   (set_attr "length_immediate" "0")
>>   (set_attr "modrm" "0")])
>> 
>> would invalidate the entire register file and so would require all values
>> to be spilt to the stack around the CLD.
>
> Okay, thanks for the info. 
> then, what’s the current definition of UNSPEC_VOLATILE? 

I'm not sure it's written down anywhere TBH.  rtl.texi just says:

  @code{unspec_volatile} is used for volatile operations and operations
  that may trap; @code{unspec} is used for other operations.

which seems like a cyclic definition: volatile expressions are defined
to be expressions that are volatile.

But IMO the semantics are that unspec_volatile patterns with a given
set of inputs and outputs act for dataflow purposes like volatile asms
with the same inputs and outputs.  The semantics of asm volatile are
at least slightly more well-defined (if only by example); see extend.texi
for details.  In particular:

  Note that the compiler can move even @code{volatile asm} instructions relative
  to other code, including across jump instructions. For example, on many 
  targets there is a system register that controls the rounding mode of 
  floating-point operations. Setting it with a @code{volatile asm} statement,
  as in the following PowerPC example, does not work reliably.

  @example
  asm volatile("mtfsf 255, %0" : : "f" (fpenv));
  sum = x + y;
  @end example

  The compiler may move the addition back before the @code{volatile asm}
  statement. To make it work as expected, add an artificial dependency to
  the @code{asm} by referencing a variable in the subsequent code, for
  example:

  @example
  asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
  sum = x + y;
  @end example

which is very similar to the unspec_volatile case we're talking about.

To take an x86 example:

  void
  f (char *x)
  {
    asm volatile ("");
    x[0] = 0;
    asm volatile ("");
    x[1] = 0;
    asm volatile ("");
  }

gets optimised to:

        xorl    %eax, %eax
        movw    %ax, (%rdi)

with the two stores being merged.  The same thing is IMO valid for
unspec_volatile.  In both cases, you would need some kind of memory
clobber to prevent the move and merge from happening.
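
For what it's worth, a sketch of the clobbered variant of the example above.  `f_barrier` is a name made up here, and `__asm__` is used instead of `asm` only so that the snippet also compiles in strict ISO modes:

```c
/* Same shape as f above, but each empty asm now carries a "memory"
   clobber.  The compiler then has to assume that the asm may read or
   write memory, so it cannot move a store across it or merge the two
   byte stores into one.  */
void
f_barrier (char *x)
{
  __asm__ volatile ("" : : : "memory");
  x[0] = 0;
  __asm__ volatile ("" : : : "memory");
  x[1] = 0;
  __asm__ volatile ("" : : : "memory");
}
```

With the clobbers in place I would expect two separate byte stores rather than a single movw, but I have not listed the assembly here since it can vary with compiler version and options.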

>>> I am not very familiar with how unspec_volatile actually works in GCC’s data flow analysis. My understanding from the above is that an RTL insn marked with UNSPEC_VOLATILE would serve as a barrier that no other insn can move across. At the same time, since the marked insn is considered to use and clobber all hard registers and memory, it cannot be deleted either.
>> 
>> UNSPEC_VOLATILEs can't be deleted.  And they can't be reordered relative
>> to other UNSPEC_VOLATILEs.  But the problem with:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>       (const_int 0 [0])) "t10.c":11:1 -1
>>    (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>           (reg:SI 1 dx)
>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>    (nil))
>> 
>> is that the volatile occurs *after* the zeroing instruction.  So at best
>> it can stop insn 18 moving further down, to be closer to the return
>> instruction.  There's nothing to stop insn 18 moving further up,
>> away from the return instruction, which AIUI is what you're trying
>> to prevent.  E.g. suppose we had:
>> 
>> (insn 17 … pop a register other than dx from the stack …)
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>       (const_int 0 [0])) "t10.c":11:1 -1
>>    (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>           (reg:SI 1 dx)
>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>    (nil))
>> 
>> There is nothing to stop an rtl pass reordering that to:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>       (const_int 0 [0])) "t10.c":11:1 -1
>>    (nil))
>> (insn 17 … pop a register other than dx from the stack …)
>> (insn 19 18 20 2 (unspec_volatile [
>>           (reg:SI 1 dx)
>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>    (nil))
>
> Yes, agreed. And then the volatile marking insn should be put BEFORE the zeroing insn. 
>
>> 
>> There's also no dataflow reason why this couldn't be reordered to:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>       (const_int 0 [0])) "t10.c":11:1 -1
>>    (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>           (reg:SI 1 dx)
>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>    (nil))
>> (insn 17 … pop a register other than dx from the stack …)
>> 
>
> This is the place where I don’t quite agree at the moment; maybe I still don’t quite understand “UNSPEC_VOLATILE”.
>
> I checked several places in GCC that handle “UNSPEC_VOLATILE”, for example, the routine “can_move_insns_across” in gcc/df-problems.c:
>
>       if (NONDEBUG_INSN_P (insn))
>         {
>           if (volatile_insn_p (PATTERN (insn)))
>             return false;
>
> From my understanding of reading the code, when an insn is UNSPEC_VOLATILE, another insn will NOT be able to move across it. 
>
> Then for the above example, the insn 17 should Not be moved across insn 19 either.
>
> Let me know if I miss anything important. 

The above is conservatively correct.  But not all passes do it.
E.g. combine does have a similar approach:

  /* If INSN contains volatile references (specifically volatile MEMs),
     we cannot combine across any other volatile references.
     Even if INSN doesn't contain volatile references, any intervening
     volatile insn might affect machine state.  */

  is_volatile_p = volatile_refs_p (PATTERN (insn))
    ? volatile_refs_p
    : volatile_insn_p;

And like you say, the passes that use can_move_insns_across will be
conservative too.  But not many passes use that function.

Passes like fwprop.c, postreload-gcse.c and ree.c do not (AFAIK) worry
about volatile asms or unspec_volatiles, and can move code across them.
And that's kind-of inevitable.  Having an “everything barrier” makes
life very hard for global optimisation.

>> So…
>> 
>>> So, I thought that “UNSPEC_VOLATILE” should be stronger than “EPILOGUE_USES”, and that it can serve the purpose of preventing the zeroing insns from being deleted or moved.
>> 
>> …both EPILOGUE_USES and UNSPEC_VOLATILE would be effective ways of
>> stopping insn 18 from being deleted.  But an UNSPEC_VOLATILE after
>> the instruction would IMO be counterproductive: it would stop the
>> zeroing instructions that we want to be close to the return instruction
>> from moving “closer” to the return instruction, but it wouldn't do the
>> same for unrelated instructions.  So if anything, the unspec_volatile
>> could increase the chances that something unrelated to the register
>> zeroing is moved later than the register zeroing.  E.g. this could
>> happen when filling delayed branch slots.
>> 
>>>> I don't think there's a foolproof way of preventing an unknown target
>>>> machine_reorg pass from moving the instructions around.  But since we
>>>> don't have unknown machine_reorgs (at least not in-tree), I think
>>>> instead we should be prepared to patch machine_reorgs where necessary
>>>> to ensure that they do the right thing.
>>>> 
>>>> If you want to increase the chances that machine_reorgs don't need to be
>>>> patched, you could either:
>>>> 
>>>> (a) make the zeroing instructions themselves volatile or
>>>> (b) insert a volatile reference to the register before (rather than
>>>>   after) the zeroing instruction
>>>> 
>>>> IMO (b) is the way to go, because it avoids the need to define special
>>>> volatile move patterns for each type of register.  (b) would be needed
>>>> on top of (rather than instead of) the EPILOGUE_USES thing.
>>>> 
>>> Okay, will take approach b. 
>>> 
>>> But I still don’t quite understand why we still need “EPILOGUE_USES”. What’s the additional benefit of EPILOGUE_USES?
>> 
>> The asm for (b) goes before the instruction, so we'd have:
>> 
>> (insn 17 … new asm …)
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>       (const_int 0 [0])) "t10.c":11:1 -1
>>    (nil))
>> (insn 19 … return …)
>> 
>> But something has to tell the df machinery that the value of edx
>> matters on return from the function, otherwise insn 18 could be
>> deleted as dead.  Adding edx to EPILOGUE_USES provides that information
>> and stops the instruction from being deleted.
>
>
> In the above, insn 17 will be something like:
>
> (insn 17 ...(unspec_volatile [  (reg:SI 1 dx)
>     ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1 
> (nil))

In the example above, insn 17 would be an asm that clobbers dx
(instead of using dx).

> So, the reg edx is already marked as “UNSPEC_VOLATILE”, which should mean that the value of edx matters on return from the function. My understanding is that df should automatically pick up the “UNSPEC_VOLATILE” insn and its operands.   The “UNSPEC_VOLATILE” insn should serve the same purpose as adding “edx” to EPILOGUE_USES.
>
> Am I missing anything here?

The point is that any use of dx at insn 17 comes before the definition
in insn 18.  So a use in insn 17 would keep alive any store to dx that
happened before insn 17.  But it would not keep the store in insn 18 live,
since insn 18 executes later.
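
To make the ordering argument concrete, here is a toy backward liveness scan.  It is only a model of the reasoning, not GCC's df code; `toy_insn`, `toy_mark_dead`, `toy_seq` and `TOY_DX` are names invented for this sketch, and `live_at_exit` stands in for what EPILOGUE_USES would contribute:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_regs;          /* one bit per register */
#define TOY_DX ((toy_regs) 1 << 1)  /* register 1, "dx" */

struct toy_insn
{
  toy_regs defs;     /* registers written */
  toy_regs uses;     /* registers read */
  bool is_volatile;  /* never deleted, like a volatile insn */
};

/* Walk the insns backwards, starting from the registers that are live
   at function exit.  A non-volatile insn whose results are all dead is
   marked in DELETED.  Returns the number of insns marked.  */
static size_t
toy_mark_dead (const struct toy_insn *insns, size_t n,
               toy_regs live_at_exit, bool *deleted)
{
  toy_regs live = live_at_exit;
  size_t count = 0;

  for (size_t i = n; i-- > 0;)
    {
      const struct toy_insn *insn = &insns[i];

      deleted[i] = (!insn->is_volatile
                    && insn->defs != 0
                    && (insn->defs & live) == 0);
      if (deleted[i])
        {
          count++;
          continue;
        }
      /* A kept insn kills what it defines and keeps its inputs alive
         for the insns *before* it.  */
      live = (live & ~insn->defs) | insn->uses;
    }
  return count;
}

/* insn 17: volatile insn that uses dx; insn 18: dx = 0.  */
static const struct toy_insn toy_seq[2] = {
  { 0, TOY_DX, true },
  { TOY_DX, 0, false },
};
```

Calling `toy_mark_dead (toy_seq, 2, 0, deleted)` marks the zeroing insn as dead even though the volatile insn uses dx, because that use sits before the definition.  Passing `TOY_DX` as `live_at_exit` keeps it.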

>>>> I don't think we need a new target-specific unspec_volatile code to do (b).
>>>> We can just use an automatically-generated volatile asm to clobber the
>>>> registers first.  See e.g. how expand_asm_memory_blockage handles memory
>>>> scheduling barriers.
>>> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>>> 
>>> static void
>>> expand_asm_memory_blockage (void)
>>> {
>>>  rtx asm_op, clob;
>>> 
>>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>                                 rtvec_alloc (0), rtvec_alloc (0),
>>>                                 rtvec_alloc (0), UNKNOWN_LOCATION);
>>>  MEM_VOLATILE_P (asm_op) = 1;
>>> 
>>>  clob = gen_rtx_SCRATCH (VOIDmode);
>>>  clob = gen_rtx_MEM (BLKmode, clob);
>>>  clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>> 
>>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>> }
>>> 
>>> 
>>> As the following? 
>>> 
>>> /* Generate asm volatile("" : : : "regno") for REGNO.   */
>>> 
>>> static void
>>> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
>>> {
>>>  rtx asm_op, clob;
>>> 
>>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>                                 rtvec_alloc (0), rtvec_alloc (0),
>>>                                 rtvec_alloc (0), UNKNOWN_LOCATION);
>>>  MEM_VOLATILE_P (asm_op) = 1;
>>> 
>>>  clob = gen_rtx_REG (mode, regno);
>>>  clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>> 
>>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>> }
>>> 
>>> Is the above correct? 
>> 
>> Yeah, looks good.  You should be able to clobber all the registers you
>> want to clear in one asm.
>
> How to do this?

Rather than create:

  gen_rtvec (2, asm_op, clob)

with just the asm and one clobber, you can create:

  gen_rtvec (N + 1, asm_op, clob1, …, clobN)

with N clobbers side-by-side.  When N is variable (as it probably would
be in your case), it's easier to use rtvec_alloc and fill in the fields
using RTVEC_ELT.  E.g.:

  rtvec v = rtvec_alloc (N + 1);
  RTVEC_ELT (v, 0) = asm_op;
  RTVEC_ELT (v, 1) = clob1;
  …
  RTVEC_ELT (v, N) = clobN;

Thanks,
Richard
Qing Zhao Sept. 22, 2020, 6:25 p.m. UTC | #160
Hi, Hongjiu, 


> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>> 
>>> 
>>> 
>>>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>>> But in cases where there is no underlying concept that can sensibly
>>>>>> be extracted out, it's OK if targets need to override the default
>>>>>> to get correct behaviour.
>>>>> 
>>>>> Then, on a target where the default code is not right and we haven’t provided an overriding implementation, what should we tell the end user?
>>>>> The user might see the documentation for -fzero-call-used-regs in the GCC manual and try it on that target, where the default implementation is not correct. How should we deal with this?
>>>> 
>>>> The point is that we're trying to implement this in a target-independent
>>>> way, like for most compiler features.  If the option doesn't work for a
>>>> particular target, then that's a bug like any other.  The most we can
>>>> reasonably do is:
>>>> 
>>>> (a) try to implement the feature in a way that uses all the appropriate
>>>>  pieces of compiler infrastructure (what we've been discussing)
>>>> 
>>>> (b) add tests for the feature that run on all targets
>>>> 
>>>> It's possible that bugs could slip through even then.  But that's true
>>>> of anything.
>>>> 
>>>> Targets like x86 support many subtargets, many different compilation
>>>> modes, and many different compiler features (register asms, various
>>>> fancy function attributes, etc.).  So even after the option is
>>>> committed and is supposedly supported on x86, it's possible that
>>>> we'll find a bug in the feature on x86 itself.
>>>> 
>>>> I don't think anyone would suggest that we should warn the user that the
>>>> option might be buggy on x86 (its initial target).  But I also don't
>>>> see any reason for believing that a bug on x86 is less likely than
>>>> a bug on other targets.
>>> 
>>> Okay, then I will add the default implementation as you suggested, and also provide the overridden, optimized implementation on X86.
>> 
>> For X86, looks like that in addition to stack registers (st0 to st7), mask registers (k0 to k7) also do not need to be zeroed, and also “mm0 to mm7”  should Not be zeroed too.
>> 
>> As I checked, MASK_REG_P and MMX_REG_P are x86 specific macros, can I use them in middle end similar as “STACK_REG_P”?
> 
> No, those are x86-specific like you say.
> 
> Taking each in turn: what is the reason for not clearing mask registers?
> And what is the reason for not clearing mm0-7?  In each case, is it a
> performance or a correctness issue?

Could you please provide more information on the above questions? (Why do we exclude mask registers and mm0-7 registers from ALL on x86?)

thanks.

Qing

> 
> Although the registers themselves are target-specific, the reason
> for excluding them might be something that could be exposed to
> target-independent code.
> 
> As a general comment, with at least three sets of excluded registers,
> the “all” in one of the suggested option values is beginning to feel
> like a misnomer.  (Maybe that has already been dropped though.)
> 
> Thanks,
> Richard
H.J. Lu Sept. 22, 2020, 6:35 p.m. UTC | #161
On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Hi, Hongjiu,
>
>
> > On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> >
> > Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> >>> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>>
> >>>
> >>>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> >>>>
> >>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> >>>>>> But in cases where there is no underlying concept that can sensibly
> >>>>>> be extracted out, it's OK if targets need to override the default
> >>>>>> to get correct behaviour.
> >>>>>
> >>>>> Then, on the target that the default code is not right, and we haven’t provide overridden implementation, what should we inform the end user about this?
> >>>>> The user might see the documentation about -fzero-call-used-regs in gcc manual, and might try it on that specific target, but the default implementation is not correct, how to deal this?
> >>>>
> >>>> The point is that we're trying to implement this in a target-independent
> >>>> way, like for most compiler features.  If the option doesn't work for a
> >>>> particular target, then that's a bug like any other.  The most we can
> >>>> reasonably do is:
> >>>>
> >>>> (a) try to implement the feature in a way that uses all the appropriate
> >>>>  pieces of compiler infrastructure (what we've been discussing)
> >>>>
> >>>> (b) add tests for the feature that run on all targets
> >>>>
> >>>> It's possible that bugs could slip through even then.  But that's true
> >>>> of anything.
> >>>>
> >>>> Targets like x86 support many subtargets, many different compilation
> >>>> modes, and many different compiler features (register asms, various
> >>>> fancy function attributes, etc.).  So even after the option is
> >>>> committed and is supposedly supported on x86, it's possible that
> >>>> we'll find a bug in the feature on x86 itself.
> >>>>
> >>>> I don't think anyone would suggest that we should warn the user that the
> >>>> option might be buggy on x86 (its initial target).  But I also don't
> >>>> see any reason for believing that a bug on x86 is less likely than
> >>>> a bug on other targets.
> >>>
> >>> Okay, then I will add the default implementation as you suggested, and also provide the overridden, optimized implementation on X86.
> >>
> >> For X86, looks like that in addition to stack registers (st0 to st7), mask registers (k0 to k7) also do not need to be zeroed, and also “mm0 to mm7”  should Not be zeroed too.
> >>
> >> As I checked, MASK_REG_P and MMX_REG_P are x86 specific macros, can I use them in middle end similar as “STACK_REG_P”?
> >
> > No, those are x86-specific like you say.
> >
> > Taking each in turn: what is the reason for not clearing mask registers?
> > And what is the reason for not clearing mm0-7?  In each case, is it a
> > performance or a correctness issue?
>
> Could you please provide more information on the above questions? (Why do we exclude mask registers and mm0-7 registers from ALL on x86?)
>

No particular reason.  You can add them.

H.J.
Qing Zhao Sept. 22, 2020, 7:34 p.m. UTC | #162
> On Sep 22, 2020, at 1:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>> 
>> Hi, Hongjiu,
>> 
>> 
>>> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>> 
>>>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>>>>> But in cases where there is no underlying concept that can sensibly
>>>>>>>> be extracted out, it's OK if targets need to override the default
>>>>>>>> to get correct behaviour.
>>>>>>> 
>>>>>>> Then, on the target that the default code is not right, and we haven’t provide overridden implementation, what should we inform the end user about this?
>>>>>>> The user might see the documentation about -fzero-call-used-regs in gcc manual, and might try it on that specific target, but the default implementation is not correct, how to deal this?
>>>>>> 
>>>>>> The point is that we're trying to implement this in a target-independent
>>>>>> way, like for most compiler features.  If the option doesn't work for a
>>>>>> particular target, then that's a bug like any other.  The most we can
>>>>>> reasonably do is:
>>>>>> 
>>>>>> (a) try to implement the feature in a way that uses all the appropriate
>>>>>> pieces of compiler infrastructure (what we've been discussing)
>>>>>> 
>>>>>> (b) add tests for the feature that run on all targets
>>>>>> 
>>>>>> It's possible that bugs could slip through even then.  But that's true
>>>>>> of anything.
>>>>>> 
>>>>>> Targets like x86 support many subtargets, many different compilation
>>>>>> modes, and many different compiler features (register asms, various
>>>>>> fancy function attributes, etc.).  So even after the option is
>>>>>> committed and is supposedly supported on x86, it's possible that
>>>>>> we'll find a bug in the feature on x86 itself.
>>>>>> 
>>>>>> I don't think anyone would suggest that we should warn the user that the
>>>>>> option might be buggy on x86 (its initial target).  But I also don't
>>>>>> see any reason for believing that a bug on x86 is less likely than
>>>>>> a bug on other targets.
>>>>> 
>>>>> Okay, then I will add the default implementation as you suggested, and also provide the overridden, optimized implementation on X86.
>>>> 
>>>> For X86, looks like that in addition to stack registers (st0 to st7), mask registers (k0 to k7) also do not need to be zeroed, and also “mm0 to mm7”  should Not be zeroed too.
>>>> 
>>>> As I checked, MASK_REG_P and MMX_REG_P are x86 specific macros, can I use them in middle end similar as “STACK_REG_P”?
>>> 
>>> No, those are x86-specific like you say.
>>> 
>>> Taking each in turn: what is the reason for not clearing mask registers?
>>> And what is the reason for not clearing mm0-7?  In each case, is it a
>>> performance or a correctness issue?
>> 
>> Could you please provide more information on the above questions? (Why do we exclude mask registers and mm0-7 registers from ALL on x86?)
>> 
> 
> No particular reason.  You can add them.

Okay, thanks.

Then I guess the reason we didn’t zero the mask registers and mm0-7 registers on x86 is mainly performance.
Zeroing these additional registers probably adds little benefit for mitigating ROP attacks, but it adds much more performance overhead.

What’s your opinion, Richard?

Qing
> 
> H.J.
Qing Zhao Sept. 22, 2020, 9:32 p.m. UTC | #163
> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>> 
>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>> 
>>> Heh, it looks like that comment dates back to 1994. :-)
>>> 
>>> The comment is no longer correct though.  I wasn't around at the time,
>>> but I assume the comment was only locally true even then.
>>> 
>>> If what the comment said was true, then something like:
>>> 
>>> (define_insn "cld"
>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>> ""
>>> "cld"
>>> [(set_attr "length" "1")
>>>  (set_attr "length_immediate" "0")
>>>  (set_attr "modrm" "0")])
>>> 
>>> would invalidate the entire register file and so would require all values
>>> to be spilt to the stack around the CLD.
>> 
>> Okay, thanks for the info. 
>> then, what’s the current definition of UNSPEC_VOLATILE? 
> 
> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
> 
>  @code{unspec_volatile} is used for volatile operations and operations
>  that may trap; @code{unspec} is used for other operations.
> 
> which seems like a cyclic definition: volatile expressions are defined
> to be expressions that are volatile.
> 
> But IMO the semantics are that unspec_volatile patterns with a given
> set of inputs and outputs act for dataflow purposes like volatile asms
> with the same inputs and outputs.  The semantics of asm volatile are
> at least slightly more well-defined (if only by example); see extend.texi
> for details.  In particular:
> 
>  Note that the compiler can move even @code{volatile asm} instructions relative
>  to other code, including across jump instructions. For example, on many 
>  targets there is a system register that controls the rounding mode of 
>  floating-point operations. Setting it with a @code{volatile asm} statement,
>  as in the following PowerPC example, does not work reliably.
> 
>  @example
>  asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>  sum = x + y;
>  @end example
> 
>  The compiler may move the addition back before the @code{volatile asm}
>  statement. To make it work as expected, add an artificial dependency to
>  the @code{asm} by referencing a variable in the subsequent code, for
>  example:
> 
>  @example
>  asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>  sum = x + y;
>  @end example
> 
> which is very similar to the unspec_volatile case we're talking about.
> 
> To take an x86 example:
> 
>  void
>  f (char *x)
>  {
>    asm volatile ("");
>    x[0] = 0;
>    asm volatile ("");
>    x[1] = 0;
>    asm volatile ("");
>  }

If we change the above to the following (the asm syntax might not be correct):

void
f (char *x)
{
  asm volatile ("x[0]");
  x[0] = 0;
  asm volatile ("x[1]");
  x[1] = 0;
  asm volatile ("");
}

Will the moving and merging be blocked?


I found the following code in df-scan.c:

static void
df_uses_record (class df_collection_rec *collection_rec,
                rtx *loc, enum df_ref_type ref_type,
                basic_block bb, struct df_insn_info *insn_info,
                int flags)
{
…

    case ASM_OPERANDS:
    case UNSPEC_VOLATILE:
    case TRAP_IF:
    case ASM_INPUT:
…
        if (code == ASM_OPERANDS)
          {
            int j;

            for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++)
              df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j),
                              DF_REF_REG_USE, bb, insn_info, flags);
            return;
          }
        break;
…
}


It looks like ONLY the operands of “ASM_OPERANDS” are recorded as USES in df analysis; the operands of “UNSPEC_VOLATILE” are NOT.

If we use “ASM_OPERANDS” instead of “UNSPEC_VOLATILE” as you suggested, the data flow analysis should automatically pick up the operands of “ASM_OPERANDS” and fix the data flow, right?


> 
> gets optimised to:
> 
>        xorl    %eax, %eax
>        movw    %ax, (%rdi)
> 
> with the two stores being merged.  The same thing is IMO valid for
> unspec_volatile.  In both cases, you would need some kind of memory
> clobber to prevent the move and merge from happening.
> 
>>> 
>>> There's also no dataflow reason why this couldn't be reordered to:
>>> 
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>      (const_int 0 [0])) "t10.c":11:1 -1
>>>   (nil))
>>> (insn 19 18 20 2 (unspec_volatile [
>>>          (reg:SI 1 dx)
>>>      ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>   (nil))
>>> (insn 17 … pop a register other than dx from the stack …)
>>> 
>> 
>> This is the place I don’t quite agree at this moment, maybe I still not quite understand the “UNSPEC_volatile”.
>> 
>> I checked several places in GCC that handle “UNSPEC_VOLATILE”, for example,  for the routine “can_move_insns_across” in gcc/df-problem.c:
>> 
>>      if (NONDEBUG_INSN_P (insn))
>>        {
>>          if (volatile_insn_p (PATTERN (insn)))
>>            return false;
>> 
>> From my understanding of reading the code, when an insn is UNSPEC_VOLATILE, another insn will NOT be able to move across it. 
>> 
>> Then for the above example, the insn 17 should Not be moved across insn 19 either.
>> 
>> Let me know if I miss anything important. 
> 
> The above is conservatively correct.  But not all passes do it.
> E.g. combine does have a similar approach:
> 
>  /* If INSN contains volatile references (specifically volatile MEMs),
>     we cannot combine across any other volatile references.
>     Even if INSN doesn't contain volatile references, any intervening
>     volatile insn might affect machine state.  */
> 
>  is_volatile_p = volatile_refs_p (PATTERN (insn))
>    ? volatile_refs_p
>    : volatile_insn_p;
> 
> And like you say, the passes that use can_move_insns_across will be
> conservative too.  But not many passes use that function.

Okay, I see. 
> 
> Passes like fwprop.c, postreload-gcse.c and ree.c do not (AFAIK) worry
> about volatile asms or unspec_volatiles, and can move code across them.
> And that's kind-of inevitable.  Having an “everything barrier” makes
> life very hard for global optimisation.

Okay, so UNSPEC_VOLATILE is intentionally not treated as an “everything barrier”?

(But I do feel that the design of UNSPEC_VOLATILE is not clean.)

> 
>>> The asm for (b) goes before the instruction, so we'd have:
>>> 
>>> (insn 17 … new asm …)
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>      (const_int 0 [0])) "t10.c":11:1 -1
>>>   (nil))
>>> (insn 19 … return …)
>>> 
>>> But something has to tell the df machinery that the value of edx
>>> matters on return from the function, otherwise insn 18 could be
>>> deleted as dead.  Adding edx to EPILOGUE_USES provides that information
>>> and stops the instruction from being deleted.
>> 
>> 
>> In the above, insn 17 will be something like:
>> 
>> (insn 17 ...(unspec_volatile [  (reg:SI 1 dx)
>>    ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1 
>> (nil))
> 
> In the example above, insn 17 would be an asm that clobbers dx
> (instead of using dx).
> 
>> So, the reg edx is marked as “UNSPEC_volatile” already, that should mean the value of edx matters on return from the function already, my understanding is that df should automatically pick up the “UNSPEC_VOLATILE” insn and it’s operands.   “UNSPEC_VOLATILE” insn should serve the same purpose as putting “edx” to EPILOGUE_USES. 
>> 
>> Do I miss anything here?
> 
> The point is that any use of dx at insn 17 comes before the definition
> in insn 18.  So a use in insn 17 would keep alive any store to dx that
> happened before insn 17.  But it would not keep the store in insn 18 live,
> since insn 18 executes later.

Okay, I see. 
> 
>>>>> I don't think we need a new target-specific unspec_volatile code to do (b).
>>>>> We can just use an automatically-generated volatile asm to clobber the
>>>>> registers first.  See e.g. how expand_asm_memory_blockage handles memory
>>>>> scheduling barriers.
>>>> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>>>> 
>>>> static void
>>>> expand_asm_memory_blockage (void)
>>>> {
>>>> rtx asm_op, clob;
>>>> 
>>>> asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>>                                rtvec_alloc (0), rtvec_alloc (0),
>>>>                                rtvec_alloc (0), UNKNOWN_LOCATION);
>>>> MEM_VOLATILE_P (asm_op) = 1;
>>>> 
>>>> clob = gen_rtx_SCRATCH (VOIDmode);
>>>> clob = gen_rtx_MEM (BLKmode, clob);
>>>> clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>>> 
>>>> emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>>> }
>>>> 
>>>> 
>>>> As the following? 
>>>> 
>>>> /* Generate asm volatile("" : : : "regno") for REGNO.   */
>>>> 
>>>> static void
>>>> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
>>>> {
>>>> rtx asm_op, clob;
>>>> 
>>>> asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>>                                rtvec_alloc (0), rtvec_alloc (0),
>>>>                                rtvec_alloc (0), UNKNOWN_LOCATION);
>>>> MEM_VOLATILE_P (asm_op) = 1;
>>>> 
>>>> clob = gen_rtx_REG (mode, regno);
>>>> clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>>> 
>>>> emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>>> }
>>>> 
>>>> Is the above correct? 
>>> 
>>> Yeah, looks good.  You should be able to clobber all the registers you
>>> want to clear in one asm.
>> 
>> How to do this?
> 
> Rather than create:
> 
>  gen_rtvec (2, asm_op, clob)
> 
> with just the asm and one clobber, you can create:
> 
>  gen_rtvec (N + 1, asm_op, clob1, …, clobN)
> 
> with N clobbers side-by-side.  When N is variable (as it probably would
> be in your case), it's easier to use rtvec_alloc and fill in the fields
> using RTVEC_ELT.  E.g.:
> 
>  rtvec v = rtvec_alloc (N + 1);
>  RTVEC_ELT (v, 0) = asm_op;
>  RTVEC_ELT (v, 1) = clob1;
>  …
>  RTVEC_ELT (v, N) = clobN;

Thanks.

Qing
> 
> Thanks,
> Richard
Segher Boessenkool Sept. 22, 2020, 10:37 p.m. UTC | #164
Hi!

On Tue, Sep 22, 2020 at 06:06:30PM +0100, Richard Sandiford wrote:
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> > Okay, thanks for the info. 
> > then, what’s the current definition of UNSPEC_VOLATILE? 
> 
> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
> 
>   @code{unspec_volatile} is used for volatile operations and operations
>   that may trap; @code{unspec} is used for other operations.
> 
> which seems like a cyclic definition: volatile expressions are defined
> to be expressions that are volatile.

volatile_insn_p returns true for unspec_volatile (and all other volatile
things).  Unfortunately the comment on this function is just as confused
as pretty much everything else :-/

> But IMO the semantics are that unspec_volatile patterns with a given
> set of inputs and outputs act for dataflow purposes like volatile asms
> with the same inputs and outputs.  The semantics of asm volatile are
> at least slightly more well-defined (if only by example); see extend.texi
> for details.  In particular:
> 
>   Note that the compiler can move even @code{volatile asm} instructions relative
>   to other code, including across jump instructions. For example, on many 
>   targets there is a system register that controls the rounding mode of 
>   floating-point operations. Setting it with a @code{volatile asm} statement,
>   as in the following PowerPC example, does not work reliably.
> 
>   @example
>   asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>   sum = x + y;
>   @end example
> 
>   The compiler may move the addition back before the @code{volatile asm}
>   statement. To make it work as expected, add an artificial dependency to
>   the @code{asm} by referencing a variable in the subsequent code, for
>   example:
> 
>   @example
>   asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>   sum = x + y;
>   @end example
> 
> which is very similar to the unspec_volatile case we're talking about.

So just like volatile memory accesses, they have an (unknown) side
effect, which means they have to execute on the real machine as on the
abstract machine (wrt sequence points).  All side effects have to happen
exactly as often as prescribed, and in the same order.  Just like
volatile asm, too.

And there is no magic to it, there are no other effects.

> To take an x86 example:
> 
>   void
>   f (char *x)
>   {
>     asm volatile ("");
>     x[0] = 0;
>     asm volatile ("");
>     x[1] = 0;
>     asm volatile ("");
>   }
> 
> gets optimised to:
> 
>         xorl    %eax, %eax
>         movw    %ax, (%rdi)

(If you use "#" or "#smth" you can see those in the generated asm --
completely empty asm is helpfully (uh...) not printed.)

> with the two stores being merged.  The same thing is IMO valid for
> unspec_volatile.  In both cases, you would need some kind of memory
> clobber to prevent the move and merge from happening.

Even then, x[] could be optimised away completely (with whole program
optimisation, or something).  The only way to really prevent the
compiler from optimising memory accesses is to make it not see the
details (with an asm or an unspec, for example).

> The above is conservatively correct.  But not all passes do it.
> E.g. combine does have a similar approach:
> 
>   /* If INSN contains volatile references (specifically volatile MEMs),
>      we cannot combine across any other volatile references.

And this is correct, and the *minimum* to do even (this could change the
order of the side effects, depending how combine places the resulting
insns in I2 and I3).

>      Even if INSN doesn't contain volatile references, any intervening
>      volatile insn might affect machine state.  */

Confusingly stated, but essentially correct (it is possible we place
the volatile at I2, and everything would still be sequenced correctly,
but combine does not guarantee that).

>   is_volatile_p = volatile_refs_p (PATTERN (insn))
>     ? volatile_refs_p
>     : volatile_insn_p;

Too much subtlety in there, heh.


Segher
Richard Sandiford Sept. 23, 2020, 10:43 a.m. UTC | #165
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 22, 2020, at 1:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>>>> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> Taking each in turn: what is the reason for not clearing mask registers?
>>>> And what is the reason for not clearing mm0-7?  In each case, is it a
>>>> performance or a correctness issue?
>>> 
>>> Could you please provide more information on the above questions? (Why do we exclude mask registers and mm0-7 registers from ALL on x86?)
>>> 
>> 
>> No particular reason.  You can add them.
>
> Okay, thanks.
>
> Then I guess the reason we didn’t zero the mask registers and mm0-7 registers on x86 is mainly performance.
> Zeroing these additional registers probably adds little benefit for mitigating ROP attacks, but it adds much more performance overhead.
>
> What’s your opinion, Richard?

Dropping them is fine with me FWIW.  That seems like a natural use
for the new hook: drop zeroing that isn't actively wrong, but isn't
likely to be useful either.

Thanks,
Richard
Richard Sandiford Sept. 23, 2020, 11:05 a.m. UTC | #166
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> 
>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>> 
>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>> 
>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>> 
>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>> but I assume the comment was only locally true even then.
>>>> 
>>>> If what the comment said was true, then something like:
>>>> 
>>>> (define_insn "cld"
>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>> ""
>>>> "cld"
>>>> [(set_attr "length" "1")
>>>>  (set_attr "length_immediate" "0")
>>>>  (set_attr "modrm" "0")])
>>>> 
>>>> would invalidate the entire register file and so would require all values
>>>> to be spilt to the stack around the CLD.
>>> 
>>> Okay, thanks for the info. 
>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>> 
>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>> 
>>  @code{unspec_volatile} is used for volatile operations and operations
>>  that may trap; @code{unspec} is used for other operations.
>> 
>> which seems like a cyclic definition: volatile expressions are defined
>> to be expressions that are volatile.
>> 
>> But IMO the semantics are that unspec_volatile patterns with a given
>> set of inputs and outputs act for dataflow purposes like volatile asms
>> with the same inputs and outputs.  The semantics of asm volatile are
>> at least slightly more well-defined (if only by example); see extend.texi
>> for details.  In particular:
>> 
>>  Note that the compiler can move even @code{volatile asm} instructions relative
>>  to other code, including across jump instructions. For example, on many 
>>  targets there is a system register that controls the rounding mode of 
>>  floating-point operations. Setting it with a @code{volatile asm} statement,
>>  as in the following PowerPC example, does not work reliably.
>> 
>>  @example
>>  asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>  sum = x + y;
>>  @end example
>> 
>>  The compiler may move the addition back before the @code{volatile asm}
>>  statement. To make it work as expected, add an artificial dependency to
>>  the @code{asm} by referencing a variable in the subsequent code, for
>>  example:
>> 
>>  @example
>>  asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>  sum = x + y;
>>  @end example
>> 
>> which is very similar to the unspec_volatile case we're talking about.
>> 
>> To take an x86 example:
>> 
>>  void
>>  f (char *x)
>>  {
>>    asm volatile ("");
>>    x[0] = 0;
>>    asm volatile ("");
>>    x[1] = 0;
>>    asm volatile ("");
>>  }
>
> If we change the above to the following (the asm syntax might not be correct):
>
> void
> f (char *x)
> {
>   asm volatile ("x[0]");
>   x[0] = 0;
>   asm volatile ("x[1]");
>   x[1] = 0;
>   asm volatile ("");
> }
>
> Will the moving and merging be blocked?

That would stop assignments moving up, but it wouldn't stop x[0] moving
down across the x[1] asm.  Using:

  asm volatile ("" ::: "memory");

would prevent moves in both directions, which was what I meant in my
later comment about memory clobbers.

In each case, the same would be true for unspec_volatile.

> I found the following code in df-scan.c:
>
> static void
> df_uses_record (class df_collection_rec *collection_rec,
>                 rtx *loc, enum df_ref_type ref_type,
>                 basic_block bb, struct df_insn_info *insn_info,
>                 int flags)
> {
> …
>
>     case ASM_OPERANDS:
>     case UNSPEC_VOLATILE:
>     case TRAP_IF:
>     case ASM_INPUT:
> …
>         if (code == ASM_OPERANDS)
>           {
>             int j;
>
>             for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++)
>               df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j),
>                               DF_REF_REG_USE, bb, insn_info, flags);
>             return;
>           }
>         break;
> …
> }
>
>
> Looks like ONLY the operands of  “ASM_OPERANDS” are recorded as USES in df analysis,  the operands of “UNSPEC_VOLATILE” are NOT. 

The recursion code after the switch statement handles the operands of
unspec_volatile.
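
The shape of that control flow can be mimicked with a toy tree walker.
Everything in this sketch (toy_rtx, toy_count_uses, the TOY_* codes) is
hypothetical and only imitates the structure of the quoted function; it is
not GCC's code.  The asm case walks its inputs and returns, while the
unspec_volatile case breaks out of the switch and reaches the generic
operand walk that follows it:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for RTL codes; not GCC's real definitions.  */
enum toy_code { TOY_REG, TOY_ASM_OPERANDS, TOY_UNSPEC_VOLATILE };

struct toy_rtx
{
  enum toy_code code;
  int regno;              /* meaningful for TOY_REG only */
  size_t n_ops;           /* number of operands */
  struct toy_rtx **ops;   /* operand array, may be NULL when n_ops is 0 */
};

/* Count the register uses reachable from X.  */
int
toy_count_uses (const struct toy_rtx *x)
{
  int uses = 0;

  assert (x != NULL);

  switch (x->code)
    {
    case TOY_REG:
      return 1;

    case TOY_ASM_OPERANDS:
      /* Handled explicitly, then return early.  */
      for (size_t j = 0; j < x->n_ops; j++)
        uses += toy_count_uses (x->ops[j]);
      return uses;

    case TOY_UNSPEC_VOLATILE:
      /* No special handling here: fall out of the switch.  */
      break;
    }

  /* Generic recursion over the operands, reached via the break above.  */
  for (size_t i = 0; i < x->n_ops; i++)
    uses += toy_count_uses (x->ops[i]);
  return uses;
}
```

In this toy, a TOY_UNSPEC_VOLATILE wrapping one register still reports that
register as used, because the walk after the switch visits its operands.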

> If we use “ASM_OPERANDS” instead of “UNSPEC_VOLATILE” as you suggested, the data flow analysis should automatically pick up the operands of “ASM_OPERANDS” and fix the data flow, right?

Using a volatile asm or an unspec_volatile would be equally correct.
The reason for preferring a volatile asm is that it doesn't require
target-specific .md patterns.

Of course, as mentioned before, “correct” in this case is: make a good
but not foolproof attempt at trying to prevent later passes from moving
the zeroing instructions further away from the return instruction
(or, equivalently, moving other instructions closer to the return
instruction).  Remember that we arrived here from a discussion about
whether the volatile insns would be enough to prevent machine_reorg and
other passes from moving instructions around (modulo bugs in those passes).
My position was that the volatile insns would help, but that we might
still find cases where a machine_reorg makes a behaviourally-correct
transformation that we don't want.

>> Passes like fwprop.c, postreload-gcse.c and ree.c do not (AFAIK) worry
>> about volatile asms or unspec_volatiles, and can move code across them.
>> And that's kind-of inevitable.  Having an “everything barrier” makes
>> life very hard for global optimisation.
>
> Okay, so UNSPEC_VOLATILE is intentionally not treated as an “everything barrier”?

Yeah.

> (But I do feel that the design for UNSPEC_volatile is not clean)

Agreed.  But I think that's partly because what it's trying to achieve
isn't clean either.  It's a catch-all for “something is happening,
but we're not saying what”.  And not saying what is itself unclean. ;-)

Thanks,
Richard
Qing Zhao Sept. 23, 2020, 1:54 p.m. UTC | #167
> On Sep 23, 2020, at 5:43 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 22, 2020, at 1:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>>>>> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> Taking each in turn: what is the reason for not clearing mask registers?
>>>>> And what is the reason for not clearing mm0-7?  In each case, is it a
>>>>> performance or a correctness issue?
>>>> 
>>>> Could you please provide more information on the above questions? (Why do we exclude mask registers and mm0-7 registers from ALL on x86?)
>>>> 
>>> 
>>> No particular reason.  You can add them.
>> 
>> Okay, thanks.
>> 
>> Then I guess the reason we didn’t zero the mask registers and mm0-7 registers on x86 is mainly performance.
>> Zeroing these additional registers probably adds little benefit for mitigating ROP attacks, but it adds much more performance overhead.
>> 
>> What’s your opinion, Richard?
> 
> Dropping them is fine with me FWIW.  That seems like a natural use
> for the new hook: drop zeroing that isn't actively wrong, but isn't
> likely to be useful either.

Okay, I will add a new hook for this purpose.

Qing
> 
> Thanks,
> Richard
Qing Zhao Sept. 23, 2020, 2:14 p.m. UTC | #168
> On Sep 23, 2020, at 6:05 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>> 
>>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>>> 
>>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>>> 
>>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>>> 
>>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>>> but I assume the comment was only locally true even then.
>>>>> 
>>>>> If what the comment said was true, then something like:
>>>>> 
>>>>> (define_insn "cld"
>>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>>> ""
>>>>> "cld"
>>>>> [(set_attr "length" "1")
>>>>> (set_attr "length_immediate" "0")
>>>>> (set_attr "modrm" "0")])
>>>>> 
>>>>> would invalidate the entire register file and so would require all values
>>>>> to be spilt to the stack around the CLD.
>>>> 
>>>> Okay, thanks for the info. 
>>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>>> 
>>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>>> 
>>> @code{unspec_volatile} is used for volatile operations and operations
>>> that may trap; @code{unspec} is used for other operations.
>>> 
>>> which seems like a cyclic definition: volatile expressions are defined
>>> to be expressions that are volatile.
>>> 
>>> But IMO the semantics are that unspec_volatile patterns with a given
>>> set of inputs and outputs act for dataflow purposes like volatile asms
>>> with the same inputs and outputs.  The semantics of asm volatile are
>>> at least slightly more well-defined (if only by example); see extend.texi
>>> for details.  In particular:
>>> 
>>> Note that the compiler can move even @code{volatile asm} instructions relative
>>> to other code, including across jump instructions. For example, on many 
>>> targets there is a system register that controls the rounding mode of 
>>> floating-point operations. Setting it with a @code{volatile asm} statement,
>>> as in the following PowerPC example, does not work reliably.
>>> 
>>> @example
>>> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>> sum = x + y;
>>> @end example
>>> 
>>> The compiler may move the addition back before the @code{volatile asm}
>>> statement. To make it work as expected, add an artificial dependency to
>>> the @code{asm} by referencing a variable in the subsequent code, for
>>> example:
>>> 
>>> @example
>>> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>> sum = x + y;
>>> @end example
>>> 
>>> which is very similar to the unspec_volatile case we're talking about.
>>> 
>>> To take an x86 example:
>>> 
>>> void
>>> f (char *x)
>>> {
>>>   asm volatile ("");
>>>   x[0] = 0;
>>>   asm volatile ("");
>>>   x[1] = 0;
>>>   asm volatile ("");
>>> }
>> 
>> If we change the above as the following: (but it might not correct on the asm format):
>> 
>> Void
>> F (char *x)
>> {
>> asm volatile (“x[0]”);
>> x[0] = 0;
>> asm volatile (“x[1]"); 
>> x[1] = 0;
>> asm volatile ("”);
>> }
>> 
>> Will the moving and merging be blocked?
> 
> That would stop assignments moving up, but it wouldn't stop x[0] moving
> down across the x[1] asm.  Using:
> 
>  asm volatile ("" ::: "memory");
> 
> would prevent moves in both directions, which was what I meant in my
> later comment about memory clobbers.
> 
> In each case, the same would be true for unspec_volatile.

So, is the following good enough:

asm volatile (reg1, reg2, … regN, memory)
mov reg1, 0
mov reg2, 0
...
mov regN,0
asm volatile (reg1, reg2,… regN, memory)
return


I.e., just add one “asm volatile” insn, whose operands include all registers and memory, BEFORE and AFTER the whole zeroing sequence.

Or do we have to add one “asm volatile” insn before and after each “mov” insn?


> 
>> I found the following code in df-scan.c:
>> 
>> static void
>> df_uses_record (class df_collection_rec *collection_rec,
>>                rtx *loc, enum df_ref_type ref_type,
>>                basic_block bb, struct df_insn_info *insn_info,
>>                int flags)
>> {
>> …
>> 
>>    case ASM_OPERANDS:
>>    case UNSPEC_VOLATILE:
>>    case TRAP_IF:
>>    case ASM_INPUT:
>> …
>>        if (code == ASM_OPERANDS)
>>          {
>>            int j;
>> 
>>            for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++)
>>              df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j),
>>                              DF_REF_REG_USE, bb, insn_info, flags);
>>            return;
>>          }
>>        break;
>> …
>> }
>> 
>> 
>> Looks like ONLY the operands of  “ASM_OPERANDS” are recorded as USES in df analysis,  the operands of “UNSPEC_VOLATILE” are NOT. 
> 
> The recursion code after the switch statement handles the operands of
> unspec_volatile.

Okay, I see.
So these two are actually equivalent to each other.
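
For readers following the control flow, here is a toy model of the point above. It is only a sketch: the names (toy_rtx, toy_uses_record, toy_uses_of) are made up for illustration and are not GCC's real data structures. It shows that an early return after walking the asm inputs and a "break" into the generic operand recursion record the same uses.

```c
/* Toy stand-ins for RTL codes; these are NOT GCC's real definitions.  */
enum toy_code { TOY_REG, TOY_ASM_OPERANDS, TOY_UNSPEC_VOLATILE };

struct toy_rtx
{
  enum toy_code code;
  int regno;                     /* meaningful for TOY_REG only */
  int n_ops;                     /* number of operands */
  const struct toy_rtx *ops[4];  /* operand expressions */
};

/* Record every register used by X in the bitmask *USES.  */
static void
toy_uses_record (const struct toy_rtx *x, unsigned *uses)
{
  switch (x->code)
    {
    case TOY_REG:
      *uses |= 1u << x->regno;
      return;

    case TOY_ASM_OPERANDS:
    case TOY_UNSPEC_VOLATILE:
      if (x->code == TOY_ASM_OPERANDS)
        {
          /* The asm inputs are walked explicitly, then we return.  */
          for (int j = 0; j < x->n_ops; j++)
            toy_uses_record (x->ops[j], uses);
          return;
        }
      /* TOY_UNSPEC_VOLATILE leaves the switch instead...  */
      break;
    }

  /* ...and lands in the generic recursion over all operands.  */
  for (int j = 0; j < x->n_ops; j++)
    toy_uses_record (x->ops[j], uses);
}

/* Return the use mask of an expression of kind CODE whose two
   operands are (toy) registers 1 and 3.  */
unsigned
toy_uses_of (enum toy_code code)
{
  static const struct toy_rtx r1 = { TOY_REG, 1, 0, { 0 } };
  static const struct toy_rtx r3 = { TOY_REG, 3, 0, { 0 } };
  struct toy_rtx x = { code, 0, 2, { &r1, &r3 } };
  unsigned uses = 0;

  toy_uses_record (&x, &uses);
  return uses;
}
```

In this model both kinds of expression end up with registers 1 and 3 recorded as uses, just by different paths.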

> 
>> If we use “ASM_OPERANDS” instead of “UNSPEXC_VOLATILE” as you suggested, the data flow analysis should automatically pick up the operands of “ASM_OPERANDS”, and fix the data flow, right?
> 
> Using a volatile asm or an unspec_volatile would be equally correct.
> The reason for preferring a volatile asm is that it doesn't require
> target-specific .md patterns.
Okay.

Then is there any benefit to using “UNSPEC_volatile” over “volatile asm”?
> 
> Of course, as mentioned before, “correct” in this case is: make a good
> but not foolproof attempt at trying to prevent later passes from moving
> the zeroing instructions further away from the return instruction
> (or, equivalently, moving other instructions closer to the return
> instruction).  Remember that we arrived here from a discussion about
> whether the volatile insns would be enough to prevent machine_reorg and
> other passes from moving instructions around (modulo bugs in those passes).
> My position was that the volatile insns would help, but that we might
> still find cases where a machine_reorg makes a behaviourally-correct
> transformation that we don't want.
So you mean that after adding “volatile asm” or “UNSPEC_volatile”, although
most of the insn movement can be prevented, there is still a small possibility
that some unwanted transformation might happen?

> 
>>> Passes like fwprop.c, postreload-gcse.c and ree.c do not (AFAIK) worry
>>> about volatile asms or unspec_volatiles, and can move code across them.
>>> And that's kind-of inevitable.  Having an “everything barrier” makes
>>> life very hard for global optimisation.
>> 
>> Okay, so, it’s intentionally not making UNSPEC_VOLATILE as an “everything barrier”? 
> 
> Yeah.
> 
>> (But I do feel that the design for UNSPEC_volatile is not clean)
> 
> Agreed.  But I think that's partly because what it's trying to achieve
> isn't clean either.  It's a catch-all for “something is happening,
> but we're not saying what”.  And not saying what is itself unclean. ;-)

thanks.

Qing
> 
> Thanks,
> Richard
Richard Sandiford Sept. 23, 2020, 2:22 p.m. UTC | #169
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 23, 2020, at 5:43 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 22, 2020, at 1:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>>>>>> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>> Taking each in turn: what is the reason for not clearing mask registers?
>>>>>> And what is the reason for not clearing mm0-7?  In each case, is it a
>>>>>> performance or a correctness issue?
>>>>> 
>>>>> Could you please provide more information on the above questions? (Why we exclude mask registers and mm0-7 registers from ALL on x86?)
>>>>> 
>>>> 
>>>> No particular reason.  You can add them.
>>> 
>>> Okay, thanks.
>>> 
>>> Then I guess that the reason we didn’t zero mask registers and mm0-7 registers on x86  is mainly for the performance consideration.
>>> There might not be too much benefit for mitigating ROP attack if we zero these additional registers, but we will got much more performance overhead.
>>> 
>>> What’s you opinion, Richard?
>> 
>> Dropping them is fine with me FWIW.  That seems like a natural use
>> for the new hook: drop zeroing that isn't actively wrong, but isn't
>> likely to be useful either.
>
> Okay, I will add a  new hook for this purpose.

It doesn't need to be a new hook.  The one I mentioned before
would be enough:

> The kind of target hook interface I was thinking of was:
>
>   HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)
>
> which:
>
> - emits zeroing instructions for some target-specific subset of REGS
>
> - returns the set of registers that were actually cleared

Not clearing mm0-7 and k0-7 would come under the first bullet point.
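
To make the contract concrete, here is a rough model of it in plain C. Everything in it is an assumption for illustration: the register set is modelled as a 64-bit mask rather than GCC's real HARD_REG_SET, and the toy_* names and bit layouts are invented. The point is only that the hook may clear a subset of what it was asked to clear, and that it reports exactly what it cleared.

```c
#include <stdint.h>

/* Hypothetical model: one bit per hard register.  GCC's real
   HARD_REG_SET is a different type; this is only an illustration.  */
typedef uint64_t toy_reg_set;

#define TOY_GPRS  UINT64_C (0x0000ff)  /* pretend general registers */
#define TOY_MMX   UINT64_C (0x00ff00)  /* pretend mm0-7 */
#define TOY_MASKS UINT64_C (0xff0000)  /* pretend k0-7 */

/* Registers for which a zeroing "instruction" has been emitted.  */
static toy_reg_set toy_emitted;

/* Stand-in for emitting one zeroing move.  */
static void
toy_emit_zero (int regno)
{
  toy_emitted |= UINT64_C (1) << regno;
}

/* Model of the hook: zero a target-chosen subset of REGS and
   return the set of registers that were actually cleared.  */
toy_reg_set
toy_emit_move_zeros (toy_reg_set regs)
{
  /* Target policy: zeroing these is not wrong, just not useful.  */
  toy_reg_set to_clear = regs & ~(TOY_MMX | TOY_MASKS);

  for (int regno = 0; regno < 64; regno++)
    if (to_clear & (UINT64_C (1) << regno))
      toy_emit_zero (regno);

  /* Report what was done, not what was asked for.  */
  return to_clear;
}
```

A caller that asks for everything gets back only the general registers in this model, so it knows mm0-7 and k0-7 were left alone.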

Thanks,
Richard
Qing Zhao Sept. 23, 2020, 2:28 p.m. UTC | #170
> On Sep 22, 2020, at 5:37 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Tue, Sep 22, 2020 at 06:06:30PM +0100, Richard Sandiford wrote:
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> Okay, thanks for the info. 
>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>> 
>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>> 
>>  @code{unspec_volatile} is used for volatile operations and operations
>>  that may trap; @code{unspec} is used for other operations.
>> 
>> which seems like a cyclic definition: volatile expressions are defined
>> to be expressions that are volatile.
> 
> volatile_insn_p returns true for unspec_volatile (and all other volatile
> things).  Unfortunately the comment on this function is just as confused
> as pretty much everything else :-/
> 
>> But IMO the semantics are that unspec_volatile patterns with a given
>> set of inputs and outputs act for dataflow purposes like volatile asms
>> with the same inputs and outputs.  The semantics of asm volatile are
>> at least slightly more well-defined (if only by example); see extend.texi
>> for details.  In particular:
>> 
>>  Note that the compiler can move even @code{volatile asm} instructions relative
>>  to other code, including across jump instructions. For example, on many 
>>  targets there is a system register that controls the rounding mode of 
>>  floating-point operations. Setting it with a @code{volatile asm} statement,
>>  as in the following PowerPC example, does not work reliably.
>> 
>>  @example
>>  asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>  sum = x + y;
>>  @end example
>> 
>>  The compiler may move the addition back before the @code{volatile asm}
>>  statement. To make it work as expected, add an artificial dependency to
>>  the @code{asm} by referencing a variable in the subsequent code, for
>>  example:
>> 
>>  @example
>>  asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>  sum = x + y;
>>  @end example
>> 
>> which is very similar to the unspec_volatile case we're talking about.
> 
> So just like volatile memory accesses, they have an (unknown) side
> effect, which means they have to execute on the real machine as on the
> abstract machine (wrt sequence points).  All side effects have to happen
> exactly as often as proscribed, and in the same order.  Just like
> volatile asm, too.
I don’t quite understand the above. What do you mean by “they have to
execute on the real machine as on the abstract machine”?

> 
> And there is no magic to it, there are no other effects.
> 
>> To take an x86 example:
>> 
>>  void
>>  f (char *x)
>>  {
>>    asm volatile ("");
>>    x[0] = 0;
>>    asm volatile ("");
>>    x[1] = 0;
>>    asm volatile ("");
>>  }
>> 
>> gets optimised to:
>> 
>>        xorl    %eax, %eax
>>        movw    %ax, (%rdi)
> 
> (If you use "#" or "#smth" you can see those in the generated asm --
> completely empty asm is helpfully (uh...) not printed.)

Can you explain this in more detail?
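
For reference, a minimal sketch of what the remark about "#" seems to refer to. It assumes an x86 target with a GNU-style assembler, where "#" starts a comment; the MARKER macro is a made-up helper. The text of an asm template is copied into the assembly output, so a comment-only template leaves a visible marker without adding any instruction.

```c
/* Assumption: on x86 with a GNU-style assembler, '#' starts a
   comment.  Other targets use other comment characters, so fall
   back to an empty template there.  The keyword is spelled
   __asm__ so that this also compiles in strict ISO C modes.  */
#if defined (__x86_64__) || defined (__i386__)
# define MARKER(text) __asm__ volatile ("# " text)
#else
# define MARKER(text) __asm__ volatile ("")
#endif

/* Same shape as the example quoted above, but with asm statements
   whose templates are assembler comments rather than empty.  */
void
f (char *x)
{
  MARKER ("before x[0]");
  x[0] = 0;
  MARKER ("between the stores");
  x[1] = 0;
  MARKER ("after x[1]");
}
```

Compiling this with something like "gcc -O2 -S" and reading the .s file should show where each asm ended up relative to the stores.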

> 
>> with the two stores being merged.  The same thing is IMO valid for
>> unspec_volatile.  In both cases, you would need some kind of memory
>> clobber to prevent the move and merge from happening.
> 
> Even then, x[] could be optimised away completely (with whole program
> optimisation, or something).  The only way to really prevent the
> compiler from optimising memory accesses is to make it not see the
> details (with an asm or an unspec, for example).
You mean with an asm volatile ("" ::: "memory")?

> 
>> The above is conservatively correct.  But not all passes do it.
>> E.g. combine does have a similar approach:
>> 
>>  /* If INSN contains volatile references (specifically volatile MEMs),
>>     we cannot combine across any other volatile references.
> 
> And this is correct, and the *minimum* to do even (this could change the
> order of the side effects, depending how combine places the resulting
> insns in I2 and I3).

Could you please clarify what “I2” and “I3” are?
> 
>>     Even if INSN doesn't contain volatile references, any intervening
>>     volatile insn might affect machine state.  */
> 
> Confusingly stated, but essentially correct (it is possible we place
> the volatile at I2, and everything would still be sequenced correctly,
> but combine does not guarantee that).

thanks.

Qing
> 
>>  is_volatile_p = volatile_refs_p (PATTERN (insn))
>>    ? volatile_refs_p
>>    : volatile_insn_p;
> 
> Too much subtlety in there, heh.
> 
> 
> Segher
Richard Sandiford Sept. 23, 2020, 2:32 p.m. UTC | #171
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 23, 2020, at 6:05 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>>> 
>>>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>>>> 
>>>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>>>> 
>>>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>>>> 
>>>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>>>> but I assume the comment was only locally true even then.
>>>>>> 
>>>>>> If what the comment said was true, then something like:
>>>>>> 
>>>>>> (define_insn "cld"
>>>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>>>> ""
>>>>>> "cld"
>>>>>> [(set_attr "length" "1")
>>>>>> (set_attr "length_immediate" "0")
>>>>>> (set_attr "modrm" "0")])
>>>>>> 
>>>>>> would invalidate the entire register file and so would require all values
>>>>>> to be spilt to the stack around the CLD.
>>>>> 
>>>>> Okay, thanks for the info. 
>>>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>>>> 
>>>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>>>> 
>>>> @code{unspec_volatile} is used for volatile operations and operations
>>>> that may trap; @code{unspec} is used for other operations.
>>>> 
>>>> which seems like a cyclic definition: volatile expressions are defined
>>>> to be expressions that are volatile.
>>>> 
>>>> But IMO the semantics are that unspec_volatile patterns with a given
>>>> set of inputs and outputs act for dataflow purposes like volatile asms
>>>> with the same inputs and outputs.  The semantics of asm volatile are
>>>> at least slightly more well-defined (if only by example); see extend.texi
>>>> for details.  In particular:
>>>> 
>>>> Note that the compiler can move even @code{volatile asm} instructions relative
>>>> to other code, including across jump instructions. For example, on many 
>>>> targets there is a system register that controls the rounding mode of 
>>>> floating-point operations. Setting it with a @code{volatile asm} statement,
>>>> as in the following PowerPC example, does not work reliably.
>>>> 
>>>> @example
>>>> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>>> sum = x + y;
>>>> @end example
>>>> 
>>>> The compiler may move the addition back before the @code{volatile asm}
>>>> statement. To make it work as expected, add an artificial dependency to
>>>> the @code{asm} by referencing a variable in the subsequent code, for
>>>> example:
>>>> 
>>>> @example
>>>> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>>> sum = x + y;
>>>> @end example
>>>> 
>>>> which is very similar to the unspec_volatile case we're talking about.
>>>> 
>>>> To take an x86 example:
>>>> 
>>>> void
>>>> f (char *x)
>>>> {
>>>>   asm volatile ("");
>>>>   x[0] = 0;
>>>>   asm volatile ("");
>>>>   x[1] = 0;
>>>>   asm volatile ("");
>>>> }
>>> 
>>> If we change the above as the following: (but it might not correct on the asm format):
>>> 
>>> Void
>>> F (char *x)
>>> {
>>> asm volatile (“x[0]”);
>>> x[0] = 0;
>>> asm volatile (“x[1]"); 
>>> x[1] = 0;
>>> asm volatile ("”);
>>> }
>>> 
>>> Will the moving and merging be blocked?
>> 
>> That would stop assignments moving up, but it wouldn't stop x[0] moving
>> down across the x[1] asm.  Using:
>> 
>>  asm volatile ("" ::: "memory");
>> 
>> would prevent moves in both directions, which was what I meant in my
>> later comment about memory clobbers.
>> 
>> In each case, the same would be true for unspec_volatile.
>
> So, is the following good enough:
>
> asm volatile (reg1, reg2, … regN, memory)
> mov reg1, 0
> mov reg2, 0
> ...
> mov regN,0
> asm volatile (reg1, reg2,… regN, memory)
> return
>
>
> I.e, just add one “asm volatile” insn whose operands include all registers and memory BEFORE and AFTER the whole zeroing sequence.

It isn't clear from your syntax whether the asm volatile arguments
are uses or clobbers.  The idea was:

- There would be an asm volatile before the moves that clobbers (but does
  not use) (mem:BLK (scratch)) and the zeroed registers.

- EPILOGUE_USES would make the zeroed registers live after the return.

> Or, we have to add one “asm volatile” insn before and after each “mov” insn? 

No, the idea with the multiple clobber thing was to have a single asm.
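
At the C source level, the difference between a use and a clobber shows up in which colon-separated section of the asm an operand sits in. The following is only a sketch (barrier_demo is a made-up name, and the register names in the last asm are specific to x86-64); the "memory" clobber is, as far as I know, what gets represented as a clobber of (mem:BLK (scratch)) in RTL.

```c
/* Returns X unchanged.  The asm statements emit no instructions;
   they exist only for their dataflow effects.  */
int
barrier_demo (int x, char *buf)
{
  buf[0] = 1;

  /* Use: X is an input operand (third section), so the asm is
     treated as reading it.  */
  __asm__ volatile ("" : : "r" (x));

  /* Clobber: nothing is read.  The fourth section says memory may
     have been overwritten, so memory accesses are not moved or
     merged across this point.  */
  __asm__ volatile ("" : : : "memory");

#if defined (__x86_64__)
  /* A single asm can clobber several hard registers and memory
     at once.  Register names are target-specific.  */
  __asm__ volatile ("" : : : "rcx", "rdx", "memory");
#endif

  buf[1] = 2;
  return x;
}
```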

>>> If we use “ASM_OPERANDS” instead of “UNSPEXC_VOLATILE” as you suggested, the data flow analysis should automatically pick up the operands of “ASM_OPERANDS”, and fix the data flow, right?
>> 
>> Using a volatile asm or an unspec_volatile would be equally correct.
>> The reason for preferring a volatile asm is that it doesn't require
>> target-specific .md patterns.
> Okay.
>
> Then is there any benefit to use “UNSPEC_volatile” over “volatile asm”?

In general, yes: you can use the full .md functionality with
unspec_volatiles, such as splitting insns, adding match_scratches
with different clobber requirements, writing custom output code,
setting attributes, etc.

But there isn't an advantage to using unspec_volatile in this case,
where the instruction doesn't actually do anything.

>> Of course, as mentioned before, “correct” in this case is: make a good
>> but not foolproof attempt at trying to prevent later passes from moving
>> the zeroing instructions further away from the return instruction
>> (or, equivalently, moving other instructions closer to the return
>> instruction).  Remember that we arrived here from a discussion about
>> whether the volatile insns would be enough to prevent machine_reorg and
>> other passes from moving instructions around (modulo bugs in those passes).
>> My position was that the volatile insns would help, but that we might
>> still find cases where a machine_reorg makes a behaviourally-correct
>> transformation that we don't want.
> So, you mean after adding “volatile asm” or “UNSPEC_volatile”,  although 
> most of the insn movement can be prevented, there might still be small possibitly 
> Some unwanted transformation might happen?

I wouldn't want to quantify the possibility.  The point is just that the
possibility exists.  The unspec_volatile does not prevent movement of
unrelated non-volatile operations.

Thanks,
Richard
Qing Zhao Sept. 23, 2020, 2:35 p.m. UTC | #172
> On Sep 23, 2020, at 9:22 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>> On Sep 23, 2020, at 5:43 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> On Sep 22, 2020, at 1:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>>>>>>> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>>> Taking each in turn: what is the reason for not clearing mask registers?
>>>>>>> And what is the reason for not clearing mm0-7?  In each case, is it a
>>>>>>> performance or a correctness issue?
>>>>>> 
>>>>>> Could you please provide more information on the above questions? (Why we exclude mask registers and mm0-7 registers from ALL on x86?)
>>>>>> 
>>>>> 
>>>>> No particular reason.  You can add them.
>>>> 
>>>> Okay, thanks.
>>>> 
>>>> Then I guess that the reason we didn’t zero mask registers and mm0-7 registers on x86  is mainly for the performance consideration.
>>>> There might not be too much benefit for mitigating ROP attack if we zero these additional registers, but we will got much more performance overhead.
>>>> 
>>>> What’s you opinion, Richard?
>>> 
>>> Dropping them is fine with me FWIW.  That seems like a natural use
>>> for the new hook: drop zeroing that isn't actively wrong, but isn't
>>> likely to be useful either.
>> 
>> Okay, I will add a  new hook for this purpose.
> 
> It doesn't need to be a new hook.  The one I mentioned before
> would enough:
> 
>> The kind of target hook interface I was thinking of was:
>> 
>>  HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)
>> 
>> which:
>> 
>> - emits zeroing instructions for some target-specific subset of REGS
>> 
>> - returns the set of registers that were actually cleared
> 
> Not clearing mm0-7 and k0-7 would come under the first bullet point.
This makes sense.

However, how about the second bullet point:

- returns the set of registers that were actually cleared

Should we delete mm0-7 and k0-7 from the return set, since they are not zeroed by the target?

Qing


> 
> Thanks,
> Richard
Richard Sandiford Sept. 23, 2020, 2:40 p.m. UTC | #173
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> Dropping them is fine with me FWIW.  That seems like a natural use
>>>> for the new hook: drop zeroing that isn't actively wrong, but isn't
>>>> likely to be useful either.
>>> 
>>> Okay, I will add a  new hook for this purpose.
>> 
>> It doesn't need to be a new hook.  The one I mentioned before
>> would enough:
>> 
>>> The kind of target hook interface I was thinking of was:
>>> 
>>>  HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)
>>> 
>>> which:
>>> 
>>> - emits zeroing instructions for some target-specific subset of REGS
>>> 
>>> - returns the set of registers that were actually cleared
>> 
>> Not clearing mm0-7 and k0-7 would come under the first bullet point.
> This makes sense.
>
> However, how about the second bullet point:
>
> - returns the set of registers that were actually cleared
>
> Should we delete mm0-7 and k0-7 from the return set even though they are not zeroed by the target?

Yes, the point of the return value is to tell the caller what the
hook actually did.  If the hook didn't clear mm0-7 then the returned
set shouldn't include mm0-7.

Thanks,
Richard
Qing Zhao Sept. 23, 2020, 2:48 p.m. UTC | #174
> On Sep 23, 2020, at 9:32 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 23, 2020, at 6:05 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>>>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>>>> 
>>>>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>>>>> 
>>>>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>>>>> 
>>>>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>>>>> 
>>>>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>>>>> but I assume the comment was only locally true even then.
>>>>>>> 
>>>>>>> If what the comment said was true, then something like:
>>>>>>> 
>>>>>>> (define_insn "cld"
>>>>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>>>>> ""
>>>>>>> "cld"
>>>>>>> [(set_attr "length" "1")
>>>>>>> (set_attr "length_immediate" "0")
>>>>>>> (set_attr "modrm" "0")])
>>>>>>> 
>>>>>>> would invalidate the entire register file and so would require all values
>>>>>>> to be spilt to the stack around the CLD.
>>>>>> 
>>>>>> Okay, thanks for the info. 
>>>>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>>>>> 
>>>>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>>>>> 
>>>>> @code{unspec_volatile} is used for volatile operations and operations
>>>>> that may trap; @code{unspec} is used for other operations.
>>>>> 
>>>>> which seems like a cyclic definition: volatile expressions are defined
>>>>> to be expressions that are volatile.
>>>>> 
>>>>> But IMO the semantics are that unspec_volatile patterns with a given
>>>>> set of inputs and outputs act for dataflow purposes like volatile asms
>>>>> with the same inputs and outputs.  The semantics of asm volatile are
>>>>> at least slightly more well-defined (if only by example); see extend.texi
>>>>> for details.  In particular:
>>>>> 
>>>>> Note that the compiler can move even @code{volatile asm} instructions relative
>>>>> to other code, including across jump instructions. For example, on many 
>>>>> targets there is a system register that controls the rounding mode of 
>>>>> floating-point operations. Setting it with a @code{volatile asm} statement,
>>>>> as in the following PowerPC example, does not work reliably.
>>>>> 
>>>>> @example
>>>>> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>>>> sum = x + y;
>>>>> @end example
>>>>> 
>>>>> The compiler may move the addition back before the @code{volatile asm}
>>>>> statement. To make it work as expected, add an artificial dependency to
>>>>> the @code{asm} by referencing a variable in the subsequent code, for
>>>>> example:
>>>>> 
>>>>> @example
>>>>> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>>>> sum = x + y;
>>>>> @end example
>>>>> 
>>>>> which is very similar to the unspec_volatile case we're talking about.
>>>>> 
>>>>> To take an x86 example:
>>>>> 
>>>>> void
>>>>> f (char *x)
>>>>> {
>>>>>  asm volatile ("");
>>>>>  x[0] = 0;
>>>>>  asm volatile ("");
>>>>>  x[1] = 0;
>>>>>  asm volatile ("");
>>>>> }
>>>> 
>>>> If we change the above as the following: (but it might not correct on the asm format):
>>>> 
>>>> Void
>>>> F (char *x)
>>>> {
>>>> asm volatile (“x[0]”);
>>>> x[0] = 0;
>>>> asm volatile (“x[1]"); 
>>>> x[1] = 0;
>>>> asm volatile ("”);
>>>> }
>>>> 
>>>> Will the moving and merging be blocked?
>>> 
>>> That would stop assignments moving up, but it wouldn't stop x[0] moving
>>> down across the x[1] asm.  Using:
>>> 
>>> asm volatile ("" ::: "memory");
>>> 
>>> would prevent moves in both directions, which was what I meant in my
>>> later comment about memory clobbers.
>>> 
>>> In each case, the same would be true for unspec_volatile.
>> 
>> So, is the following good enough:
>> 
>> asm volatile (reg1, reg2, … regN, memory)
>> mov reg1, 0
>> mov reg2, 0
>> ...
>> mov regN,0
>> asm volatile (reg1, reg2,… regN, memory)
>> return
>> 
>> 
>> I.e, just add one “asm volatile” insn whose operands include all registers and memory BEFORE and AFTER the whole zeroing sequence.
> 
> It isn't clear from your syntax whether the asm volatile arguments
> are uses or clobbers.

How does the syntax of asm volatile distinguish between “uses” and “clobbers”?

>  The idea was:
> 
> - There would be an asm volatile before the moves that clobbers (but does
>  not use) (mem:BLK (scratch)) and the zeroed registers.
> 
> - EPILOGUE_USES would make the zeroed registers live after the return.

Is EPILOGUE_USES the only way to achieve this? Would adding another “asm volatile” immediately before the return serve the same purpose?


> 
>> Or, we have to add one “asm volatile” insn before and after each “mov” insn? 
> 
> No, the idea with the multiple clobber thing was to have a single asm.
Okay.
> 
>>>> If we use “ASM_OPERANDS” instead of “UNSPEXC_VOLATILE” as you suggested, the data flow analysis should automatically pick up the operands of “ASM_OPERANDS”, and fix the data flow, right?
>>> 
>>> Using a volatile asm or an unspec_volatile would be equally correct.
>>> The reason for preferring a volatile asm is that it doesn't require
>>> target-specific .md patterns.
>> Okay.
>> 
>> Then is there any benefit to use “UNSPEC_volatile” over “volatile asm”?
> 
> In general, yes: you can use the full .md functionality with
> unspec_volatiles, such as splitting insns, adding match_scratches
> with different clobber requirements, writing custom output code,
> setting attributes, etc.
> 
> But there isn't an advantage to using unspec_volatile in this case,
> where the instruction doesn't actually do anything.

Okay, I see. 

> 
>>> Of course, as mentioned before, “correct” in this case is: make a good
>>> but not foolproof attempt at trying to prevent later passes from moving
>>> the zeroing instructions further away from the return instruction
>>> (or, equivalently, moving other instructions closer to the return
>>> instruction).  Remember that we arrived here from a discussion about
>>> whether the volatile insns would be enough to prevent machine_reorg and
>>> other passes from moving instructions around (modulo bugs in those passes).
>>> My position was that the volatile insns would help, but that we might
>>> still find cases where a machine_reorg makes a behaviourally-correct
>>> transformation that we don't want.
>> So, you mean after adding “volatile asm” or “UNSPEC_volatile”,  although 
>> most of the insn movement can be prevented, there might still be small possibitly 
>> Some unwanted transformation might happen?
> 
> I wouldn't want to quantify the possibility.  The point is just that the
> possibility exists.  The unspec_volatile does not prevent movement of
> unrelated non-volatile operations.

Okay. 

thanks.

Qing
> 
> Thanks,
> Richard
Qing Zhao Sept. 23, 2020, 2:49 p.m. UTC | #175
> On Sep 23, 2020, at 9:40 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> Dropping them is fine with me FWIW.  That seems like a natural use
>>>>> for the new hook: drop zeroing that isn't actively wrong, but isn't
>>>>> likely to be useful either.
>>>> 
>>>> Okay, I will add a  new hook for this purpose.
>>> 
>>> It doesn't need to be a new hook.  The one I mentioned before
>>> would enough:
>>> 
>>>> The kind of target hook interface I was thinking of was:
>>>> 
>>>> HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)
>>>> 
>>>> which:
>>>> 
>>>> - emits zeroing instructions for some target-specific subset of REGS
>>>> 
>>>> - returns the set of registers that were actually cleared
>>> 
>>> Not clearing mm0-7 and k0-7 would come under the first bullet point.
>> This makes sense.
>> 
>> However, how about the second bullet point:
>> 
>> - returns the set of registers that were actually cleared
>> 
>> Should we delete mm0-7 and k0-7 from the return set even though they are not zeroed by the target?
> 
> Yes, the point of the return value is to tell the caller what the
> hook actually did.  If the hook didn't clear mm0-7 then the returned
> set shouldn't include mm0-7.

Okay.

Qing
> 
> Thanks,
> Richard
Richard Sandiford Sept. 23, 2020, 3:21 p.m. UTC | #176
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 23, 2020, at 9:32 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 23, 2020, at 6:05 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>>>>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>>>>> 
>>>>>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>>>>>> 
>>>>>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>>>>>> 
>>>>>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>>>>>> 
>>>>>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>>>>>> but I assume the comment was only locally true even then.
>>>>>>>> 
>>>>>>>> If what the comment said was true, then something like:
>>>>>>>> 
>>>>>>>> (define_insn "cld"
>>>>>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>>>>>> ""
>>>>>>>> "cld"
>>>>>>>> [(set_attr "length" "1")
>>>>>>>> (set_attr "length_immediate" "0")
>>>>>>>> (set_attr "modrm" "0")])
>>>>>>>> 
>>>>>>>> would invalidate the entire register file and so would require all values
>>>>>>>> to be spilt to the stack around the CLD.
>>>>>>> 
>>>>>>> Okay, thanks for the info. 
>>>>>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>>>>>> 
>>>>>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>>>>>> 
>>>>>> @code{unspec_volatile} is used for volatile operations and operations
>>>>>> that may trap; @code{unspec} is used for other operations.
>>>>>> 
>>>>>> which seems like a cyclic definition: volatile expressions are defined
>>>>>> to be expressions that are volatile.
>>>>>> 
>>>>>> But IMO the semantics are that unspec_volatile patterns with a given
>>>>>> set of inputs and outputs act for dataflow purposes like volatile asms
>>>>>> with the same inputs and outputs.  The semantics of asm volatile are
>>>>>> at least slightly more well-defined (if only by example); see extend.texi
>>>>>> for details.  In particular:
>>>>>> 
>>>>>> Note that the compiler can move even @code{volatile asm} instructions relative
>>>>>> to other code, including across jump instructions. For example, on many 
>>>>>> targets there is a system register that controls the rounding mode of 
>>>>>> floating-point operations. Setting it with a @code{volatile asm} statement,
>>>>>> as in the following PowerPC example, does not work reliably.
>>>>>> 
>>>>>> @example
>>>>>> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>>>>> sum = x + y;
>>>>>> @end example
>>>>>> 
>>>>>> The compiler may move the addition back before the @code{volatile asm}
>>>>>> statement. To make it work as expected, add an artificial dependency to
>>>>>> the @code{asm} by referencing a variable in the subsequent code, for
>>>>>> example:
>>>>>> 
>>>>>> @example
>>>>>> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>>>>> sum = x + y;
>>>>>> @end example
>>>>>> 
>>>>>> which is very similar to the unspec_volatile case we're talking about.
>>>>>> 
>>>>>> To take an x86 example:
>>>>>> 
>>>>>> void
>>>>>> f (char *x)
>>>>>> {
>>>>>>  asm volatile ("");
>>>>>>  x[0] = 0;
>>>>>>  asm volatile ("");
>>>>>>  x[1] = 0;
>>>>>>  asm volatile ("");
>>>>>> }
>>>>> 
>>>>> If we change the above to the following (though the asm format might not be correct):
>>>>> 
>>>>> void
>>>>> f (char *x)
>>>>> {
>>>>> asm volatile ("x[0]");
>>>>> x[0] = 0;
>>>>> asm volatile ("x[1]");
>>>>> x[1] = 0;
>>>>> asm volatile ("");
>>>>> }
>>>>> 
>>>>> Will the moving and merging be blocked?
>>>> 
>>>> That would stop assignments moving up, but it wouldn't stop x[0] moving
>>>> down across the x[1] asm.  Using:
>>>> 
>>>> asm volatile ("" ::: "memory");
>>>> 
>>>> would prevent moves in both directions, which was what I meant in my
>>>> later comment about memory clobbers.
>>>> 
>>>> In each case, the same would be true for unspec_volatile.
>>> 
>>> So, is the following good enough:
>>> 
>>> asm volatile (reg1, reg2, … regN, memory)
>>> mov reg1, 0
>>> mov reg2, 0
>>> ...
>>> mov regN,0
>>> asm volatile (reg1, reg2,… regN, memory)
>>> return
>>> 
>>> 
>>> I.e, just add one “asm volatile” insn whose operands include all registers and memory BEFORE and AFTER the whole zeroing sequence.
>> 
>> It isn't clear from your syntax whether the asm volatile arguments
>> are uses or clobbers.
>
> How can the syntax of asm volatile distinguish “Uses” and “Clobbers”? 

Well, I wasn't trying to discuss correct syntax, I just wasn't sure what
you meant.

As mentioned in the quote below, I was expecting the asm volatile
before the zeroing to include clobbers generated as discussed in
the earlier message:

  rtx asm_op = gen_rtx_ASM_OPERANDS (…);
  MEM_VOLATILE_P (asm_op) = 1;

  rtvec v = rtvec_alloc (N + 1);
  RTVEC_ELT (v, 0) = asm_op;
  RTVEC_ELT (v, 1) = gen_rtx_CLOBBER (VOIDmode, …);
  …
  RTVEC_ELT (v, N) = gen_rtx_CLOBBER (VOIDmode, …);

  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));

But doing this after the zeroing would give:

  …clobber reg1 in an asm…
  …set reg1 to zero…
  …clobber reg1 in an asm…

Dataflow-wise, the second clobber overwrites the effect of the zeroing.
Since nothing uses reg1 between the zeroing and the clobber, the zeroing
could be removed as dead.

>>  The idea was:
>> 
>> - There would be an asm volatile before the moves that clobbers (but does
>>  not use) (mem:BLK (scratch)) and the zeroed registers.
>> 
>> - EPILOGUE_USES would make the zeroed registers live after the return.
>
> Is EPILOGUE_USES the only way for this purpose? Will add another “asm volatile” immediately before the return serve the same purpose?

Why do you want to use an asm to keep the instructions live though?

As I think I mentioned before (but sorry if I'm misremembering),
using an asm would be counterproductive on delayed-branch targets.
The delayed branch scheduler looks backwards for something that could
fill the delay slot.  If we have an asm after the zeroing instructions
that uses the zeroed registers, that would prevent any zeroing
instruction from filling the delay slot.  The delayed branch scheduler
would therefore try to fill the delay slot with something from before
the zeroing sequence, which is exactly what we'd like to avoid.

Also, using an asm after the sequence would allow a machine_reorg
pass to reuse the zeroed registers for something else between the
second asm and the return.

IMO, marking the zeroed registers as being live out of the function
is the simplest, most direct way of representing the fact that the
zeroing effect has to survive to the function return.  It's how we
make sure that the function return value remains live and how we make
sure that the restored call-preserved registers remain live.

Thanks,
Richard
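
The dataflow argument above can be illustrated with a toy backward liveness scan. This is only a sketch under simplifying assumptions: straight-line code, one bit per register, and hypothetical names (toy_insn, toy_mark_dead_zeroing) that have nothing to do with GCC's real df infrastructure. LIVE_OUT plays the role of what EPILOGUE_USES would contribute at the function exit.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_kind { TOY_SET_ZERO, TOY_CLOBBER, TOY_USE };

struct toy_insn
{
  enum toy_kind kind;
  int reg;
  bool dead;
};

/* Scan INSNS backwards.  LIVE_OUT is a bitmask of registers that are
   live after the last instruction.  Mark each TOY_SET_ZERO as dead if
   its register is not live afterwards, and return the number of dead
   zeroing instructions.  */
static int
toy_mark_dead_zeroing (struct toy_insn *insns, size_t n, unsigned live_out)
{
  unsigned live = live_out;
  int ndead = 0;

  for (size_t i = n; i-- > 0; )
    {
      unsigned bit = 1u << insns[i].reg;

      switch (insns[i].kind)
	{
	case TOY_USE:
	  live |= bit;
	  break;
	case TOY_CLOBBER:
	  /* A clobber overwrites the register without reading it.  */
	  live &= ~bit;
	  break;
	case TOY_SET_ZERO:
	  insns[i].dead = !(live & bit);
	  if (insns[i].dead)
	    ndead++;
	  live &= ~bit;
	  break;
	}
    }

  return ndead;
}
```

With a clobber after the zeroing, the zeroing is marked dead even if the register is live out. With no trailing clobber and the register in LIVE_OUT, the zeroing is kept.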
Qing Zhao Sept. 23, 2020, 4:08 p.m. UTC | #177
> On Sep 23, 2020, at 10:21 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>> On Sep 23, 2020, at 9:32 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> On Sep 23, 2020, at 6:05 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> 
>>>>> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>>>>>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>>>>>>> 
>>>>>>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>>>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>>>>>>> 
>>>>>>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>>>>>>> 
>>>>>>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>>>>>>> but I assume the comment was only locally true even then.
>>>>>>>>> 
>>>>>>>>> If what the comment said was true, then something like:
>>>>>>>>> 
>>>>>>>>> (define_insn "cld"
>>>>>>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>>>>>>> ""
>>>>>>>>> "cld"
>>>>>>>>> [(set_attr "length" "1")
>>>>>>>>> (set_attr "length_immediate" "0")
>>>>>>>>> (set_attr "modrm" "0")])
>>>>>>>>> 
>>>>>>>>> would invalidate the entire register file and so would require all values
>>>>>>>>> to be spilt to the stack around the CLD.
>>>>>>>> 
>>>>>>>> Okay, thanks for the info. 
>>>>>>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>>>>>>> 
>>>>>>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>>>>>>> 
>>>>>>> @code{unspec_volatile} is used for volatile operations and operations
>>>>>>> that may trap; @code{unspec} is used for other operations.
>>>>>>> 
>>>>>>> which seems like a cyclic definition: volatile expressions are defined
>>>>>>> to be expressions that are volatile.
>>>>>>> 
>>>>>>> But IMO the semantics are that unspec_volatile patterns with a given
>>>>>>> set of inputs and outputs act for dataflow purposes like volatile asms
>>>>>>> with the same inputs and outputs.  The semantics of asm volatile are
>>>>>>> at least slightly more well-defined (if only by example); see extend.texi
>>>>>>> for details.  In particular:
>>>>>>> 
>>>>>>> Note that the compiler can move even @code{volatile asm} instructions relative
>>>>>>> to other code, including across jump instructions. For example, on many 
>>>>>>> targets there is a system register that controls the rounding mode of 
>>>>>>> floating-point operations. Setting it with a @code{volatile asm} statement,
>>>>>>> as in the following PowerPC example, does not work reliably.
>>>>>>> 
>>>>>>> @example
>>>>>>> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>>>>>> sum = x + y;
>>>>>>> @end example
>>>>>>> 
>>>>>>> The compiler may move the addition back before the @code{volatile asm}
>>>>>>> statement. To make it work as expected, add an artificial dependency to
>>>>>>> the @code{asm} by referencing a variable in the subsequent code, for
>>>>>>> example:
>>>>>>> 
>>>>>>> @example
>>>>>>> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>>>>>> sum = x + y;
>>>>>>> @end example
>>>>>>> 
>>>>>>> which is very similar to the unspec_volatile case we're talking about.
>>>>>>> 
>>>>>>> To take an x86 example:
>>>>>>> 
>>>>>>> void
>>>>>>> f (char *x)
>>>>>>> {
>>>>>>> asm volatile ("");
>>>>>>> x[0] = 0;
>>>>>>> asm volatile ("");
>>>>>>> x[1] = 0;
>>>>>>> asm volatile ("");
>>>>>>> }
>>>>>> 
>>>>>> If we change the above to the following (though the asm format might not be correct):
>>>>>> 
>>>>>> void
>>>>>> f (char *x)
>>>>>> {
>>>>>> asm volatile ("x[0]");
>>>>>> x[0] = 0;
>>>>>> asm volatile ("x[1]");
>>>>>> x[1] = 0;
>>>>>> asm volatile ("");
>>>>>> }
>>>>>> 
>>>>>> Will the moving and merging be blocked?
>>>>> 
>>>>> That would stop assignments moving up, but it wouldn't stop x[0] moving
>>>>> down across the x[1] asm.  Using:
>>>>> 
>>>>> asm volatile ("" ::: "memory");
>>>>> 
>>>>> would prevent moves in both directions, which was what I meant in my
>>>>> later comment about memory clobbers.
>>>>> 
>>>>> In each case, the same would be true for unspec_volatile.
>>>> 
>>>> So, is the following good enough:
>>>> 
>>>> asm volatile (reg1, reg2, … regN, memory)
>>>> mov reg1, 0
>>>> mov reg2, 0
>>>> ...
>>>> mov regN,0
>>>> asm volatile (reg1, reg2,… regN, memory)
>>>> return
>>>> 
>>>> 
>>>> I.e, just add one “asm volatile” insn whose operands include all registers and memory BEFORE and AFTER the whole zeroing sequence.
>>> 
>>> It isn't clear from your syntax whether the asm volatile arguments
>>> are uses or clobbers.
>> 
>> How can the syntax of asm volatile distinguish “Uses” and “Clobbers”? 
> 
> Well, I wasn't trying to discuss correct syntax, I just wasn't sure what
> you meant.
> 
> As mentioned in the quote below, I was expecting the asm volatile
> before the zeroing to include clobbers generated as discussed in
> the earlier message:
> 
>  rtx asm_op = gen_rtx_ASM_OPERANDS (…);
>  MEM_VOLATILE_P (asm_op) = 1;
> 
>  rtvec v = rtvec_alloc (N + 1);
>  RTVEC_ELT (v, 0) = asm_op;
>  RTVEC_ELT (v, 1) = gen_rtx_CLOBBER (VOIDmode, …);
>  …
>  RTVEC_ELT (v, N) = gen_rtx_CLOBBER (VOIDmode, …);
> 
>  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
> 
> But doing this after the zeroing would give:
> 
>  …clobber reg1 in an asm…
>  …set reg1 to zero…
>  …clobber reg1 in an asm…
> 
> Dataflow-wise, the second clobber overwrites the effect of the zeroing.
> Since nothing uses reg1 between the zeroing and the clobber, the zeroing
> could be removed as dead.

Okay, I see.
Thanks for the explanation.

> 
>>> The idea was:
>>> 
>>> - There would be an asm volatile before the moves that clobbers (but does
>>> not use) (mem:BLK (scratch)) and the zeroed registers.
>>> 
>>> - EPILOGUE_USES would make the zeroed registers live after the return.
>> 
>> Is EPILOGUE_USES the only way for this purpose? Will add another “asm volatile” immediately before the return serve the same purpose?
> 
> Why do you want to use an asm to keep the instructions live though?

Just wanted to avoid changing “EPILOGUE_USES” and keep the implementation simpler… :-)
But I might be wrong here.

> 
> As I think I mentioned before (but sorry if I'm misremembering),
> using an asm would be counterproductive on delayed-branch targets.
> The delayed branch scheduler looks backwards for something that could
> fill the delay slot.  If we have an asm after the zeroing instructions
> that uses the zeroed registers, that would prevent any zeroing
> instruction from filling the delay slot.  The delayed branch scheduler
> would therefore try to fill the delay slot with something from before
> the zeroing sequence, which is exactly what we'd like to avoid.
> 
> Also, using an asm after the sequence would allow a machine_reorg
> pass to reuse the zeroed registers for something else between the
> second asm and the return.
> 
> IMO, marking the zeroed registers as being live out of the function
> is the simplest, most direct way of representing the fact that the
> zeroing effect has to survive to the function return.  It's how we
> make sure that the function return value remains live and how we make
> sure that the restored call-preserved registers remain live.

Okay, now I understand.

Thanks a lot for your patience. 

Qing
> 
> Thanks,
> Richard
Segher Boessenkool Sept. 23, 2020, 11:40 p.m. UTC | #178
On Wed, Sep 23, 2020 at 09:28:33AM -0500, Qing Zhao wrote:
> > On Sep 22, 2020, at 5:37 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> which is very similar to the unspec_volatile case we're talking about.
> > 
> > So just like volatile memory accesses, they have an (unknown) side
> > effect, which means they have to execute on the real machine as on the
> > abstract machine (wrt sequence points).  All side effects have to happen
> > exactly as often as prescribed, and in the same order.  Just like
> > volatile asm, too.
> Don’t quite understand the above, what do you mean by “they have to 
> execute on the real machine as on the abstract machine”?

Exactly as described in the C standard.

> > (If you use "#" or "#smth" you can see those in the generated asm --
> > completely empty asm is helpfully (uh...) not printed.)
> 
> Can you explain this in more detail?

final.c...  see
            /* Output the insn using them.  */
            if (string[0])
              {
(it doesn't output anything if an asm template is the empty string!)

> > Even then, x[] could be optimised away completely (with whole program
> > optimisation, or something).  The only way to really prevent the
> > compiler from optimising memory accesses is to make it not see the
> > details (with an asm or an unspec, for example).
> You mean with an asm volatile ("" ::: "memory")?

No, I meant doing the memory access from asm.  The only way to get
exactly the machine instructions you want is to write it in assembler
(inline assembler usually can work, too).
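
For contrast, the memory-clobber form asked about above looks like this in GNU C. This is a sketch, not a recommendation: toy_store_with_barriers is a hypothetical name, and the stores are still ordinary C stores that the compiler sees, so this only constrains how they move relative to the asms. The templates are empty, so nothing is printed for them in the generated assembly.

```c
/* Store zero to x[0] and x[1] with a compiler-level memory barrier
   around each store.  The "memory" clobber tells the compiler that
   the asm may read or write any memory, so stores are not moved
   across it in either direction.  No machine instruction is emitted
   for the asms themselves.  */
void
toy_store_with_barriers (char *x)
{
  __asm__ __volatile__ ("" ::: "memory");
  x[0] = 0;
  __asm__ __volatile__ ("" ::: "memory");
  x[1] = 0;
  __asm__ __volatile__ ("" ::: "memory");
}
```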

> >> The above is conservatively correct.  But not all passes do it.
> >> E.g. combine does have a similar approach:
> >> 
> >>  /* If INSN contains volatile references (specifically volatile MEMs),
> >>     we cannot combine across any other volatile references.
> > 
> > And this is correct, and the *minimum* to do even (this could change the
> > order of the side effects, depending how combine places the resulting
> > insns in I2 and I3).
> 
> Please clarify what “L2 and L3” are?

I2 and I3.  Combine names the insns it combines I0, I1, I2, and I3, and
writes the new insns it generates to the places of I2 and I3.  (In both
cases all of the lower numbered insns can be omitted, e.g. combine I2,
I3 into a new I3.  That is the general gist; there is some other stuff,
like, erm, "other_insn" :-) .)


Segher
Segher Boessenkool Sept. 23, 2020, 11:46 p.m. UTC | #179
Hi!

On Wed, Sep 23, 2020 at 12:05:22PM +0100, Richard Sandiford wrote:
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> > (But I do feel that the design for UNSPEC_volatile is not clean)
> 
> Agreed.  But I think that's partly because what it's trying to achieve
> isn't clean either.  It's a catch-all for “something is happening,
> but we're not saying what”.  And not saying what is itself unclean. ;-)

It shares all those same issues with just unspec, there is nothing that
unspec_volatile adds that is weird like this.  But yes, that is a very
good reason to not use unspecs unless you have to: they hinder
optimisation much, and if that was your actual *goal*, you will often
find that they do not prevent every optimisation you wanted them to.


Segher

Patch

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 3721483..cc93d6f 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -136,6 +136,8 @@  static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
static tree ignore_attribute (tree *, tree, tree, int, bool *);
static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
+static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
+						 bool *);
static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
@@ -434,6 +436,9 @@  const struct attribute_spec c_common_attribute_table[] =
			      ignore_attribute, NULL },
  { "no_split_stack",	      0, 0, true,  false, false, false,
			      handle_no_split_stack_attribute, NULL },
+  { "zero_call_used_regs",    1, 1, true, false, false, false,
+			      handle_zero_call_used_regs_attribute, NULL },
+
  /* For internal use (marking of builtins and runtime functions) only.
     The name contains space to prevent its usage in source code.  */
  { "fn spec",		      1, 1, false, true, true, false,
@@ -4506,6 +4511,69 @@  handle_no_split_stack_attribute (tree *node, tree name,
  return NULL_TREE;
}

+/* Handle a "zero_call_used_regs" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
+				      int ARG_UNUSED (flags),
+				      bool *no_add_attris)
+{
+  tree decl = *node;
+  tree id = TREE_VALUE (args);
+  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
+
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+		"%qE attribute applies only to functions", name);
+      *no_add_attris = true;
+      return NULL_TREE;
+    }
+  else if (DECL_INITIAL (decl))
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+		"cannot set %qE attribute after definition", name);
+      *no_add_attris = true;
+      return NULL_TREE;
+    }
+
+  if (TREE_CODE (id) != STRING_CST)
+    {
+      error ("attribute %qE arguments not a string", name);
+      *no_add_attris = true;
+      return NULL_TREE;
+    }
+
+  if (!targetm.calls.pro_epilogue_use)
+    {
+      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
+      return NULL_TREE;
+    }
+
+  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
+    zero_call_used_regs_type = zero_call_used_regs_skip;
+  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
+    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
+  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
+    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
+  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
+    zero_call_used_regs_type = zero_call_used_regs_used;
+  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
+    zero_call_used_regs_type = zero_call_used_regs_all;
+  else
+    {
+      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
+ 	     name, "skip", "used-gpr", "all-gpr", "used", "all");
+      *no_add_attris = true;
+      return NULL_TREE;
+    }
+
+  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
+
+  return NULL_TREE;
+}
+
/* Handle a "returns_nonnull" attribute; arguments as in
   struct attribute_spec.handler.  */

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 81bd2ee..ded1880 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -2681,6 +2681,10 @@  merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
	  DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
	}

+      /* Merge the zero_call_used_regs_type information.  */
+      if (TREE_CODE (newdecl) == FUNCTION_DECL)
+	DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
+
      /* Merge the storage class information.  */
      merge_weak (newdecl, olddecl);

diff --git a/gcc/common.opt b/gcc/common.opt
index df8af36..19900f9 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3083,6 +3083,29 @@  fzero-initialized-in-bss
Common Report Var(flag_zero_initialized_in_bss) Init(1)
Put zero initialized data in the bss section.

+fzero-call-used-regs=
+Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
+Clear call-used registers upon function return.
+
+Enum
+Name(zero_call_used_regs) Type(enum zero_call_used_regs)
+Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
+
+EnumValue
+Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
+
+EnumValue
+Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
+
+EnumValue
+Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
+
+EnumValue
+Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
+
+EnumValue
+Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
+
g
Common Driver RejectNegative JoinedOrMissing
Generate debug information in default format.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5c373c0..fd1aa9c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3551,6 +3551,48 @@  ix86_function_value_regno_p (const unsigned int regno)
  return false;
}

+/* TARGET_ZERO_CALL_USED_REGNO_P.  */
+
+static bool
+ix86_zero_call_used_regno_p (const unsigned int regno,
+			     bool gpr_only)
+{
+  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
+}
+
+/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
+
+static machine_mode
+ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
+{
+  /* NB: We only need to zero the lower 32 bits for integer registers
+     and the lower 128 bits for vector registers since destinations are
+     zero-extended to the full register width.  */
+  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
+}
+
+/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
+
+static rtx
+ix86_zero_all_vector_registers (bool used_only)
+{
+  if (!TARGET_AVX)
+    return NULL;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
+	 || (TARGET_64BIT
+	     && (REX_SSE_REGNO_P (regno)
+		 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
+	&& (!this_target_hard_regs->x_call_used_regs[regno]
+	    || fixed_regs[regno]
+	    || is_live_reg_at_exit (regno)
+	    || (used_only && !df_regs_ever_live_p (regno))))
+      return NULL;
+
+  return gen_avx_vzeroall ();
+}
+
/* Define how to find the value returned by a function.
   VALTYPE is the data type of the value (as a tree).
   If the precise function being called is known, FUNC is its FUNCTION_DECL;
@@ -8513,7 +8555,7 @@  ix86_expand_prologue (void)
      insn = emit_insn (gen_set_got (pic));
      RTX_FRAME_RELATED_P (insn) = 1;
      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
-      emit_insn (gen_prologue_use (pic));
+      emit_insn (gen_pro_epilogue_use (pic));
      /* Deleting already emmitted SET_GOT if exist and allocated to
	 REAL_PIC_OFFSET_TABLE_REGNUM.  */
      ix86_elim_entry_set_got (pic);
@@ -8542,7 +8584,7 @@  ix86_expand_prologue (void)
     Further, prevent alloca modifications to the stack pointer from being
     combined with prologue modifications.  */
  if (TARGET_SEH)
-    emit_insn (gen_prologue_use (stack_pointer_rtx));
+    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
}

/* Emit code to restore REG using a POP insn.  */
@@ -23319,6 +23361,18 @@  ix86_run_selftests (void)
#undef TARGET_FUNCTION_VALUE_REGNO_P
#define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p

+#undef TARGET_ZERO_CALL_USED_REGNO_P
+#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
+
+#undef TARGET_ZERO_CALL_USED_REGNO_MODE
+#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
+
+#undef TARGET_PRO_EPILOGUE_USE
+#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
+
+#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
+#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
+
#undef TARGET_PROMOTE_FUNCTION_MODE
#define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d0ecd9e..e7df59f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -194,7 +194,7 @@ 
  UNSPECV_STACK_PROBE
  UNSPECV_PROBE_STACK_RANGE
  UNSPECV_ALIGN
-  UNSPECV_PROLOGUE_USE
+  UNSPECV_PRO_EPILOGUE_USE
  UNSPECV_SPLIT_STACK_RETURN
  UNSPECV_CLD
  UNSPECV_NOPS
@@ -13525,8 +13525,8 @@ 

;; As USE insns aren't meaningful after reload, this is used instead
;; to prevent deleting instructions setting registers for PIC code
-(define_insn "prologue_use"
-  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
+(define_insn "pro_epilogue_use"
+  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
  ""
  ""
  [(set_attr "length" "0")])
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 6b6cfcd..e56d6ec 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -418,6 +418,16 @@  enum symbol_visibility
  VISIBILITY_INTERNAL
};

+/* Zero call-used registers type.  */
+enum zero_call_used_regs {
+  zero_call_used_regs_unset = 0,
+  zero_call_used_regs_skip,
+  zero_call_used_regs_used_gpr,
+  zero_call_used_regs_all_gpr,
+  zero_call_used_regs_used,
+  zero_call_used_regs_all
+};
+
/* enums used by the targetm.excess_precision hook.  */

enum flt_eval_method
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index c800b74..b32c55f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3984,6 +3984,17 @@  performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
A declaration to which @code{weakref} is attached and that is associated
with a named @code{target} must be @code{static}.

+@item zero_call_used_regs ("@var{choice}")
+@cindex @code{zero_call_used_regs} function attribute
+The @code{zero_call_used_regs} attribute causes the compiler to zero
+call-used registers at function return according to @var{choice}.
+@samp{skip} doesn't zero call-used registers. @samp{used-gpr} zeros
+call-used general purpose registers which are used in function.
+@samp{all-gpr} zeros all call-used general purpose registers.
+@samp{used} zeros call-used registers which are used in function.
+@samp{all} zeros all call-used registers.  The default for the
+attribute is controlled by @option{-fzero-call-used-regs}.
+
@end table

@c This is the end of the target-independent attribute table
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 09bcc5b..da02686 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -542,7 +542,7 @@  Objective-C and Objective-C++ Dialects}.
-funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
-funsafe-math-optimizations  -funswitch-loops @gol
-fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
--fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
+-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
--param @var{name}=@var{value}
-O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}

@@ -12273,6 +12273,17 @@  int foo (void)

Not all targets support this option.

+@item -fzero-call-used-regs=@var{choice}
+@opindex fzero-call-used-regs
+Zero call-used registers at function return according to
+@var{choice}.  @samp{skip}, which is the default, doesn't zero
+call-used registers.  @samp{used-gpr} zeros call-used general purpose
+registers which are used in function.  @samp{all-gpr} zeros all call-used
+general purpose registers.  @samp{used} zeros call-used registers which
+are used in function.  @samp{all} zeros all call-used registers.  You
+can control this behavior for a specific function by using the function
+attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
+
@item --param @var{name}=@var{value}
@opindex param
In some places, GCC uses various constants to control the amount of
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 6e7d9dc..43dddd3 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4571,6 +4571,22 @@  should recognize only the caller's register numbers.
If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
@end deftypefn

+@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
+A target hook that returns @code{true} if @var{regno} is the number of a
+call used register.  If @var{general_reg_only_p} is @code{true},
+@var{regno} must be the number of a hard general register.
+
+If this hook is not defined, then default_zero_call_used_regno_p will be used.
+@end deftypefn
+
+@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
+A target hook that returns a mode suitable for zeroing the call-used
+register @var{regno} in @var{mode}.
+
+If this hook is not defined, then default_zero_call_used_regno_mode will be
+used.
+@end deftypefn
+
@defmac APPLY_RESULT_SIZE
Define this macro if @samp{untyped_call} and @samp{untyped_return}
need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
@@ -12043,6 +12059,17 @@  argument list due to stack realignment.  Return @code{NULL} if no DRAP
is needed.
@end deftypefn

+@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
+This hook should return an UNSPEC_VOLATILE rtx that marks a register as in use, to
+prevent deletion of register-setting instructions in the prologue and epilogue.
+@end deftypefn
+
+@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
+This hook should return an rtx to zero all vector registers at function
+exit.  If @var{used_only} is @code{true}, only used vector registers should
+be zeroed.  Return @code{NULL} if that is not possible.
+@end deftypefn
+
@deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
When optimization is disabled, this hook indicates whether or not
arguments should be allocated to stack slots.  Normally, GCC allocates
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 3be984b..bee917a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3430,6 +3430,10 @@  for a new target instead.

@hook TARGET_FUNCTION_VALUE_REGNO_P

+@hook TARGET_ZERO_CALL_USED_REGNO_P
+
+@hook TARGET_ZERO_CALL_USED_REGNO_MODE
+
@defmac APPLY_RESULT_SIZE
Define this macro if @samp{untyped_call} and @samp{untyped_return}
need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
@@ -8109,6 +8113,10 @@  and the associated definitions of those functions.

@hook TARGET_GET_DRAP_RTX

+@hook TARGET_PRO_EPILOGUE_USE
+
+@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
+
@hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS

@hook TARGET_CONST_ANCHOR
diff --git a/gcc/function.c b/gcc/function.c
index 9eee9b5..9908530 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -50,6 +50,7 @@  along with GCC; see the file COPYING3.  If not see
#include "emit-rtl.h"
#include "recog.h"
#include "rtl-error.h"
+#include "hard-reg-set.h"
#include "alias.h"
#include "fold-const.h"
#include "stor-layout.h"
@@ -5808,6 +5809,147 @@  make_prologue_seq (void)
  return seq;
}

+/* Check whether the hard register REGNO is live at the exit block
+   of the current routine.  */
+bool
+is_live_reg_at_exit (unsigned int regno)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+    {
+      bitmap live_out = df_get_live_out (e->src);
+      if (REGNO_REG_SET_P (live_out, regno))
+	return true;
+    }
+
+  return false;
+}
+
+/* Emit a sequence of insns to zero the call-used registers for the current
+   function.  */
+
+static void
+gen_call_used_regs_seq (void)
+{
+  if (!targetm.calls.pro_epilogue_use)
+    return;
+
+  bool gpr_only = true;
+  bool used_only = true;
+  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
+
+  if (flag_zero_call_used_regs)
+    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
+	== zero_call_used_regs_unset)
+      zero_call_used_regs_type = flag_zero_call_used_regs;
+    else
+      zero_call_used_regs_type
+	= DECL_ZERO_CALL_USED_REGS (current_function_decl);
+  else
+    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
+
+  /* No need to zero call-used-regs when no user request is present.  */
+  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
+    return;
+
+  /* No need to zero call-used-regs in main ().  */
+  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
+    return;
+
+  /* No need to zero call-used-regs if __builtin_eh_return is called
+     since it isn't a normal function return.  */
+  if (crtl->calls_eh_return)
+    return;
+
+  /* If gpr_only is true, only zero call-used-registers that are
+     general-purpose registers; if used_only is true, only zero
+     call-used-registers that are used in the current function.  */
+  switch (zero_call_used_regs_type)
+    {
+      case zero_call_used_regs_all_gpr:
+	used_only = false;
+	break;
+      case zero_call_used_regs_used:
+	gpr_only = false;
+	break;
+      case zero_call_used_regs_all:
+	gpr_only = false;
+	used_only = false;
+	break;
+      default:
+	break;
+    }
+
+  /* As an optimization, use a single insn to zero all vector registers on
+     targets that provide such an insn.  */
+  if (!gpr_only
+      && targetm.calls.zero_all_vector_registers)
+    {
+      rtx zero_all_vec_insn
+	= targetm.calls.zero_all_vector_registers (used_only);
+      if (zero_all_vec_insn)
+	{
+	  emit_insn (zero_all_vec_insn);
+	  gpr_only = true;
+	}
+    }
+
+  /* Zero a hard register only if all of the following hold:
+     1. it is a call-used register;
+     2. it is not a fixed register;
+     3. it is not live at the end of the routine;
+     4. it is a general-purpose register, if gpr_only is true;
+     5. it is used in the routine, if used_only is true.
+   */
+
+  /* This array holds the zero rtx with the corresponding machine mode.  */
+  rtx zero_rtx[(int)MAX_MACHINE_MODE];
+  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+    zero_rtx[i] = NULL_RTX;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    {
+      if (!this_target_hard_regs->x_call_used_regs[regno])
+	continue;
+      if (fixed_regs[regno])
+	continue;
+      if (is_live_reg_at_exit (regno))
+	continue;
+      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
+	continue;
+      if (used_only && !df_regs_ever_live_p (regno))
+	continue;
+
+      /* Now we can emit an insn to zero this register.  */
+      rtx reg, tmp;
+
+      machine_mode mode
+	= targetm.calls.zero_call_used_regno_mode (regno,
+						   reg_raw_mode[regno]);
+      if (mode == VOIDmode)
+	continue;
+      if (!have_regs_of_mode[mode])
+	continue;
+
+      reg = gen_rtx_REG (mode, regno);
+      if (zero_rtx[(int)mode] == NULL_RTX)
+	{
+	  zero_rtx[(int)mode] = reg;
+	  tmp = gen_rtx_SET (reg, const0_rtx);
+	  emit_insn (tmp);
+	}
+      else
+	emit_move_insn (reg, zero_rtx[(int)mode]);
+
+      emit_insn (targetm.calls.pro_epilogue_use (reg));
+    }
+
+  return;
+}
+
+
/* Return a sequence to be used as the epilogue for the current function,
   or NULL.  */

@@ -5819,6 +5961,9 @@  make_epilogue_seq (void)

  start_sequence ();
  emit_note (NOTE_INSN_EPILOGUE_BEG);
+
+  gen_call_used_regs_seq ();
+
  rtx_insn *seq = targetm.gen_epilogue ();
  if (seq)
    emit_jump_insn (seq);
diff --git a/gcc/function.h b/gcc/function.h
index d55cbdd..fc36c3e 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -705,4 +705,6 @@  extern const char *current_function_name (void);

extern void used_types_insert (tree);

+extern bool is_live_reg_at_exit (unsigned int);
+
#endif  /* GCC_FUNCTION_H */
diff --git a/gcc/target.def b/gcc/target.def
index 07059a8..8aab63e 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5022,6 +5022,26 @@  If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
 default_function_value_regno_p)

DEFHOOK
+(zero_call_used_regno_p,
+ "A target hook that returns @code{true} if @var{regno} is the number of a\n\
+call-used register.  If @var{general_reg_only_p} is @code{true}, the hook\n\
+returns @code{true} only if @var{regno} is also a general-purpose hard register.\n\
+\n\
+If this hook is not defined, then @code{default_zero_call_used_regno_p} will be used.",
+ bool, (const unsigned int regno, bool general_reg_only_p),
+ default_zero_call_used_regno_p)
+
+DEFHOOK
+(zero_call_used_regno_mode,
+ "A target hook that returns a machine mode suitable for zeroing the\n\
+call-used register @var{regno}, given its default mode @var{mode}.\n\
+\n\
+If this hook is not defined, then @code{default_zero_call_used_regno_mode} will be\n\
+used.",
+ machine_mode, (const unsigned int regno, machine_mode mode),
+ default_zero_call_used_regno_mode)
+
+DEFHOOK
(fntype_abi,
 "Return the ABI used by a function with type @var{type}; see the\n\
definition of @code{predefined_function_abi} for details of the ABI\n\
@@ -5068,6 +5088,19 @@  argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
is needed.",
 rtx, (void), NULL)

+DEFHOOK
+(pro_epilogue_use,
+ "This hook should return an @code{UNSPEC_VOLATILE} rtx that marks a register as in\n\
+use, to prevent deletion of register-setting instructions in the prologue and epilogue.",
+ rtx, (rtx reg), NULL)
+
+DEFHOOK
+(zero_all_vector_registers,
+ "This hook should return an rtx to zero all vector registers at function\n\
+exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
+be zeroed.  Return @code{NULL} if this is not possible.",
+ rtx, (bool used_only), NULL)
+
/* Return true if all function parameters should be spilled to the
   stack.  */
DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 0113c7b..ed02173 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -987,6 +987,23 @@  default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
#endif
}

+/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
+
+bool
+default_zero_call_used_regno_p (const unsigned int,
+				bool)
+{
+  return false;
+}
+
+/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
+
+machine_mode
+default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
+{
+  return mode;
+}
+
rtx
default_internal_arg_pointer (void)
{
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index b572a36..370df19 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -162,6 +162,9 @@  extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
extern rtx default_function_value (const_tree, const_tree, bool);
extern rtx default_libcall_value (machine_mode, const_rtx);
extern bool default_function_value_regno_p (const unsigned int);
+extern bool default_zero_call_used_regno_p (const unsigned int, bool);
+extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
+						       machine_mode);
extern rtx default_internal_arg_pointer (void);
extern rtx default_static_chain (const_tree, bool);
extern void default_trampoline_init (rtx, tree, rtx);
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
new file mode 100644
index 0000000..3c2ac72
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
@@ -0,0 +1,3 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fzero-call-used-regs=used" } */
+/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
new file mode 100644
index 0000000..acf48c4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
@@ -0,0 +1,4 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2" } */
+
+extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
new file mode 100644
index 0000000..9f61dc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
new file mode 100644
index 0000000..09048e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
new file mode 100644
index 0000000..4862688
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
@@ -0,0 +1,39 @@ 
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
new file mode 100644
index 0000000..500251b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
@@ -0,0 +1,39 @@ 
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
new file mode 100644
index 0000000..8b058e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
new file mode 100644
index 0000000..d4eaaf7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
new file mode 100644
index 0000000..dd3bb90
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
new file mode 100644
index 0000000..e2274f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
new file mode 100644
index 0000000..7f5d153
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
new file mode 100644
index 0000000..fe13d2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
new file mode 100644
index 0000000..205a532
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
new file mode 100644
index 0000000..e046684
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
new file mode 100644
index 0000000..4be8ff6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
new file mode 100644
index 0000000..0eb34e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
+
+__attribute__ ((zero_call_used_regs("used")))
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
new file mode 100644
index 0000000..cbb63a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
new file mode 100644
index 0000000..7573197
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
new file mode 100644
index 0000000..de71223
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
new file mode 100644
index 0000000..ccfa441
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
new file mode 100644
index 0000000..6b46ca3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+__attribute__ ((zero_call_used_regs("all-gpr")))
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
new file mode 100644
index 0000000..0680f38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
new file mode 100644
index 0000000..534defa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
new file mode 100644
index 0000000..477bb19
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
new file mode 100644
index 0000000..a305a60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 95eea63..01a1f24 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1464,6 +1464,15 @@  process_options (void)
	}
    }

+  if (flag_zero_call_used_regs != zero_call_used_regs_skip
+      && !targetm.calls.pro_epilogue_use)
+    {
+      error_at (UNKNOWN_LOCATION,
+		"%<-fzero-call-used-regs=%> is not supported for this "
+		"target");
+      flag_zero_call_used_regs = zero_call_used_regs_skip;
+    }
+
  /* One region RA really helps to decrease the code size.  */
  if (flag_ira_region == IRA_REGION_AUTODETECT)
    flag_ira_region
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 8c5a2e3..71badbd 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1825,7 +1825,11 @@  struct GTY(()) tree_decl_with_vis {
 unsigned final : 1;
 /* Belong to FUNCTION_DECL exclusively.  */
 unsigned regdecl_flag : 1;
- /* 14 unused bits. */
+
+ /* How to clear call-used registers upon function return.  */
+ ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
+
+ /* 11 unused bits.  */
};

struct GTY(()) tree_var_decl {
diff --git a/gcc/tree.h b/gcc/tree.h
index cf546ed..d378a88 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2925,6 +2925,11 @@  extern void decl_value_expr_insert (tree, tree);
#define DECL_VISIBILITY(NODE) \
  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)

+/* How the call-used registers of this function decl are zeroed
+   upon return from the function.  */
+#define DECL_ZERO_CALL_USED_REGS(NODE) \
+  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
+
/* Nonzero means that the decl (or an enclosing scope) had its
   visibility specified rather than being inferred.  */
#define DECL_VISIBILITY_SPECIFIED(NODE) \