[v3,0/8] __builtin_dynamic_object_size

Message ID 20211126052851.2176408-1-siddhesh@gotplt.org

Siddhesh Poyarekar Nov. 26, 2021, 5:28 a.m. UTC
This patchset implements the __builtin_dynamic_object_size builtin for
gcc.  The primary motivation to have this builtin in gcc is to enable
_FORTIFY_SOURCE=3 support with gcc, thus allowing greater fortification
in use cases where the potential performance tradeoff is acceptable.

Semantics:
----------

__builtin_dynamic_object_size has the same signature as
__builtin_object_size; it accepts a pointer and an object size type
ranging from 0 to 3 and returns an object size estimate for the pointer
based on an analysis of which objects the pointer could point to.  The
properties of the object size estimate are different, however:

- In the best case __builtin_dynamic_object_size evaluates to an
  expression that represents a precise size of the object being pointed
  to.

- In case a precise object size expression cannot be evaluated,
  __builtin_dynamic_object_size attempts to evaluate an estimate size
  expression based on the object size type.

- In what situations the builtin returns an estimate vs a precise
  expression is an implementation detail and may change in the future.
  Users must always assume, as in the case of __builtin_object_size, that
  the returned value is the maximum or minimum based on the object size
  type they have provided.

- In the worst case of failure, __builtin_dynamic_object_size returns a
  constant (size_t)-1 or (size_t)0.
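
As an illustration of these semantics, here is a minimal sketch (not
taken from the patches) that queries the type 0 (maximum) and type 2
(minimum) estimates for a heap allocation whose size is only known at
run time.  The helper names and the OBJ_SIZE macro are assumptions made
for this example; the macro falls back to __builtin_object_size on
compilers that do not provide the new builtin.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the dynamic builtin when the compiler has it, else the static one.  */
#if defined(__has_builtin)
# if __has_builtin(__builtin_dynamic_object_size)
#  define HAVE_BDOS 1
# endif
#endif

#ifdef HAVE_BDOS
# define OBJ_SIZE(p, type) __builtin_dynamic_object_size (p, type)
#else
# define OBJ_SIZE(p, type) __builtin_object_size (p, type)
#endif

/* Type 0: maximum estimate.  The failure value is (size_t)-1.  */
size_t
max_estimate (size_t n)
{
  char *p = malloc (n);
  if (p == NULL)
    return (size_t) -1;

  size_t sz = OBJ_SIZE (p, 0);
  free (p);
  return sz;
}

/* Type 2: minimum estimate.  The failure value is (size_t)0.  */
size_t
min_estimate (size_t n)
{
  char *p = malloc (n);
  if (p == NULL)
    return 0;

  size_t sz = OBJ_SIZE (p, 2);
  free (p);
  return sz;
}
```

Whatever the compiler manages to compute, max_estimate (32) should never
be below 32 and min_estimate (32) should never be above 32.  A precise
size expression would evaluate to exactly 32 in both cases.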

Implementation:
---------------

- The __builtin_dynamic_object_size support is implemented in
  tree-object-size.  In the first pass (early_objsz), the builtin is in
  most cases treated like __builtin_object_size to preserve subobject
  bounds.

- Each element of the object_sizes vector is now a TREE_VEC of size 2
  holding bytes to the end of the object and the full size of the
  object.  This allows proper handling of negative offsets, permitting
  them as long as they stay within the bounds of the whole object.  It
  also improves __builtin_object_size usage with negative offsets,
  which now consistently returns valid results for pointer decrementing
  loops.

- The patchset begins with structural modification of the
  tree-object-size pass, followed by enhancement to return size
  expressions.  I have split the implementation into one feature per
  patch (calls, function parameters, PHI, etc.) to hopefully ease
  review.
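
To make the negative offset case concrete, here is a small sketch of a
pointer decrementing loop of the kind mentioned above.  The function
name and the 16-byte buffer are assumptions for illustration; the exact
value returned depends on the compiler version and optimization level,
but for type 0 it must be an upper bound on the bytes remaining.

```c
#include <stddef.h>

/* Walk a pointer backwards from one past the end of a 16-byte buffer
   and return the type 0 (maximum) estimate of the bytes between the
   pointer and the end of the buffer.  BACK must not exceed 16.  */
size_t
remaining_after_rewind (size_t back)
{
  char buf[16];
  char *p = buf + sizeof buf;

  for (size_t i = 0; i < back; i++)
    p--;

  return __builtin_object_size (p, 0);
}
```

After stepping back 4 bytes, exactly 4 bytes remain, so any valid
maximum estimate is at least 4.  A compiler that cannot track the
negative offset would be expected to give up and return (size_t)-1.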

Performance:
------------

Expressions generated by this pass in theory could be arbitrarily
complex.  I have not made an attempt to limit nesting of objects since
it seemed too early to do that.  In practice based on the few
applications I built, most of the complexity of the expressions got
folded away.  Even so, the performance overhead is likely to be
non-zero.  If we find performance degradation to be significant, we
could later add nesting limits to bail out if a size expression gets too
complex.

I have implemented simplification of __*_chk to their normal
variants if we can determine at compile time that it is safe.  This
should limit the performance overhead of the expressions in valid cases.
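
A sketch of the kind of call this simplification targets, assuming the
usual GCC builtin __builtin___memcpy_chk (destination, source, length,
destination size).  The function name and buffer sizes are made up for
the example.  Both the copy length and the destination size are
compile-time constants here and the copy fits, so the compiler is free
to replace the checked call with a plain memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Copy an 8-byte string (including the NUL) into a 16-byte buffer
   through the checked builtin and return the length of the copy.  */
size_t
copy_known_safe (void)
{
  static const char src[] = "fortify";
  char dst[16];

  __builtin___memcpy_chk (dst, src, sizeof src,
			  __builtin_object_size (dst, 0));
  return strlen (dst);
}
```

If the length or the object size were only known at run time, the call
would have to stay a __memcpy_chk call unless ranges prove it safe.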

Build time performance doesn't seem to be affected much based on an
unscientific check to time
`make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`.  It only increases by
about a couple of seconds when the dynamic tests are added and remains
more or less in the same ballpark otherwise.

Testing:
--------

I have added tests for dynamic object sizes as well as wrappers for all
__builtin_object_size tests to provide wide coverage.  I have also done
a full bootstrap build and test run on x86_64.

I have also built bash, cmake, wpa_supplicant and systemtap with
_FORTIFY_SOURCE=2 and _FORTIFY_SOURCE=3 (with a hacked up glibc to make
sure it works) and saw no issues in any of those builds.  I did some
rudimentary analysis of the generated binaries using fortify-metrics[1]
to confirm that there was a difference in coverage between the two
fortification levels.

Here is a summary of coverage in the above packages:

F = number of fortified calls
T = Total number of calls to fortifiable functions (fortified as well as
unfortified)
C = F * 100/ T

Package		F(2)	T(2)	F(3)	T(3)	C(2)	C(3)
bash		428	1220	1005	1196	35.08%	84.03%
wpa_supplicant	1635	3232	2350	3408	50.59%	68.96%
systemtap	324	1990	343	1994	16.28%	17.20%
cmake		830	14181	958	14196	5.85%	6.75%

The numbers are slightly lower than the previous patch series because in
the interim I pushed an improvement to folding of the _chk builtins so
that they can use ranges to simplify the calls to their regular
variants.  Also note that even _FORTIFY_SOURCE=2 coverage should be
improved due to negative offset handling.
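
For reference, the coverage column in the table above can be recomputed
from F and T with a helper like the following.  This is only a sketch;
the function name is an assumption and the zero check is there merely to
avoid a division by zero.

```c
/* Coverage C = F * 100 / T, as a percentage.  Returns 0 when there are
   no fortifiable calls at all.  */
double
coverage_percent (unsigned long fortified, unsigned long total)
{
  if (total == 0)
    return 0.0;

  return (double) fortified * 100.0 / (double) total;
}
```

For bash, coverage_percent (428, 1220) gives roughly 35.08 and
coverage_percent (1005, 1196) gives roughly 84.03, matching the C(2) and
C(3) columns.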

Additional testing plans (I have already started on some of these):

- Build packages to compare values returned by __builtin_object_size
  with the older pass and this new one.  Also compare with
  __builtin_dynamic_object_size.

- Expand the list of packages to get more coverage metrics.

- Explore the performance impact on applications built with
  _FORTIFY_SOURCE=3.

Limitations/Future work:
------------------------

- I need to enable _FORTIFY_SOURCE=3 for gcc in glibc; currently it is
  llvm-only.  I've started working on these patches too on the side.

- Explore ways to use the non-constant sizes returned for
  __builtin_object_size to arrive at a constant estimate to improve
  _FORTIFY_SOURCE=2 coverage in a way that accounts for undefined
  behaviour.

- More work could be done to reduce the performance impact of the
  computation.  One way could be to add a heuristic where the pass keeps
  track of nesting in the expression and either bails out or computes an
  estimate if nesting crosses a threshold.  I'll take this up once we
  have more data on the nature of the bottlenecks.
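
The nesting heuristic from the last item could look roughly like the
sketch below.  Everything here is hypothetical: struct size_expr stands
in for a size expression tree, and MAX_NESTING is an arbitrary
threshold.  None of it corresponds to actual GCC data structures or to
code in this patchset.

```c
#include <stddef.h>

/* Hypothetical threshold; a real value would need tuning.  */
enum { MAX_NESTING = 4 };

/* Hypothetical stand-in for a binary size expression node.  A leaf has
   both operands set to NULL.  */
struct size_expr
{
  const struct size_expr *lhs;
  const struct size_expr *rhs;
};

/* Return the nesting depth of E; an empty expression has depth 0.  */
unsigned
nesting_depth (const struct size_expr *e)
{
  if (e == NULL)
    return 0;

  unsigned l = nesting_depth (e->lhs);
  unsigned r = nesting_depth (e->rhs);
  return 1 + (l > r ? l : r);
}

/* Nonzero when the expression is too deeply nested, i.e. when the pass
   would bail out or fall back to an estimate.  */
int
should_bail_out (const struct size_expr *e)
{
  return nesting_depth (e) > MAX_NESTING;
}
```

A real implementation would more likely track the depth incrementally
while building the expression instead of walking the tree again.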

Changes from v2:

Changes to individual patches have been mentioned in the patches
themselves.

- Dropped patch to remove check_for_plus_in_for_loops and osi->pass
- Merged negative offset support (10/10 in v2) into 3/8 and added
  support for static object sizes
- Merged dynamic offset (10/10 in v2) support into 8/8

Siddhesh Poyarekar (8):
  tree-object-size: Replace magic numbers with enums
  tree-object-size: Abstract object_sizes array
  tree-object-size: Save sizes as trees and support negative offsets
  __builtin_dynamic_object_size: Recognize builtin
  tree-object-size: Support dynamic sizes in conditions
  tree-object-size: Handle function parameters
  tree-object-size: Handle GIMPLE_CALL
  tree-object-size: Dynamic sizes for ADDR_EXPR

 gcc/builtins.c                                |   23 +-
 gcc/builtins.def                              |    1 +
 gcc/doc/extend.texi                           |   13 +
 gcc/gimple-fold.c                             |   11 +-
 .../g++.dg/ext/builtin-dynamic-object-size1.C |    5 +
 .../g++.dg/ext/builtin-dynamic-object-size2.C |    5 +
 .../gcc.dg/builtin-dynamic-alloc-size.c       |    7 +
 .../gcc.dg/builtin-dynamic-object-size-0.c    |  464 +++++++
 .../gcc.dg/builtin-dynamic-object-size-1.c    |    6 +
 .../gcc.dg/builtin-dynamic-object-size-10.c   |   11 +
 .../gcc.dg/builtin-dynamic-object-size-11.c   |    7 +
 .../gcc.dg/builtin-dynamic-object-size-12.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-13.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-14.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-15.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-16.c   |    6 +
 .../gcc.dg/builtin-dynamic-object-size-17.c   |    7 +
 .../gcc.dg/builtin-dynamic-object-size-18.c   |    8 +
 .../gcc.dg/builtin-dynamic-object-size-19.c   |  104 ++
 .../gcc.dg/builtin-dynamic-object-size-2.c    |    6 +
 .../gcc.dg/builtin-dynamic-object-size-3.c    |    6 +
 .../gcc.dg/builtin-dynamic-object-size-4.c    |    6 +
 .../gcc.dg/builtin-dynamic-object-size-5.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-6.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-7.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-8.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-9.c    |    5 +
 gcc/testsuite/gcc.dg/builtin-object-size-1.c  |  184 ++-
 gcc/testsuite/gcc.dg/builtin-object-size-16.c |    2 +
 gcc/testsuite/gcc.dg/builtin-object-size-17.c |    2 +
 gcc/testsuite/gcc.dg/builtin-object-size-2.c  |  163 +++
 gcc/testsuite/gcc.dg/builtin-object-size-3.c  |  182 +++
 gcc/testsuite/gcc.dg/builtin-object-size-4.c  |  123 ++
 gcc/testsuite/gcc.dg/builtin-object-size-5.c  |   37 +
 gcc/tree-object-size.c                        | 1219 +++++++++++++----
 gcc/tree-object-size.h                        |   12 +-
 gcc/ubsan.c                                   |    5 +-
 37 files changed, 2392 insertions(+), 280 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size2.C
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-alloc-size.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-11.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-12.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-13.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-14.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-15.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-16.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-17.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-18.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-19.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-4.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-5.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-6.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-7.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-8.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-9.c

Comments

Siddhesh Poyarekar Nov. 26, 2021, 5:38 a.m. UTC | #1
On 11/26/21 10:58, Siddhesh Poyarekar wrote:
> sure it works) and saw no issues in any of those builds.  I did some
> rudimentary analysis of the generated binaries using fortify-metrics[1]
> to confirm that there was a difference in coverage between the two
> fortification levels.
> 
> Here is a summary of coverage in the above packages:
> 
> F = number of fortified calls
> T = Total number of calls to fortifiable functions (fortified as well as
> unfortified)
> C = F * 100/ T
> 
> Package		F(2)	T(2)	F(3)	T(3)	C(2)	C(3)
> bash		428	1220	1005	1196	35.08%	84.03%
> wpa_supplicant	1635	3232	2350	3408	50.59%	68.96%
> systemtap	324	1990	343	1994	16.28%	17.20%
> cmake		830	14181	958	14196	5.85%	6.75%
> 
> The numbers are slightly lower than the previous patch series because in
> the interim I pushed an improvement to folding of the _chk builtins so
> that they can use ranges to simplify the calls to their regular
> variants.  Also note that even _FORTIFY_SOURCE=2 coverage should be
> improved due to negative offset handling.

[1] https://github.com/siddhesh/fortify-metrics