mbox series

[00/10] __builtin_dynamic_object_size

Message ID 20211109190137.1107736-1-siddhesh@gotplt.org
Headers show
Series __builtin_dynamic_object_size | expand

Message

Siddhesh Poyarekar Nov. 9, 2021, 7:01 p.m. UTC
This patchset implements the __builtin_dynamic_object_size builtin for
gcc.  The primary motivation to have this builtin in gcc is to enable
_FORTIFY_SOURCE=3 support with gcc, thus allowing greater fortification
in use cases where the potential performance tradeoff is acceptable.

Semantics:
----------

__builtin_dynamic_object_size has the same signature as
__builtin_object_size; it accepts a pointer and type ranging from 0 to 3
and it returns an object size estimate for the pointer based on an
analysis of which objects the pointer could point to.  The actual
properties of the object size estimate are different:

- In the best case __builtin_dynamic_object_size evaluates to an
  expression that represents a precise size of the object being pointed
  to.

- In case a precise object size expression cannot be evaluated,
  __builtin_dynamic_object_size attempts to evaluate an estimate size
  expression based on the object size type.

- In what situations the builtin returns an estimate vs a precise
  expression is an implementation detail and may change in future.
  Users must always assume, as in the case of __builtin_object_size, that
  the returned value is the maximum or minimum based on the object size
  type they have provided.

- In the worst case of failure, __builtin_dynamic_object_size returns a
  constant (size_t)-1 or (size_t)0.

Implementation:
---------------

- The __builtin_dynamic_object_size support is implemented in
  tree-object-size.  In most cases the first pass (early_objsz) is able
  to evaluate object size expressions.  The second phase mainly ends up
  simplifying the __builtin_dynamic_object_size to
  __builtin_object_size.

- The patchset begins with structural modification of the
  tree-object-size pass, followed by enhancement to return size
  expressions.  I have split the implementation into one feature per
  patch (calls, function parameters, PHI, etc.) to hopefully ease
  review.

Performance:
------------

Expressions generated by this pass in theory could be arbitrarily
complex.  I have not made an attempt to limit nesting of objects since
it seemed too early to do that.  In practice based on the few
applications I built, most of the complexity of the expressions got
folded away.  Even so, the performance overhead is likely to be
non-zero.  If we find performance degradation to be significant, we
could later add nesting limits to bail out if a size expression gets too
complex.

I have also not implemented simplification of __*_chk to their normal
variants if we can determine at compile time that it is safe, which
still depends on the object size to be constant.  I hope to do this as a
minor performance improvement in stage 3.

Build time performance doesn't seem to be affected much based on an
unscientific check to time
`make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`.  It only increases by
about a couple of seconds when the dynamic tests are added and remains
more or less in the same ballpark otherwise.

Testing:
--------

I have added tests for dynamic object sizes as well as wrappers for all
__builtin_object_size tests to provide wide coverage.  With that in
place I have run full bootstrap builds on Fedora rawhide by backporting the
patches to the gcc11 branch and x86_64 and i686 have no failures in any
of the builtin-*object-size* tests and no new failures.

I have also built bash, cmake, zfs-fuse and systemtap with
_FORTIFY_SOURCE=3 (with a hacked up glibc to make sure it works) and saw
no issues in any of those builds.  I did some rudimentary analysis of
the generated binaries to see if there was any difference in coverage
and found that there was.  In terms of pure numbers, there were far more
_chk variants of calls than the regular ones due to _FORTIFY_SOURCE
(from about 4% to 70% in bash), but that could well be due to the _chk
variants not being folded into regular variants when safe.  However, on
manual inspection of some of these sites, it was clear that coverage was
increasing significantly where __builtin_object_size was previously
bailing out.

Specifically for bash, the coverage went from 30.81% to 86.87% (it was
84.5% with the v1 patch).  I actually hope to reduce this a bit with
folding improvements for __builtin___memcpy_chk, etc.

A bootstrap build is in progress on x86_64.

Limitations/Future work:
------------------------

- The most immediate work is to fold _chk variants of builtins into
  regular ones when it can be proven at compile time that the object
  size will alwasy be less than the length of the write.  I am working
  on it right now.

- I need to enable _FORTIFY_SOURCE=3 for gcc in glibc; currently it is
  llvm-only.  I've started working on these patches too on the side.

- Instead of bailing out on non-constant sizes with
  __builtin_object_size, it should be possible to use ranger to
  get an upper and lower bound on the size expression and use that to
  implement __builtin_object_size.

- More work could to be done to reduce the performance impact of the
  computation.  One way could be to add a heuristic where the pass keeps
  track of nesting in the expression and either bail out or compute an
  estimate if nesting crosses a threshold.  I'll take this up once we
  have more data on the nature of the bottlenecks.


Siddhesh Poyarekar (10):
  tree-object-size: Replace magic numbers with enums
  tree-object-size: Abstract object_sizes array
  tree-object-size: Use tree instead of HOST_WIDE_INT
  tree-object-size: Single pass dependency loop resolution
  __builtin_dynamic_object_size: Recognize builtin
  tree-object-size: Support dynamic sizes in conditions
  tree-object-size: Handle function parameters
  tree-object-size: Handle GIMPLE_CALL
  tree-object-size: Dynamic sizes for ADDR_EXPR
  tree-object-size: Handle dynamic offsets

 gcc/builtins.c                                |   22 +-
 gcc/builtins.def                              |    1 +
 gcc/doc/extend.texi                           |   13 +
 gcc/gimple-fold.c                             |    9 +-
 .../g++.dg/ext/builtin-dynamic-object-size1.C |    5 +
 .../g++.dg/ext/builtin-dynamic-object-size2.C |    5 +
 .../gcc.dg/builtin-dynamic-alloc-size.c       |    7 +
 .../gcc.dg/builtin-dynamic-object-size-0.c    |  463 +++++
 .../gcc.dg/builtin-dynamic-object-size-1.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-10.c   |    9 +
 .../gcc.dg/builtin-dynamic-object-size-11.c   |    7 +
 .../gcc.dg/builtin-dynamic-object-size-12.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-13.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-14.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-15.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-16.c   |    7 +
 .../gcc.dg/builtin-dynamic-object-size-17.c   |    8 +
 .../gcc.dg/builtin-dynamic-object-size-18.c   |    8 +
 .../gcc.dg/builtin-dynamic-object-size-19.c   |  104 +
 .../gcc.dg/builtin-dynamic-object-size-2.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-3.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-4.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-5.c    |    8 +
 .../gcc.dg/builtin-dynamic-object-size-6.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-7.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-8.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-9.c    |    5 +
 gcc/testsuite/gcc.dg/builtin-object-size-1.c  |  160 +-
 gcc/testsuite/gcc.dg/builtin-object-size-16.c |    2 +
 gcc/testsuite/gcc.dg/builtin-object-size-17.c |    2 +
 gcc/testsuite/gcc.dg/builtin-object-size-2.c  |  134 ++
 gcc/testsuite/gcc.dg/builtin-object-size-3.c  |  151 ++
 gcc/testsuite/gcc.dg/builtin-object-size-4.c  |   99 +
 gcc/testsuite/gcc.dg/builtin-object-size-5.c  |   12 +
 gcc/tree-object-size.c                        | 1766 +++++++++++------
 gcc/tree-object-size.h                        |    3 +-
 gcc/ubsan.c                                   |   46 +-
 37 files changed, 2499 insertions(+), 620 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size2.C
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-alloc-size.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-11.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-12.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-13.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-14.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-15.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-16.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-17.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-18.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-19.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-4.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-5.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-6.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-7.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-8.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-9.c

Comments

Jakub Jelinek Nov. 19, 2021, 3:56 p.m. UTC | #1
On Wed, Nov 10, 2021 at 12:31:26AM +0530, Siddhesh Poyarekar wrote:
> - Instead of bailing out on non-constant sizes with
>   __builtin_object_size, it should be possible to use ranger to
>   get an upper and lower bound on the size expression and use that to
>   implement __builtin_object_size.

I'd prefer not to.  One thing is that VRP heavily relies on UB not happening
in the program, while __bos is typically used to catch UB in those programs.
And, I'm afraid fairly often VRP would result in runtime tests for limits
that aren't useful for security at all.  Say VRP figures out that some
integer isn't negative or doesn't have MSB set etc., suddenly we have range
of [0, INT_MAX] or similar and making that imply __builtin_object_size
INT_MAX rather than ~(size_t) 0 would mean we need to use __*_chk and
compare at runtime, even when it is very unlikely an object would be that
big...
The compiler computes some range, but there is not information on how much
the range actually maps to the actual range the program is using, or when it
is some much larger superset of the actual range (same problem is with
Martin's warnings BTW).  Some unrelated inlined function can perform some
comparison just in case, perhaps some jump threading is done and all of sudden
there is non-VARYING range.

	Jakub
Andrew Pinski Sept. 8, 2024, 10:01 p.m. UTC | #2
On Tue, Nov 9, 2021 at 11:02 AM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>
>
> This patchset implements the __builtin_dynamic_object_size builtin for
> gcc.  The primary motivation to have this builtin in gcc is to enable
> _FORTIFY_SOURCE=3 support with gcc, thus allowing greater fortification
> in use cases where the potential performance tradeoff is acceptable.
>
> Semantics:
> ----------
>
> __builtin_dynamic_object_size has the same signature as
> __builtin_object_size; it accepts a pointer and type ranging from 0 to 3
> and it returns an object size estimate for the pointer based on an
> analysis of which objects the pointer could point to.  The actual
> properties of the object size estimate are different:
>
> - In the best case __builtin_dynamic_object_size evaluates to an
>   expression that represents a precise size of the object being pointed
>   to.
>
> - In case a precise object size expression cannot be evaluated,
>   __builtin_dynamic_object_size attempts to evaluate an estimate size
>   expression based on the object size type.
>
> - In what situations the builtin returns an estimate vs a precise
>   expression is an implementation detail and may change in future.
>   Users must always assume, as in the case of __builtin_object_size, that
>   the returned value is the maximum or minimum based on the object size
>   type they have provided.
>
> - In the worst case of failure, __builtin_dynamic_object_size returns a
>   constant (size_t)-1 or (size_t)0.
>
> Implementation:
> ---------------
>
> - The __builtin_dynamic_object_size support is implemented in
>   tree-object-size.  In most cases the first pass (early_objsz) is able
>   to evaluate object size expressions.  The second phase mainly ends up
>   simplifying the __builtin_dynamic_object_size to
>   __builtin_object_size.
>
> - The patchset begins with structural modification of the
>   tree-object-size pass, followed by enhancement to return size
>   expressions.  I have split the implementation into one feature per
>   patch (calls, function parameters, PHI, etc.) to hopefully ease
>   review.
>
> Performance:
> ------------
>
> Expressions generated by this pass in theory could be arbitrarily
> complex.  I have not made an attempt to limit nesting of objects since
> it seemed too early to do that.  In practice based on the few
> applications I built, most of the complexity of the expressions got
> folded away.  Even so, the performance overhead is likely to be
> non-zero.  If we find performance degradation to be significant, we
> could later add nesting limits to bail out if a size expression gets too
> complex.
>
> I have also not implemented simplification of __*_chk to their normal
> variants if we can determine at compile time that it is safe, which
> still depends on the object size to be constant.  I hope to do this as a
> minor performance improvement in stage 3.
>
> Build time performance doesn't seem to be affected much based on an
> unscientific check to time
> `make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`.  It only increases by
> about a couple of seconds when the dynamic tests are added and remains
> more or less in the same ballpark otherwise.
>
> Testing:
> --------
>
> I have added tests for dynamic object sizes as well as wrappers for all
> __builtin_object_size tests to provide wide coverage.  With that in
> place I have run full bootstrap builds on Fedora rawhide by backporting the
> patches to the gcc11 branch and x86_64 and i686 have no failures in any
> of the builtin-*object-size* tests and no new failures.
>
> I have also built bash, cmake, zfs-fuse and systemtap with
> _FORTIFY_SOURCE=3 (with a hacked up glibc to make sure it works) and saw
> no issues in any of those builds.  I did some rudimentary analysis of
> the generated binaries to see if there was any difference in coverage
> and found that there was.  In terms of pure numbers, there were far more
> _chk variants of calls than the regular ones due to _FORTIFY_SOURCE
> (from about 4% to 70% in bash), but that could well be due to the _chk
> variants not being folded into regular variants when safe.  However, on
> manual inspection of some of these sites, it was clear that coverage was
> increasing significantly where __builtin_object_size was previously
> bailing out.
>
> Specifically for bash, the coverage went from 30.81% to 86.87% (it was
> 84.5% with the v1 patch).  I actually hope to reduce this a bit with
> folding improvements for __builtin___memcpy_chk, etc.
>
> A bootstrap build is in progress on x86_64.
>
> Limitations/Future work:
> ------------------------
>
> - The most immediate work is to fold _chk variants of builtins into
>   regular ones when it can be proven at compile time that the object
>   size will alwasy be less than the length of the write.  I am working
>   on it right now.
>
> - I need to enable _FORTIFY_SOURCE=3 for gcc in glibc; currently it is
>   llvm-only.  I've started working on these patches too on the side.
>
> - Instead of bailing out on non-constant sizes with
>   __builtin_object_size, it should be possible to use ranger to
>   get an upper and lower bound on the size expression and use that to
>   implement __builtin_object_size.

When I was implementing improvements into phiopt I ran into case where
objsz would fail now because we get:
tmp = PHI <CST0, CST1>
ptr = ptr + tmp

where before the pointer plus was inside each branch instead. So my
question is there any progress on implementing objsz with ranger or
has that work been put off?
I filed https://gcc.gnu.org/PR116556 for this. Do you have any start
of patches for this so it maybe it could be taken to finish? If not
then I am going to try to implement it since it blocks my work on
phiopt.

Thanks,
Andrew Pinski


>
> - More work could to be done to reduce the performance impact of the
>   computation.  One way could be to add a heuristic where the pass keeps
>   track of nesting in the expression and either bail out or compute an
>   estimate if nesting crosses a threshold.  I'll take this up once we
>   have more data on the nature of the bottlenecks.
>
>
> Siddhesh Poyarekar (10):
>   tree-object-size: Replace magic numbers with enums
>   tree-object-size: Abstract object_sizes array
>   tree-object-size: Use tree instead of HOST_WIDE_INT
>   tree-object-size: Single pass dependency loop resolution
>   __builtin_dynamic_object_size: Recognize builtin
>   tree-object-size: Support dynamic sizes in conditions
>   tree-object-size: Handle function parameters
>   tree-object-size: Handle GIMPLE_CALL
>   tree-object-size: Dynamic sizes for ADDR_EXPR
>   tree-object-size: Handle dynamic offsets
>
>  gcc/builtins.c                                |   22 +-
>  gcc/builtins.def                              |    1 +
>  gcc/doc/extend.texi                           |   13 +
>  gcc/gimple-fold.c                             |    9 +-
>  .../g++.dg/ext/builtin-dynamic-object-size1.C |    5 +
>  .../g++.dg/ext/builtin-dynamic-object-size2.C |    5 +
>  .../gcc.dg/builtin-dynamic-alloc-size.c       |    7 +
>  .../gcc.dg/builtin-dynamic-object-size-0.c    |  463 +++++
>  .../gcc.dg/builtin-dynamic-object-size-1.c    |    7 +
>  .../gcc.dg/builtin-dynamic-object-size-10.c   |    9 +
>  .../gcc.dg/builtin-dynamic-object-size-11.c   |    7 +
>  .../gcc.dg/builtin-dynamic-object-size-12.c   |    5 +
>  .../gcc.dg/builtin-dynamic-object-size-13.c   |    5 +
>  .../gcc.dg/builtin-dynamic-object-size-14.c   |    5 +
>  .../gcc.dg/builtin-dynamic-object-size-15.c   |    5 +
>  .../gcc.dg/builtin-dynamic-object-size-16.c   |    7 +
>  .../gcc.dg/builtin-dynamic-object-size-17.c   |    8 +
>  .../gcc.dg/builtin-dynamic-object-size-18.c   |    8 +
>  .../gcc.dg/builtin-dynamic-object-size-19.c   |  104 +
>  .../gcc.dg/builtin-dynamic-object-size-2.c    |    7 +
>  .../gcc.dg/builtin-dynamic-object-size-3.c    |    7 +
>  .../gcc.dg/builtin-dynamic-object-size-4.c    |    7 +
>  .../gcc.dg/builtin-dynamic-object-size-5.c    |    8 +
>  .../gcc.dg/builtin-dynamic-object-size-6.c    |    5 +
>  .../gcc.dg/builtin-dynamic-object-size-7.c    |    5 +
>  .../gcc.dg/builtin-dynamic-object-size-8.c    |    5 +
>  .../gcc.dg/builtin-dynamic-object-size-9.c    |    5 +
>  gcc/testsuite/gcc.dg/builtin-object-size-1.c  |  160 +-
>  gcc/testsuite/gcc.dg/builtin-object-size-16.c |    2 +
>  gcc/testsuite/gcc.dg/builtin-object-size-17.c |    2 +
>  gcc/testsuite/gcc.dg/builtin-object-size-2.c  |  134 ++
>  gcc/testsuite/gcc.dg/builtin-object-size-3.c  |  151 ++
>  gcc/testsuite/gcc.dg/builtin-object-size-4.c  |   99 +
>  gcc/testsuite/gcc.dg/builtin-object-size-5.c  |   12 +
>  gcc/tree-object-size.c                        | 1766 +++++++++++------
>  gcc/tree-object-size.h                        |    3 +-
>  gcc/ubsan.c                                   |   46 +-
>  37 files changed, 2499 insertions(+), 620 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size1.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size2.C
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-alloc-size.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-11.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-12.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-13.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-14.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-15.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-16.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-17.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-18.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-19.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-6.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-7.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-8.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-9.c
>
> --
> 2.31.1
>
Siddhesh Poyarekar Sept. 8, 2024, 11:43 p.m. UTC | #3
On 2024-09-08 18:01, Andrew Pinski wrote:
> When I was implementing improvements into phiopt I ran into case where
> objsz would fail now because we get:
> tmp = PHI <CST0, CST1>
> ptr = ptr + tmp
> 
> where before the pointer plus was inside each branch instead. So my
> question is there any progress on implementing objsz with ranger or
> has that work been put off?
> I filed https://gcc.gnu.org/PR116556 for this. Do you have any start

Sorry I noticed it but couldn't get time to work on it.  __bos should 
trivially handle this  I think, I'm surprised that it doesn't.

As far as using ranger in __bos is concerned, I believe there was strong 
opposition to the idea and instead to continue evaluating sizes from 
first principles like we currently do in the tree-object-size pass.  I 
think the concern was that ranger can be more aggressive in evaluation 
and assume no undefined behaviour whereas __bos tries to detect 
undefined behaviour whenever it can and is more conservative about 
computations.

> of patches for this so it maybe it could be taken to finish? If not
> then I am going to try to implement it since it blocks my work on
> phiopt.

I'll be happy to review any changes you make to tree-object-size to fix 
this.

Thanks,
Sid