[V3,0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

Message ID 20230825152425.2417656-1-qing.zhao@oracle.com

Message

Qing Zhao Aug. 25, 2023, 3:24 p.m. UTC
This is the 3rd version of the patch. Following our discussion of the
review comments on the 1st and 2nd versions, the major changes in this
version are:

***Against 1st version:
1. Change the name "element_count" to "counted_by";
2. Change the parameter for the attribute from a STRING to an
Identifier;
3. Add logic and test cases to handle anonymous structures/unions;
4. Clarify the documentation to permit the situation where the allocation
size is larger than what's specified by "counted_by"; at the same time,
it is a user error if the allocation size is smaller than what's specified by
"counted_by";
5. Add a complete test case for using the counted_by attribute in
__builtin_dynamic_object_size when there is a mismatch between the
allocation size and the value of "counted_by", with the expected behavior
for each case and an explanation of why in the comments.

***Against 2nd version:
1. Identified a tree node sharing issue and fixed it in the routine
   "component_ref_get_counted_ty" of tree.cc;
2. Updated the documentation and test cases to state clearly the
   formula used to compute the allocation size:
MAX (sizeof (struct A), offsetof (struct A, array[0]) + counted_by * sizeof(element))
   (the algorithm used in tree-object-size.cc is correct).
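
To make the allocation-size formula concrete, here is a minimal sketch in C. The struct layout and the helper name alloc_size_for are hypothetical and chosen only for illustration; they are not taken from the patch. The MAX matters when the structure has trailing padding, because then sizeof (struct A) can exceed the offset of the array plus a small element count.

```c
#include <stddef.h>

/* Hypothetical example type.  On common 64-bit ABIs, array starts at
   offset 9 while sizeof (struct A) is 16 because of trailing padding.  */
struct A {
  size_t count;
  char flag;
  char array[];
};

/* MAX (sizeof (struct A),
        offsetof (struct A, array) + count * sizeof (element)).  */
static size_t
alloc_size_for (size_t count)
{
  size_t fam_end = offsetof (struct A, array)
                   + count * sizeof (((struct A *) 0)->array[0]);
  return fam_end > sizeof (struct A) ? fam_end : sizeof (struct A);
}
```

With a count of 0 the result is sizeof (struct A); with a large count the array term dominates.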

In this set of patches, the major functionality provided is:

1. a new attribute "counted_by";
2. use this new attribute in bound sanitizer;
3. use this new attribute in dynamic object size for subobject size;
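
A rough sketch of how the attribute from item 1 is intended to be written on a flexible array member follows. The struct, the COUNTED_BY macro, and the helpers are hypothetical names for illustration. The macro expands to nothing on compilers that do not know the attribute, so the sketch still builds there, and packet_index_ok only spells out by hand the kind of index check that the bounds sanitizer could derive from the annotation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only where the compiler recognizes it.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

/* Hypothetical type: len holds the number of elements in data[].  */
struct packet {
  size_t len;
  unsigned char data[] COUNTED_BY(len);
};

/* Allocate room for len elements and record the count.  */
static struct packet *
packet_alloc (size_t len)
{
  struct packet *p = malloc (sizeof (*p) + len);
  if (p)
    p->len = len;   /* Set the count before any access to data[].  */
  return p;
}

/* Manual form of the bound the annotation expresses: 1 if idx is valid.  */
static int
packet_index_ok (const struct packet *p, size_t idx)
{
  return idx < p->len;
}
```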

As discussed, I plan to add two more separate patch sets after this initial
patch set is approved and committed.

set 1. A new warning option and a new sanitizer option for the user error
      when the allocation size is smaller than the value of "counted_by".
set 2. An improvement to __builtin_dynamic_object_size for the whole-object
      size of a structure with a FAM annotated with counted_by.

Some existing bugs in tree-object-size.cc were also identified
during this work, and PRs were filed to record them. These bugs will
be fixed separately with individual patches:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111030
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111040

Bootstrapped and regression tested on both aarch64 and x86; no issues.

For a more detailed description of this work, see:

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619708.html

and for further discussion, see:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626376.html

Okay for committing?

thanks.

Qing

Qing Zhao (3):
  Provide counted_by attribute to flexible array member field (PR108896)
  Use the counted_by atribute info in builtin object size [PR108896]
  Use the counted_by attribute information in bound sanitizer[PR108896]

 gcc/c-family/c-attribs.cc                     |  54 ++++-
 gcc/c-family/c-common.cc                      |  13 ++
 gcc/c-family/c-common.h                       |   1 +
 gcc/c-family/c-ubsan.cc                       |  16 ++
 gcc/c/c-decl.cc                               |  79 +++++--
 gcc/doc/extend.texi                           |  77 +++++++
 .../gcc.dg/flex-array-counted-by-2.c          |  74 ++++++
 .../gcc.dg/flex-array-counted-by-3.c          | 210 ++++++++++++++++++
 gcc/testsuite/gcc.dg/flex-array-counted-by.c  |  40 ++++
 .../ubsan/flex-array-counted-by-bounds-2.c    |  27 +++
 .../ubsan/flex-array-counted-by-bounds.c      |  46 ++++
 gcc/tree-object-size.cc                       |  37 ++-
 gcc/tree.cc                                   | 133 +++++++++++
 gcc/tree.h                                    |  15 ++
 14 files changed, 797 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

Comments

Kees Cook Aug. 25, 2023, 7:51 p.m. UTC | #1
On Fri, Aug 25, 2023 at 03:24:22PM +0000, Qing Zhao wrote:
> This is the 3rd version of the patch, per our discussion based on the
> review comments for the 1st and 2nd version, the major changes in this

This tests out great for me; thank you! I'm able to build the entire
kernel tree with 201 annotations[1] added. Things work as expected. :)

-Kees

[1] https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=devel/next-20230825/counted_by
Qing Zhao Sept. 8, 2023, 2:11 p.m. UTC | #2
Ping.  

Thanks.

Qing

Qing Zhao Sept. 20, 2023, 1:43 p.m. UTC | #3
Hi,

I’d like to ping this patch set one more time!

Thanks.

Qing

Siddhesh Poyarekar Oct. 5, 2023, 8:08 p.m. UTC | #4
On 2023-08-25 11:24, Qing Zhao wrote:
> This is the 3rd version of the patch, per our discussion based on the
> review comments for the 1st and 2nd version, the major changes in this
> version are:

Hi Qing,

I hope the review was helpful.  Overall, a couple of things to consider:

1. How would you handle potential reordering between the assignment of the
size to the counted_by field and the __bdos call that may consume it?
You'll probably need to express some kind of dependency there or, in the
worst case, insert a barrier to disallow reordering.

2. How would you handle signedness of the size field?  The size gets 
converted to sizetype everywhere it is used and overflows/underflows may 
produce interesting results.  Do you want to limit the types to unsigned 
or do you want to add a disclaimer in the docs?  The former seems like 
the *right* thing to do given that it is a new feature; best to enforce 
the cleaner habit at the outset.
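
To illustrate the signedness concern in point 2, here is a small sketch of what a plain conversion of a signed count to size_t does in C. The helper name count_as_size is hypothetical; the conversion rule itself is standard C, where a negative value converted to an unsigned type is reduced modulo one more than the type's maximum value.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a signed counter to size_t with no check, the way a bare
   cast would.  Negative inputs wrap to very large values.  */
static size_t
count_as_size (int count)
{
  return (size_t) count;
}
```

For example, count_as_size (-1) yields SIZE_MAX, so a negative counter used unchecked would look like an enormous object size rather than an error.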

Thanks,
Sid

Kees Cook Oct. 5, 2023, 10:35 p.m. UTC | #5
On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:
> 2. How would you handle signedness of the size field?  The size gets
> converted to sizetype everywhere it is used and overflows/underflows may
> produce interesting results.  Do you want to limit the types to unsigned or
> do you want to add a disclaimer in the docs?  The former seems like the
> *right* thing to do given that it is a new feature; best to enforce the
> cleaner habit at the outset.

The Linux kernel has a lot of "int" counters, so the goal is to catch
negative offsets just like too-large offsets at runtime with the sanitizer
and report 0 for __bdos. Refactoring all these to be unsigned is going
to take time since at least some of them use the negative values as
special values unrelated to array indexing. :(

So, perhaps if unsigned counters are worth enforcing, can this be a
separate warning the kernel can turn off initially?
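
As a sketch of the behavior described above for signed counters (the helper name and signature are hypothetical, not part of the patch), a negative count can be treated as an object size of 0 instead of being converted blindly:

```c
#include <stddef.h>

/* Size in bytes implied by a signed counter for elements of elem_size
   bytes each.  A negative counter yields 0 rather than a wrapped value.  */
static size_t
size_from_signed_count (int count, size_t elem_size)
{
  if (count < 0)
    return 0;
  return (size_t) count * elem_size;
}
```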

-Kees
Martin Uecker Oct. 6, 2023, 5:11 a.m. UTC | #6
Am Donnerstag, dem 05.10.2023 um 15:35 -0700 schrieb Kees Cook:
> On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:
> > 2. How would you handle signedness of the size field?  The size gets
> > converted to sizetype everywhere it is used and overflows/underflows may
> > produce interesting results.  Do you want to limit the types to unsigned or
> > do you want to add a disclaimer in the docs?  The former seems like the
> > *right* thing to do given that it is a new feature; best to enforce the
> > cleaner habit at the outset.
> 
> The Linux kernel has a lot of "int" counters, so the goal is to catch
> negative offsets just like too-large offsets at runtime with the sanitizer
> and report 0 for __bdos. Refactoring all these to be unsigned is going
> to take time since at least some of them use the negative values as
> special values unrelated to array indexing. :(
> 
> So, perhaps if unsigned counters are worth enforcing, can this be a
> separate warning the kernel can turn off initially?
> 

I think unsigned counters are much more problematic than signed ones
because wraparound errors are more difficult to find.

With unsigned you could potentially diagnose wraparound, but only if we
add -fsanitize=unsigned-overflow *and* add a mechanism to mark intentional
wraparound *and* everybody adds this annotation after carefully screening
their code *and* rewriting all operations such as (counter - 3) + 5
where the wraparound in the intermediate expression is harmless.
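
A short sketch of the (counter - 3) + 5 case, with a hypothetical function name: for an unsigned counter smaller than 3 the subtraction wraps around, the addition wraps back, and the final result is still the mathematically expected counter + 2. A checker that flagged every unsigned wraparound would report the harmless intermediate step.

```c
#include <limits.h>

/* Unsigned arithmetic is modular, so the intermediate wraparound for
   counter < 3 cancels out in the final result.  */
static unsigned int
adjust (unsigned int counter)
{
  return (counter - 3u) + 5u;
}

/* The intermediate value alone, to show where the wraparound happens.  */
static unsigned int
adjust_intermediate (unsigned int counter)
{
  return counter - 3u;
}
```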

For this reason, I do not think we should ever enforce some rule that
the counter has to be unsigned.

What we could do, is detect *storing* negative values into the
counter at run-time using UBSan. (but if negative values are
used for special cases, one also should be able to turn this
off).

Martin
Siddhesh Poyarekar Oct. 6, 2023, 10:50 a.m. UTC | #7
On 2023-10-06 01:11, Martin Uecker wrote:
> Am Donnerstag, dem 05.10.2023 um 15:35 -0700 schrieb Kees Cook:
>> On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:
>>> 2. How would you handle signedness of the size field?  The size gets
>>> converted to sizetype everywhere it is used and overflows/underflows may
>>> produce interesting results.  Do you want to limit the types to unsigned or
>>> do you want to add a disclaimer in the docs?  The former seems like the
>>> *right* thing to do given that it is a new feature; best to enforce the
>>> cleaner habit at the outset.
>>
>> The Linux kernel has a lot of "int" counters, so the goal is to catch
>> negative offsets just like too-large offsets at runtime with the sanitizer
>> and report 0 for __bdos. Refactoring all these to be unsigned is going
>> to take time since at least some of them use the negative values as
>> special values unrelated to array indexing. :(
>>
>> So, perhaps if unsigned counters are worth enforcing, can this be a
>> separate warning the kernel can turn off initially?
>>
> 
> I think unsigned counters are much more problematic than signed ones
> because wraparound errors are more difficult to find.
> 
> With unsigned you could potentially diagnose wraparound, but only if we
> add -fsanitize=unsigned-overflow *and* add mechanism to mark intentional
> wraparound *and* everybody adds this annotation after carefully screening
> their code *and* rewriting all operations such as (counter - 3) + 5
> where the wraparound in the intermediate expression is harmless.
> 
> For this reason, I do not think we should ever enforce some rule that
> the counter has to be unsigned.
> 
> What we could do, is detect *storing* negative values into the
> counter at run-time using UBSan. (but if negative values are
> used for special cases, one also should be able to turn this
> off).

All of the object size detection relies on object sizes being sizetype.
The closest we could do with that is detect (sz != SIZE_MAX && sz >
SIZE_MAX / 2), since allocators typically cannot allocate more than
SIZE_MAX / 2.

Sid
Martin Uecker Oct. 6, 2023, 8:01 p.m. UTC | #8
Am Freitag, dem 06.10.2023 um 06:50 -0400 schrieb Siddhesh Poyarekar:
> On 2023-10-06 01:11, Martin Uecker wrote:
> > Am Donnerstag, dem 05.10.2023 um 15:35 -0700 schrieb Kees Cook:
> > > On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:
> > > > 2. How would you handle signedness of the size field?  The size gets
> > > > converted to sizetype everywhere it is used and overflows/underflows may
> > > > produce interesting results.  Do you want to limit the types to unsigned or
> > > > do you want to add a disclaimer in the docs?  The former seems like the
> > > > *right* thing to do given that it is a new feature; best to enforce the
> > > > cleaner habit at the outset.
> > > 
> > > The Linux kernel has a lot of "int" counters, so the goal is to catch
> > > negative offsets just like too-large offsets at runtime with the sanitizer
> > > and report 0 for __bdos. Refactoring all these to be unsigned is going
> > > to take time since at least some of them use the negative values as
> > > special values unrelated to array indexing. :(
> > > 
> > > So, perhaps if unsigned counters are worth enforcing, can this be a
> > > separate warning the kernel can turn off initially?
> > > 
> > 
> > I think unsigned counters are much more problematic than signed ones
> > because wraparound errors are more difficult to find.
> > 
> > With unsigned you could potentially diagnose wraparound, but only if we
> > add -fsanitize=unsigned-overflow *and* add mechanism to mark intentional
> > wraparound *and* everybody adds this annotation after carefully screening
> > their code *and* rewriting all operations such as (counter - 3) + 5
> > where the wraparound in the intermediate expression is harmless.
> > 
> > For this reason, I do not think we should ever enforce some rule that
> > the counter has to be unsigned.
> > 
> > What we could do, is detect *storing* negative values into the
> > counter at run-time using UBSan. (but if negative values are
> > used for special cases, one also should be able to turn this
> > off).
> 
> All of the object size detection relies on object sizes being sizetype. 
> The closest we could do with that is detect (sz != SIZE_MAX && sz > 
> size_t / 2), since allocators typically cannot allocate more than 
> SIZE_MAX / 2.

I was talking about the counter in:

struct {
  int counter;
  char buf[] __counted_by__((counter));
};

which could be checked to be positive either when stored to or 
when buf is used.
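
A hand-written sketch of the store-time variant of that check follows. The type and setter are hypothetical, and the sketch rejects only negative values; in the proposal the check would be instrumentation emitted by the compiler rather than code the user writes.

```c
#include <stddef.h>

/* Hypothetical type mirroring the struct above, without the attribute.  */
struct counted_buf {
  int counter;
  char buf[];
};

/* Store-time check: refuse a negative count and leave the old value in
   place.  Returns 0 on success, -1 if the value was rejected.  */
static int
set_counter (struct counted_buf *b, int n)
{
  if (n < 0)
    return -1;
  b->counter = n;
  return 0;
}
```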

And yes, we could also check the size of buf.  Not sure what is
done for VLAs now, but I guess it could be similar.

Best,
Martin



Siddhesh Poyarekar Oct. 18, 2023, 3:37 p.m. UTC | #9
[Sorry, I forgot to respond to this]

On 2023-10-06 16:01, Martin Uecker wrote:
> Am Freitag, dem 06.10.2023 um 06:50 -0400 schrieb Siddhesh Poyarekar:
>> On 2023-10-06 01:11, Martin Uecker wrote:
>>> Am Donnerstag, dem 05.10.2023 um 15:35 -0700 schrieb Kees Cook:
>>>> On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:
>>>>> 2. How would you handle signedness of the size field?  The size gets
>>>>> converted to sizetype everywhere it is used and overflows/underflows may
>>>>> produce interesting results.  Do you want to limit the types to unsigned or
>>>>> do you want to add a disclaimer in the docs?  The former seems like the
>>>>> *right* thing to do given that it is a new feature; best to enforce the
>>>>> cleaner habit at the outset.
>>>>
>>>> The Linux kernel has a lot of "int" counters, so the goal is to catch
>>>> negative offsets just like too-large offsets at runtime with the sanitizer
>>>> and report 0 for __bdos. Refactoring all these to be unsigned is going
>>>> to take time since at least some of them use the negative values as
>>>> special values unrelated to array indexing. :(
>>>>
>>>> So, perhaps if unsigned counters are worth enforcing, can this be a
>>>> separate warning the kernel can turn off initially?
>>>>
>>>
>>> I think unsigned counters are much more problematic than signed ones
>>> because wraparound errors are more difficult to find.
>>>
>>> With unsigned you could potentially diagnose wraparound, but only if we
>>> add -fsanitize=unsigned-overflow *and* add mechanism to mark intentional
>>> wraparound *and* everybody adds this annotation after carefully screening
>>> their code *and* rewriting all operations such as (counter - 3) + 5
>>> where the wraparound in the intermediate expression is harmless.
>>>
>>> For this reason, I do not think we should ever enforce some rule that
>>> the counter has to be unsigned.
>>>
>>> What we could do, is detect *storing* negative values into the
>>> counter at run-time using UBSan. (but if negative values are
>>> used for special cases, one also should be able to turn this
>>> off).
>>
>> All of the object size detection relies on object sizes being sizetype.
>> The closest we could do with that is detect (sz != SIZE_MAX && sz >
>> size_t / 2), since allocators typically cannot allocate more than
>> SIZE_MAX / 2.
> 
> I was talking about the counter in:
> 
> struct {
>    int counter;
>    char buf[] __counted_by__((counter))
> };
> 
> which could be checked to be positive either when stored to or
> when buf is used.
> 
> And yes, we could also check the size of buf.  Not sure what is
> done for VLAs now, but I guess it could be similar.

Right now all object sizes are cast to sizetype and the generated 
dynamic expressions are such that overflows will result in the computed 
object size being zero.  Non-generated expressions (like we could get 
with __counted_by__) will simply be cast; there's probably scope for 
improvement here, where we wrap that with an expression that returns 0 
if the size exceeds SIZE_MAX / 2 since that's typically the limit for 
allocators.  We use that heuristic elsewhere in the __bos/__bdos logic too.
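
The heuristic can be sketched as a plain C function. The name clamp_object_size is hypothetical, and this is not the code in tree-object-size.cc; it only shows the shape of the wrapping expression. SIZE_MAX is passed through unchanged on the assumption that it stands for "size unknown", as it does for the maximum-size modes of __builtin_object_size.

```c
#include <stddef.h>
#include <stdint.h>

/* Treat any known size above SIZE_MAX / 2 as bogus, for example a
   negative counter that was cast to size_t, and report 0 for it.  */
static size_t
clamp_object_size (size_t sz)
{
  if (sz != SIZE_MAX && sz > SIZE_MAX / 2)
    return 0;
  return sz;
}
```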

Thanks,
Sid
Qing Zhao Oct. 18, 2023, 7:35 p.m. UTC | #10
> On Oct 6, 2023, at 4:01 PM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> Am Freitag, dem 06.10.2023 um 06:50 -0400 schrieb Siddhesh Poyarekar:
>> On 2023-10-06 01:11, Martin Uecker wrote:
>>> Am Donnerstag, dem 05.10.2023 um 15:35 -0700 schrieb Kees Cook:
>>>> On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:
>>>>> 2. How would you handle signedness of the size field?  The size gets
>>>>> converted to sizetype everywhere it is used and overflows/underflows may
>>>>> produce interesting results.  Do you want to limit the types to unsigned or
>>>>> do you want to add a disclaimer in the docs?  The former seems like the
>>>>> *right* thing to do given that it is a new feature; best to enforce the
>>>>> cleaner habit at the outset.
>>>> 
>>>> The Linux kernel has a lot of "int" counters, so the goal is to catch
>>>> negative offsets just like too-large offsets at runtime with the sanitizer
>>>> and report 0 for __bdos. Refactoring all these to be unsigned is going
>>>> to take time since at least some of them use the negative values as
>>>> special values unrelated to array indexing. :(
>>>> 
>>>> So, perhaps if unsigned counters are worth enforcing, can this be a
>>>> separate warning the kernel can turn off initially?
>>>> 
>>> 
>>> I think unsigned counters are much more problematic than signed ones
>>> because wraparound errors are more difficult to find.
>>> 
>>> With unsigned you could potentially diagnose wraparound, but only if we
>>> add -fsanitize=unsigned-overflow *and* add mechanism to mark intentional
>>> wraparound *and* everybody adds this annotation after carefully screening
>>> their code *and* rewriting all operations such as (counter - 3) + 5
>>> where the wraparound in the intermediate expression is harmless.
>>> 
>>> For this reason, I do not think we should ever enforce some rule that
>>> the counter has to be unsigned.
>>> 
>>> What we could do, is detect *storing* negative values into the
>>> counter at run-time using UBSan. (but if negative values are
>>> used for special cases, one also should be able to turn this
>>> off).
>> 
>> All of the object size detection relies on object sizes being sizetype. 
>> The closest we could do with that is detect (sz != SIZE_MAX && sz > 
>> size_t / 2), since allocators typically cannot allocate more than 
>> SIZE_MAX / 2.
> 
> I was talking about the counter in:
> 
> struct {
>  int counter;
>  char buf[] __counted_by__((counter))
> };
> 
> which could be checked to be positive either when stored to or 
> when buf is used.
> 
> And yes, we could also check the size of buf.  Not sure what is
> done for VLAs now, but I guess it could be similar.
> 
For VLAs, the bounds expression can be either signed or unsigned.
But we have added a sanitizer option, -fsanitize=vla-bound, to catch the cases where the size of the VLA is not positive.

For example:

[opc@qinzhao-ol8u3-x86 Martin]$ cat t3.c
#include <stdio.h>
size_t foo(int m)
{
  char t[m];

  return sizeof(t);
}

int main()
{
  printf ("the sizeof flexm is %lu \n", foo(-100000000));
  return 0;
}
[opc@qinzhao-ol8u3-x86 Martin]$ sh t
/home/opc/Install/latest-d/bin/gcc -fsanitize=undefined -O2 -Wall -Wpedantic t3.c
t3.c:4:8: runtime error: variable length array bound evaluates to non-positive value -100000000
the sizeof flexm is 18446744073609551616 
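
The printed size is what results from converting -100000000 to a 64-bit unsigned size type, i.e. 2^64 - 100000000. A minimal sketch of that arithmetic, assuming a 64-bit size type; the helper name wrapped_size is hypothetical and only for illustration:

```c
#include <stdint.h>

/* Hypothetical helper: convert a signed bound to a 64-bit unsigned size,
   the way a negative VLA bound ends up as a huge sizeof value.  */
static uint64_t
wrapped_size (int64_t bound)
{
  /* Conversion to an unsigned type is defined modulo 2^64.  */
  return (uint64_t) bound;
}
```

wrapped_size (-100000000) gives 18446744073609551616, which matches the value printed above.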


We can do the same thing for “counted_by”, i.e.:

1. No requirement on whether the counted_by field is signed or unsigned.
2. Add a sanitizer option, -fsanitize=counted-by-bound, to catch the cases where the value of the counted_by field is not positive.

Is this good enough?

Qing
> Best,
> Martin
> 
> 
> 
>> 
>> Sid
Qing Zhao Oct. 18, 2023, 9:11 p.m. UTC | #11
> On Oct 5, 2023, at 4:08 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> 
> On 2023-08-25 11:24, Qing Zhao wrote:
>> This is the 3rd version of the patch, per our discussion based on the
>> review comments for the 1st and 2nd version, the major changes in this
>> version are:
> 
> Hi Qing,
> 
> I hope the review was helpful.  Overall, a couple of things to consider:
> 
> 1. How would you handle potential reordering between assignment of the size to the counted_by field with the __bdos call that may consume it? You'll probably need to express some kind of dependency there or in the worst case, insert a barrier to disallow reordering.

Good point! 

So, your example in the response to [V3][PATCH 2/3]Use the counted_by atribute info in builtin object size [PR108896]:
“
Maybe another test where the allocation, size assignment and __bdos call happen in the same function, where the allocator is not recognized by gcc:

void *
__attribute__ ((noinline))
alloc (size_t sz)
{
 return __builtin_malloc (sz);
}

size_t test (size_t sz)
{
 array_annotated = alloc (sz);
 array_annotated->b = sz;
 return __builtin_dynamic_object_size (array_annotated->c, 1);
}

The interesting thing to test (and ensure in the codegen) is that the assignment to array_annotated->b does not get reordered to below the __builtin_dynamic_object_size call since technically there is no data dependency between the two.
“
Will test on this. 

I am not sure whether the current GCC alias analysis is able to distinguish one field of a structure from another field of the same structure. If it can, then
we need to add an explicit dependency edge from the write to “array_annotated->b” to the call to “__builtin_dynamic_object_size(array_annotated->c,1)”.
I will check on this and see how to resolve this issue.

I guess one possible solution is to add an implicit ref to “array_annotated->b” at the call to “__builtin_dynamic_object_size(array_annotated->c, 1)” if the counted_by attribute is available. That should resolve the issue.

Richard, what do you think about this?

> 
> 2. How would you handle signedness of the size field?  The size gets converted to sizetype everywhere it is used and overflows/underflows may produce interesting results.  Do you want to limit the types to unsigned or do you want to add a disclaimer in the docs?  The former seems like the *right* thing to do given that it is a new feature; best to enforce the cleaner habit at the outset.

As I replied to Martin in another email, I plan to do the following to resolve this issue:

1. No specification for signed or unsigned for counted_by field.
2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when the size of the counted-by is not positive.

Then, we will be consistent with the handling of VLA. 

So, I will not change anything for the current patch.
However, I will add the sanitizer option in a followup patch set.

Let me know your opinion.

thanks.

Qing

> 
> Thanks,
> Sid
> 
>> ***Against 1st version:
>> 1. change the name "element_count" to "counted_by";
>> 2. change the parameter for the attribute from a STRING to an
>> Identifier;
>> 3. Add logic and testing cases to handle anonymous structure/unions;
>> 4. Clarify documentation to permit the situation when the allocation
>> size is larger than what's specified by "counted_by", at the same time,
>> it's user's error if allocation size is smaller than what's specified by
>> "counted_by";
>> 5. Add a complete testing case for using counted_by attribute in
>> __builtin_dynamic_object_size when there is mismatch between the
>> allocation size and the value of "counted_by", the expecting behavior
>> for each case and the explanation on why in the comments.
>> ***Against 2rd version:
>> 1. Identify a tree node sharing issue and fixed it in the routine
>>    "component_ref_get_counted_ty" of tree.cc;
>> 2. Update the documentation and testing cases with the clear usage
>>    of the fomula to compute the allocation size:
>> MAX (sizeof (struct A), offsetof (struct A, array[0]) + counted_by * sizeof(element))
>>    (the algorithm used in tree-object-size.cc is correct).
>> In this set of patches, the major functionality provided is:
>> 1. a new attribute "counted_by";
>> 2. use this new attribute in bound sanitizer;
>> 3. use this new attribute in dynamic object size for subobject size;
>> As discussed, I plan to add two more separate patches sets after this initial
>> patch set is approved and committed.
>> set 1. A new warning option and a new sanitizer option for the user error
>>       when the allocation size is smaller than the value of "counted_by".
>> set 2. An improvement to __builtin_dynamic_object_size  for whole-object
>>       size of the structure with FAM annaoted with counted_by.
>> there are also some existing bugs in tree-object-size.cc identified
>> during the study, and PRs were filed to record them. these bugs will
>> be fixed seperately with individual patches:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111030
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111040
>> Bootstrapped and regression tested on both aarch64 and X86, no issue.
>> Please see more details on the description of this work on:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619708.html
>> and more discussions on
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626376.html
>> Okay for committing?
>> thanks.
>> Qing
>> Qing Zhao (3):
>>   Provide counted_by attribute to flexible array member field (PR108896)
>>   Use the counted_by atribute info in builtin object size [PR108896]
>>   Use the counted_by attribute information in bound sanitizer[PR108896]
>>  gcc/c-family/c-attribs.cc                     |  54 ++++-
>>  gcc/c-family/c-common.cc                      |  13 ++
>>  gcc/c-family/c-common.h                       |   1 +
>>  gcc/c-family/c-ubsan.cc                       |  16 ++
>>  gcc/c/c-decl.cc                               |  79 +++++--
>>  gcc/doc/extend.texi                           |  77 +++++++
>>  .../gcc.dg/flex-array-counted-by-2.c          |  74 ++++++
>>  .../gcc.dg/flex-array-counted-by-3.c          | 210 ++++++++++++++++++
>>  gcc/testsuite/gcc.dg/flex-array-counted-by.c  |  40 ++++
>>  .../ubsan/flex-array-counted-by-bounds-2.c    |  27 +++
>>  .../ubsan/flex-array-counted-by-bounds.c      |  46 ++++
>>  gcc/tree-object-size.cc                       |  37 ++-
>>  gcc/tree.cc                                   | 133 +++++++++++
>>  gcc/tree.h                                    |  15 ++
>>  14 files changed, 797 insertions(+), 25 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
>>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c
>>  create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c
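
The allocation-size formula in the cover letter quoted above, MAX (sizeof (struct A), offsetof (struct A, array[0]) + counted_by * sizeof(element)), can be sketched as follows. This is only an illustration: struct padded and padded_alloc_size are hypothetical names, and the counted_by attribute itself is left out so the code compiles with compilers that do not have the patch. The struct is chosen to have trailing padding, which is the case where the MAX matters.

```c
#include <stddef.h>

/* Hypothetical struct with trailing padding after "tag"; in the patch the
   flexible array member would carry counted_by (count).  */
struct padded {
  size_t count;
  char tag;
  char array[];
};

/* Hypothetical helper implementing
   MAX (sizeof (struct A), offsetof (struct A, array) + count * sizeof (element)).  */
static size_t
padded_alloc_size (size_t count)
{
  size_t with_fam = offsetof (struct padded, array)
                    + count * sizeof (((struct padded *) 0)->array[0]);
  size_t base = sizeof (struct padded);
  return with_fam > base ? with_fam : base;
}
```

For small counts the flexible array fits into the padding and the result is sizeof (struct padded); for larger counts the second operand of MAX wins.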
Kees Cook Oct. 19, 2023, 11:33 p.m. UTC | #12
On Wed, Oct 18, 2023 at 09:11:43PM +0000, Qing Zhao wrote:
> As I replied to Martin in another email, I plan to do the following to resolve this issue:
> 
> 1. No specification for signed or unsigned for counted_by field.
> 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when the size of the counted-by is not positive.

I don't understand why this needs to be a runtime sanitizer. The
signedness is known at compile time, so I would expect a -W option. Or
do you mean you'd split up -fsanitize=bounds between unsigned and signed
indexes? I'd find that kind of awkward for the kernel... but I feel like
I've misunderstood something. :)

-Kees
Martin Uecker Oct. 20, 2023, 9:50 a.m. UTC | #13
Am Donnerstag, dem 19.10.2023 um 16:33 -0700 schrieb Kees Cook:
> On Wed, Oct 18, 2023 at 09:11:43PM +0000, Qing Zhao wrote:
> > As I replied to Martin in another email, I plan to do the following to resolve this issue:
> > 
> > 1. No specification for signed or unsigned for counted_by field.
> > 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when the size of the counted-by is not positive.
> 
> I don't understand why this needs to be a runtime sanitizer. The
> signedness is known at compile time, so I would expect a -W option.

The signedness of the type is known, but not the sign of the value.

But I would not want to have a warning for signed
counter types by default because I would prefer
to use signed types (for various reasons including
better overflow detection).

>  Or
> do you mean you'd split up -fsanitize=bounds between unsigned and signed
> indexes? I'd find that kind of awkward for the kernel... but I feel like
> I've misunderstood something. :)
> 
> -Kees

The idea would be to detect at run-time the case
where x->buf is used at a time when x->counter
is negative, and also when x->counter * sizeof(x->buf[0])
overflows or is too big.

This would be similar to

int a[n];

where it is detected at run-time if n is not positive.
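
Written out by hand, such a check could look roughly like the sketch below. It assumes GCC's __builtin_mul_overflow and uses SIZE_MAX / 2 as the "too big" limit mentioned earlier in the thread; struct counted and counted_buf_size are hypothetical names, and this is not what the sanitizer would actually emit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct counted {
  int counter;
  int buf[];   /* would carry counted_by (counter) with the patch */
};

/* Hypothetical check: returns true and stores the byte size of buf
   when the counter is usable, false otherwise.  */
static bool
counted_buf_size (const struct counted *x, size_t *bytes)
{
  if (x->counter < 0)
    return false;                       /* negative counter */
  if (__builtin_mul_overflow ((size_t) x->counter, sizeof (x->buf[0]), bytes))
    return false;                       /* counter * element size overflows */
  return *bytes <= SIZE_MAX / 2;        /* larger than any real allocation */
}
```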

Martin
Qing Zhao Oct. 20, 2023, 5:08 p.m. UTC | #14
Sid,

(Richard, can you please help me confirm this? Thanks a lot.)

I looked a little more into the following question you raised during the review process.

Consider the following small test case:

  1 struct annotated {
  2   int foo;
  3   char array[] __attribute__((counted_by (foo)));
  4 };
  5 
  6 extern struct annotated * alloc_buf (int);
  7 
  8 int test (int sz)
  9 {
 10   struct annotated * array_annotated = alloc_buf (sz);
 11   array_annotated->foo = sz;
 12   return __builtin_dynamic_object_size (array_annotated->array, 1);
 13 }

Might GCC reorder the assignment of the size to the counted_by field at line 11 and the consumer of that size, the call to __bdos at line 12?

The following are my thoughts:

1. The __bdos computation passes (both pass_early_object_sizes and pass_object_sizes) run in the early stage of the SSA optimizations. Of these, pass_early_object_sizes runs before almost all the optimizations, so no reordering is possible in this pass;

2. Then how about the pass “pass_object_sizes”?

   Immediately after pass_build_ssa, the IR for the routine “test” in SSA form (compiled with -O3) is:

  1 int test (int sz)
  2 {
  3   struct annotated * array_annotated;
  4   char[0:] * _1;
  5   long unsigned int _2;
  6   int _8;
  7   
  8   <bb 2> :
  9   array_annotated_6 = alloc_buf (sz_4(D));
 10   array_annotated_6->foo = sz_4(D);
 11   _1 = &array_annotated_6->array;
 12   _2 = __builtin_dynamic_object_size (_1, 1);
 13   _8 = (int) _2;
 14   return _8; 
 15 } 

In the above IR, the key portion is lines 10 and 11 (could these two lines be reordered by the SSA optimizations?):

 10   array_annotated_6->foo = sz_4(D);
 11   _1 = &array_annotated_6->array;

The major question here is whether the SSA optimizations are able to determine that the object “array_annotated_6->foo” at line 10 is independent of
the object “array_annotated_6->array” at line 11.

If the SSA optimizations can distinguish “array_annotated_6->foo” from “array_annotated_6->array”, then these two lines might be reordered.
Otherwise, these two lines will not be reordered by SSA optimizations.

I am not very familiar with the details of the SSA optimizations, but my guess is that two fields of the same structure might not be distinguished by the SSA optimizations, in which case line 10 and line 11 will not be reordered.

Richard, is my guess correct?

Thanks a lot for your help.

Qing

>> On Oct 5, 2023, at 4:08 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>> 
>> I hope the review was helpful.  Overall, a couple of things to consider:
>> 
>> 1. How would you handle potential reordering between assignment of the size to the counted_by field with the __bdos call that may consume it? You'll probably need to express some kind of dependency there or in the worst case, insert a barrier to disallow reordering.
> 
> Good point! 
> 
> So, your example in the respond to [V3][PATCH 2/3]Use the counted_by atribute info in builtin object size [PR108896]:
> “
> Maybe another test where the allocation, size assignment and __bdos call happen in the same function, where the allocator is not recognized by gcc:
> 
> void *
> __attribute__ ((noinline))
> alloc (size_t sz)
> {
> return __builtin_malloc (sz);
> }
> 
> void test (size_t sz)
> {
> array_annotated = alloc (sz);
> array_annotated->b = sz;
> return __builtin_dynamic_object_size (array_annotated->c, 1);
> }
> 
> The interesting thing to test (and ensure in the codegen) is that the assignment to array_annotated->b does not get reordered to below the __builtin_dynamic_object_size call since technically there is no data dependency between the two.
> “
> Will test on this. 
> 
> Not sure whether the current GCC alias analysis is able to distinguish one field of a structure from another field of the same structure, if YES, then
> We need to add an explicit dependency edge from the write to “array_annotated->b” to the call to “__builtin_dynamic_object_size(array_annotated->c,1)”.
> I will check on this and see how to resolve this issue.
> 
> I guess the possible solution is that we can add an implicit ref to “array_annotated->b” at the call to “__builtin_dynamic_object_size(array_annotated->c, 1)” if the counted_by attribute is available. That should resolve the issue.
> 
> Richard, what do you think on this?
>
Richard Biener Oct. 20, 2023, 6:22 p.m. UTC | #15
> Am 20.10.2023 um 19:09 schrieb Qing Zhao <qing.zhao@oracle.com>:
> 
> Sid,
> 
> (Richard, can you please help me to make sure this? Thanks a lot)
> 
> I studied a little bit more on the following question you raised during the review process:
> 
> For the following small testing case: 
> 
>  1 struct annotated {
>  2   int foo;
>  3   char array[] __attribute__((counted_by (foo)));
>  4 };
>  5 
>  6 extern struct annotated * alloc_buf (int);
>  7 
>  8 int test (int sz)
>  9 {
> 10   struct annotated * array_annotated = alloc_buf (sz);
> 11   array_annotated->foo = sz;
> 12   return __builtin_dynamic_object_size (array_annotated->array, 1);
> 13 }
> 
> Whether the assignment of the size to the counted_by field at line 11 and the consumer of the size at line 12 at call to __bdos might be reordered by GCC? 
> 
> The following is my thought:
> 
> 1. _bdos computation passes (both pass_early_object_sizes and pass_object_sizes) are in the early stage of SSA optimizations. In which, pass_early_object_sizes happens before almost all the optimizations, no reordering is possible in this pass;
> 
> 2. Then how about the pass “pass_object_sizes”?
> 
>   Immediately after the pass_build_ssa,  the IR for the routine “test” is  with the SSA form: (compiled with -O3):
> 
>  1 int test (int sz)
>  2 {
>  3   struct annotated * array_annotated;
>  4   char[0:] * _1;
>  5   long unsigned int _2;
>  6   int _8;
>  7   
>  8   <bb 2> :
>  9   array_annotated_6 = alloc_buf (sz_4(D));
> 10   array_annotated_6->foo = sz_4(D);
> 11   _1 = &array_annotated_6->array;
> 12   _2 = __builtin_dynamic_object_size (_1, 1);
> 13   _8 = (int) _2;
> 14   return _8; 
> 15 } 
> 
> In the above IR, the key portion is line 10 and line 11: (whether these two lines might be reordered with SSA optimization?)
> 
> 10   array_annotated_6->foo = sz_4(D);
> 11   _1 = &array_annotated_6->array;
> 
> The major question here is: whether the SSA optimizations are able to distinguish the object “array_annotated_6->foo” at line 10 is independent with
> the object “array_annotated-_6->array” at line 11?
> 
> If the SSA optimizations can distinguish “array_annotated_6->foo” from “array_annotated_6->array”, then these two lines might be reordered.
> Otherwise, these two lines will not be reordered by SSA optimizations.
> 
> I am not very familiar with the details of the SSA optimizations, but my guess is, two fields of the same structure might not be distinguished by the SSA optimizations, then line 10 and line 11 will not be reordered by SSA optimizations.
> 
> Richard, is my guess correct?

There is no data dependence between the memory access and the address computation so nothing prevents the reordering.  If you put another identical bos call before the access I expect the addresses to be CSEd, effectively moving the latter before the access.

Richard 

> Thanks a lot for your help.
> 
> Qing
> 
>>>> On Oct 5, 2023, at 4:08 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>> 
>>> I hope the review was helpful.  Overall, a couple of things to consider:
>>> 
>>> 1. How would you handle potential reordering between assignment of the size to the counted_by field with the __bdos call that may consume it? You'll probably need to express some kind of dependency there or in the worst case, insert a barrier to disallow reordering.
>> 
>> Good point! 
>> 
>> So, your example in the respond to [V3][PATCH 2/3]Use the counted_by atribute info in builtin object size [PR108896]:
>> “
>> Maybe another test where the allocation, size assignment and __bdos call happen in the same function, where the allocator is not recognized by gcc:
>> 
>> void *
>> __attribute__ ((noinline))
>> alloc (size_t sz)
>> {
>> return __builtin_malloc (sz);
>> }
>> 
>> void test (size_t sz)
>> {
>> array_annotated = alloc (sz);
>> array_annotated->b = sz;
>> return __builtin_dynamic_object_size (array_annotated->c, 1);
>> }
>> 
>> The interesting thing to test (and ensure in the codegen) is that the assignment to array_annotated->b does not get reordered to below the __builtin_dynamic_object_size call since technically there is no data dependency between the two.
>> “
>> Will test on this. 
>> 
>> Not sure whether the current GCC alias analysis is able to distinguish one field of a structure from another field of the same structure, if YES, then
>> We need to add an explicit dependency edge from the write to “array_annotated->b” to the call to “__builtin_dynamic_object_size(array_annotated->c,1)”.
>> I will check on this and see how to resolve this issue.
>> 
>> I guess the possible solution is that we can add an implicit ref to “array_annotated->b” at the call to “__builtin_dynamic_object_size(array_annotated->c, 1)” if the counted_by attribute is available. That should resolve the issue.
>> 
>> Richard, what do you think on this?
>> 
>
Kees Cook Oct. 20, 2023, 6:34 p.m. UTC | #16
On Fri, Oct 20, 2023 at 11:50:11AM +0200, Martin Uecker wrote:
> Am Donnerstag, dem 19.10.2023 um 16:33 -0700 schrieb Kees Cook:
> > On Wed, Oct 18, 2023 at 09:11:43PM +0000, Qing Zhao wrote:
> > > As I replied to Martin in another email, I plan to do the following to resolve this issue:
> > > 
> > > 1. No specification for signed or unsigned for counted_by field.
> > > 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when the size of the counted-by is not positive.
> > 
> > I don't understand why this needs to be a runtime sanitizer. The
> > signedness is known at compile time, so I would expect a -W option.
> 
> The signedness of the type but not of the value.
> 
> But I would not want to have a warning for signed 
> counter  types by default because I would prefer
> to use signed types (for various reasons including
> better overflow detection).
> 
> >  Or
> > do you mean you'd split up -fsanitize=bounds between unsigned and signed
> > indexes? I'd find that kind of awkward for the kernel... but I feel like
> > I've misunderstood something. :)
> > 
> > -Kees
> 
> The idea would be to detect at run-time the case
> if  x->buf  is used at a time where   x->counter 
> is negative and also when x->counter * sizeof(x->buf[0])
> overflows or is too big.
> 
> This would be similar to
> 
> int a[n];
> 
> where it is detected at run-time if n is not-positive.

Right. I guess what I mean to say is that I would expect this case to
already be caught by -fsanitize=bounds -- I don't see a reason to add an
additional sanitizer option.

struct foo {
	int count;
	int array[] __counted_by(count);
};

	foo->count = 5;
	foo->array[0] = 1;	// ok
	foo->array[10] = 1;	// -fsanitize=bounds will catch this
	foo->array[-10] = 1;	// -fsanitize=bounds will catch this too
Qing Zhao Oct. 20, 2023, 6:38 p.m. UTC | #17
> On Oct 20, 2023, at 2:22 PM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> 
> 
>> Am 20.10.2023 um 19:09 schrieb Qing Zhao <qing.zhao@oracle.com>:
>> 
>> Sid,
>> 
>> (Richard, can you please help me to make sure this? Thanks a lot)
>> 
>> I studied a little bit more on the following question you raised during the review process:
>> 
>> For the following small testing case: 
>> 
>> 1 struct annotated {
>> 2   int foo;
>> 3   char array[] __attribute__((counted_by (foo)));
>> 4 };
>> 5 
>> 6 extern struct annotated * alloc_buf (int);
>> 7 
>> 8 int test (int sz)
>> 9 {
>> 10   struct annotated * array_annotated = alloc_buf (sz);
>> 11   array_annotated->foo = sz;
>> 12   return __builtin_dynamic_object_size (array_annotated->array, 1);
>> 13 }
>> 
>> Whether the assignment of the size to the counted_by field at line 11 and the consumer of the size at line 12 at call to __bdos might be reordered by GCC? 
>> 
>> The following is my thought:
>> 
>> 1. _bdos computation passes (both pass_early_object_sizes and pass_object_sizes) are in the early stage of SSA optimizations. In which, pass_early_object_sizes happens before almost all the optimizations, no reordering is possible in this pass;
>> 
>> 2. Then how about the pass “pass_object_sizes”?
>> 
>>  Immediately after the pass_build_ssa,  the IR for the routine “test” is  with the SSA form: (compiled with -O3):
>> 
>> 1 int test (int sz)
>> 2 {
>> 3   struct annotated * array_annotated;
>> 4   char[0:] * _1;
>> 5   long unsigned int _2;
>> 6   int _8;
>> 7   
>> 8   <bb 2> :
>> 9   array_annotated_6 = alloc_buf (sz_4(D));
>> 10   array_annotated_6->foo = sz_4(D);
>> 11   _1 = &array_annotated_6->array;
>> 12   _2 = __builtin_dynamic_object_size (_1, 1);
>> 13   _8 = (int) _2;
>> 14   return _8; 
>> 15 } 
>> 
>> In the above IR, the key portion is line 10 and line 11: (whether these two lines might be reordered with SSA optimization?)
>> 
>> 10   array_annotated_6->foo = sz_4(D);
>> 11   _1 = &array_annotated_6->array;
>> 
>> The major question here is: whether the SSA optimizations are able to distinguish the object “array_annotated_6->foo” at line 10 is independent with
>> the object “array_annotated-_6->array” at line 11?
>> 
>> If the SSA optimizations can distinguish “array_annotated_6->foo” from “array_annotated_6->array”, then these two lines might be reordered.
>> Otherwise, these two lines will not be reordered by SSA optimizations.
>> 
>> I am not very familiar with the details of the SSA optimizations, but my guess is, two fields of the same structure might not be distinguished by the SSA optimizations, then line 10 and line 11 will not be reordered by SSA optimizations.
>> 
>> Richard, is my guess correct?
> 
> There is no data dependence between the memory access and the address computation so nothing prevents the reordering.  

Okay, I see. Then:

10   array_annotated_6->foo = sz_4(D);
11   _1 = &array_annotated_6->array;

Line 10 and line 11 could be reordered.

And then
10   array_annotated_6->foo = sz_4(D);
12   _2 = __builtin_dynamic_object_size (_1, 1);

Line 10 and 12 could be reordered too.

Then what’s the best way to add such a data dependence in the IR?

How about the following:

  Add one more parameter to __builtin_dynamic_object_size(), i.e.

__builtin_dynamic_object_size (_1,1,array_annotated->foo)

when we see that the structure field has the counted_by attribute.

Then we can enforce such a data dependence and avoid potential reordering.
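
The effect of passing the counter as an operand can be illustrated in plain C: a helper that reads the counter itself makes the size computation depend on the earlier store, so the store has to be visible to it. This is only a sketch of the dependence; size_from_counter and alloc_and_measure are hypothetical names, not the proposed builtin signature, and the counted_by attribute is omitted so the code compiles without the patch.

```c
#include <stddef.h>
#include <stdlib.h>

struct annotated {
  int foo;
  char array[];   /* counted_by (foo) in the patch; omitted here */
};

/* Hypothetical stand-in for a __bdos call that takes the counter as an
   operand: the load of p->foo depends on any earlier store to it.  */
static size_t
size_from_counter (const struct annotated *p)
{
  return p->foo < 0 ? 0 : (size_t) p->foo * sizeof (p->array[0]);
}

static size_t
alloc_and_measure (int sz)
{
  struct annotated *p = malloc (sizeof (*p) + (sz > 0 ? (size_t) sz : 0));
  if (p == NULL)
    return 0;
  p->foo = sz;                        /* store to the counter */
  size_t n = size_from_counter (p);   /* reads the counter just stored */
  free (p);
  return n;
}
```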

What’s your opinion? Do you have any other suggestions for a solution?

Qing



> If you put another identical bos call before the access I expect the addresses to be CSEd, effectively moving the latter before the access.
> 
> Richard 
> 
>> Thanks a lot for your help.
>> 
>> Qing
>> 
>>>>> On Oct 5, 2023, at 4:08 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>>> 
>>>> I hope the review was helpful.  Overall, a couple of things to consider:
>>>> 
>>>> 1. How would you handle potential reordering between assignment of the size to the counted_by field with the __bdos call that may consume it? You'll probably need to express some kind of dependency there or in the worst case, insert a barrier to disallow reordering.
>>> 
>>> Good point! 
>>> 
>>> So, your example in the respond to [V3][PATCH 2/3]Use the counted_by atribute info in builtin object size [PR108896]:
>>> “
>>> Maybe another test where the allocation, size assignment and __bdos call happen in the same function, where the allocator is not recognized by gcc:
>>> 
>>> void *
>>> __attribute__ ((noinline))
>>> alloc (size_t sz)
>>> {
>>> return __builtin_malloc (sz);
>>> }
>>> 
>>> void test (size_t sz)
>>> {
>>> array_annotated = alloc (sz);
>>> array_annotated->b = sz;
>>> return __builtin_dynamic_object_size (array_annotated->c, 1);
>>> }
>>> 
>>> The interesting thing to test (and ensure in the codegen) is that the assignment to array_annotated->b does not get reordered to below the __builtin_dynamic_object_size call since technically there is no data dependency between the two.
>>> “
>>> Will test on this. 
>>> 
>>> Not sure whether the current GCC alias analysis is able to distinguish one field of a structure from another field of the same structure, if YES, then
>>> We need to add an explicit dependency edge from the write to “array_annotated->b” to the call to “__builtin_dynamic_object_size(array_annotated->c,1)”.
>>> I will check on this and see how to resolve this issue.
>>> 
>>> I guess the possible solution is that we can add an implicit ref to “array_annotated->b” at the call to “__builtin_dynamic_object_size(array_annotated->c, 1)” if the counted_by attribute is available. That should resolve the issue.
>>> 
>>> Richard, what do you think on this?
Qing Zhao Oct. 20, 2023, 6:48 p.m. UTC | #18
> On Oct 20, 2023, at 2:34 PM, Kees Cook <keescook@chromium.org> wrote:
> 
> On Fri, Oct 20, 2023 at 11:50:11AM +0200, Martin Uecker wrote:
>> Am Donnerstag, dem 19.10.2023 um 16:33 -0700 schrieb Kees Cook:
>>> On Wed, Oct 18, 2023 at 09:11:43PM +0000, Qing Zhao wrote:
>>>> As I replied to Martin in another email, I plan to do the following to resolve this issue:
>>>> 
>>>> 1. No specification for signed or unsigned for counted_by field.
>>>> 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when the size of the counted-by is not positive.
>>> 
>>> I don't understand why this needs to be a runtime sanitizer. The
>>> signedness is known at compile time, so I would expect a -W option.
>> 
>> The signedness of the type but not of the value.
>> 
>> But I would not want to have a warning for signed 
>> counter  types by default because I would prefer
>> to use signed types (for various reasons including
>> better overflow detection).
>> 
>>> Or
>>> do you mean you'd split up -fsanitize=bounds between unsigned and signed
>>> indexes? I'd find that kind of awkward for the kernel... but I feel like
>>> I've misunderstood something. :)
>>> 
>>> -Kees
>> 
>> The idea would be to detect at run-time the case
>> if  x->buf  is used at a time where   x->counter 
>> is negative and also when x->counter * sizeof(x->buf[0])
>> overflows or is too big.
>> 
>> This would be similar to
>> 
>> int a[n];
>> 
>> where it is detected at run-time if n is not-positive.
> 
> Right. I guess what I mean to say is that I would expect this case to
> already be caught by -fsanitize=bounds -- I don't see a reason to add an
> additional sanitizer option.
> 
> struct foo {
> 	int count;
> 	int array[] __counted_by(count);
> };
> 
> 	foo->count = 5;
> 	foo->array[0] = 1;	// ok
> 	foo->array[10] = 1;	// -fsanitize=bounds will catch this
> 	foo->array[-10] = 1;	// -fsanitize=bounds will catch this too
> 
> 

I just checked this test case with my GCC, and yes, -fsanitize=bounds indeed caught these errors:

ttt_1.c:31:12: runtime error: index 10 out of bounds for type 'char [*]'
ttt_1.c:32:12: runtime error: index -10 out of bounds for type 'char [*]'

Qing


> -- 
> Kees Cook
Siddhesh Poyarekar Oct. 20, 2023, 7:10 p.m. UTC | #19
On 2023-10-20 14:38, Qing Zhao wrote:
> How about the following:
> 
>    Add one more parameter to __builtin_dynamic_object_size(), i.e
> 
> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> 
> When we see the structure field has counted_by attribute.

Or maybe add a barrier preventing any assignments to 
array_annotated->foo from being reordered below the __bdos call? 
Basically an __asm__ with array_annotated->foo in the clobber list ought 
to do it I think.

It may not work for something like this though:

static size_t
get_size_of (void *ptr)
{
   return __bdos (ptr, 1);
}

void
foo (size_t sz)
{
   array_annotated = __builtin_malloc (sz);
   array_annotated->foo = sz;

   ...
   __builtin_printf ("%zu\n", get_size_of (array_annotated->array));
   ...
}

because the call to get_size_of () may not have been inlined that early.

The more fool-proof alternative may be to put a compile time barrier 
right below the assignment to array_annotated->foo; I reckon you could 
do that early in the front end by marking the size identifier and then 
tracking assignments to that identifier.  That may have a slight runtime 
performance overhead since it may prevent even legitimate reordering.  I 
can't think of another alternative at the moment...
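
Done by hand rather than by the front end, a compile-time barrier right below the assignment would look roughly like the sketch below. It uses the usual GNU C idiom of an empty asm with a "memory" clobber; set_count is a hypothetical helper. Whether such a barrier is enough to order a __bdos call, which the compiler folds rather than executes as a memory access, is an assumption this sketch does not settle.

```c
#include <stddef.h>

struct annotated {
  int foo;
  char array[];   /* counted_by (foo) in the patch; omitted here */
};

/* Hypothetical helper: store the counter, then keep the compiler from
   moving memory accesses across this point.  */
static inline void
set_count (struct annotated *p, int count)
{
  p->foo = count;
  /* Empty asm with a "memory" clobber: a compiler-only barrier that
     emits no instructions.  */
  __asm__ __volatile__ ("" : : : "memory");
}
```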

Sid
Martin Uecker Oct. 20, 2023, 7:54 p.m. UTC | #20
Am Freitag, dem 20.10.2023 um 18:48 +0000 schrieb Qing Zhao:
> 
> > On Oct 20, 2023, at 2:34 PM, Kees Cook <keescook@chromium.org> wrote:
> > 
> > On Fri, Oct 20, 2023 at 11:50:11AM +0200, Martin Uecker wrote:
> > > Am Donnerstag, dem 19.10.2023 um 16:33 -0700 schrieb Kees Cook:
> > > > On Wed, Oct 18, 2023 at 09:11:43PM +0000, Qing Zhao wrote:
> > > > > As I replied to Martin in another email, I plan to do the following to resolve this issue:
> > > > > 
> > > > > 1. No specification for signed or unsigned for counted_by field.
> > > > > 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when the size of the counted-by is not positive.
> > > > 
> > > > I don't understand why this needs to be a runtime sanitizer. The
> > > > signedness is known at compile time, so I would expect a -W option.
> > > 
> > > The signedness of the type but not of the value.
> > > 
> > > But I would not want to have a warning for signed 
> > > counter  types by default because I would prefer
> > > to use signed types (for various reasons including
> > > better overflow detection).
> > > 
> > > > Or
> > > > do you mean you'd split up -fsanitize=bounds between unsigned and signed
> > > > indexes? I'd find that kind of awkward for the kernel... but I feel like
> > > > I've misunderstood something. :)
> > > > 
> > > > -Kees
> > > 
> > > The idea would be to detect at run-time the case
> > > if  x->buf  is used at a time where   x->counter 
> > > is negative and also when x->counter * sizeof(x->buf[0])
> > > overflows or is too big.
> > > 
> > > This would be similar to
> > > 
> > > int a[n];
> > > 
> > > where it is detected at run-time if n is not-positive.
> > 
> > Right. I guess what I mean to say is that I would expect this case to
> > already be caught by -fsanitize=bounds -- I don't see a reason to add an
> > additional sanitizer option.
> > 
> > struct foo {
> > 	int count;
> > 	int array[] __counted_by(count);
> > };
> > 
> > 	foo->count = 5;
> > 	foo->array[0] = 1;	// ok
> > 	foo->array[10] = 1;	// -fsanitize=bounds will catch this
> > 	foo->array[-10] = 1;	// -fsanitize=bounds will catch this too
> > 
> > 
> 
> just checked this testing case with my GCC, and YES, -fsanitize=bounds indeed caught this error:
> 
> ttt_1.c:31:12: runtime error: index 10 out of bounds for type 'char [*]'
> ttt_1.c:32:12: runtime error: index -10 out of bounds for type 'char [*]'
> 

Yes, but I thought we were discussing the case where count is
set to a negative value:

foo->count = -1;
int x = foo->array[3]; // UBSan should diagnose this

And also the case when foo->array becomes too big.
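
The two run-time conditions can be written out in plain C. This is a minimal sketch only: the helper name fam_access_ok and its signature are hypothetical, and it does not describe how UBSan or GCC implement any check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true only if an access to element
   `index` is acceptable for a flexible array whose counter currently
   holds `count` elements of `elem_size` bytes each.  */
bool
fam_access_ok (long long count, size_t elem_size, long long index)
{
  /* Case 1: the counter itself is negative.  */
  if (count < 0)
    return false;

  /* Case 2: count * elem_size would overflow or be too big
     (here: larger than PTRDIFF_MAX bytes).  */
  if (elem_size != 0
      && (unsigned long long) count > (size_t) PTRDIFF_MAX / elem_size)
    return false;

  /* The ordinary bounds check against the counter.  */
  return index >= 0 && index < count;
}
```

With count = -1, any index is rejected, including index 3 from the example above.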

Martin
Qing Zhao Oct. 20, 2023, 8:41 p.m. UTC | #21
> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> 
> On 2023-10-20 14:38, Qing Zhao wrote:
>> How about the following:
>>   Add one more parameter to __builtin_dynamic_object_size(), i.e
>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>> When we see the structure field has counted_by attribute.
> 
> Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.

Maybe just adding array_annotated->foo to the use list of the call to __builtin_dynamic_object_size would be enough?

But I am not sure how to implement this at the TREE level: is there a USE_LIST/CLOBBER_LIST for each call?  If so, I can simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos.

This might be the simplest solution?

Qing

> 
> It may not work for something like this though:
> 
> static size_t
> get_size_of (void *ptr)
> {
>  return __bdos (ptr, 1);
> }
> 
> void
> foo (size_t sz)
> {
>  array_annotated = __builtin_malloc (sz);
>  array_annotated = sz;
> 
>  ...
>  __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
>  ...
> }
> 
> because the call to get_size_of () may not have been inlined that early.
> 
> The more fool-proof alternative may be to put a compile time barrier right below the assignment to array_annotated->foo; I reckon you could do that early in the front end by marking the size identifier and then tracking assignments to that identifier.  That may have a slight runtime performance overhead since it may prevent even legitimate reordering.  I can't think of another alternative at the moment...
> 
> Sid
Richard Biener Oct. 23, 2023, 7:57 a.m. UTC | #22
On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>
>
>
> > On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> >
> > On 2023-10-20 14:38, Qing Zhao wrote:
> >> How about the following:
> >>   Add one more parameter to __builtin_dynamic_object_size(), i.e
> >> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> >> When we see the structure field has counted_by attribute.
> >
> > Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
>
> Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
>
> But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
>
> This might be the simplest solution?

If the dynamic object size is derived from a field then I think you need to
put the "load" of that memory location at the point (as argument)
of the __bos call right at parsing time.  I know that's awkward because
you try to play tricks "discovering" that field only late, but that's not
going to work.

A related issue is that assignment to the field and storage allocation
are not tied together - if there's no use of the size data we might
remove the store of it as dead.

Of course I guess __bos then behaves like sizeof ().

Richard.

>
> Qing
>
> >
> > It may not work for something like this though:
> >
> > static size_t
> > get_size_of (void *ptr)
> > {
> >  return __bdos (ptr, 1);
> > }
> >
> > void
> > foo (size_t sz)
> > {
> >  array_annotated = __builtin_malloc (sz);
> >  array_annotated = sz;
> >
> >  ...
> >  __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
> >  ...
> > }
> >
> > because the call to get_size_of () may not have been inlined that early.
> >
> > The more fool-proof alternative may be to put a compile time barrier right below the assignment to array_annotated->foo; I reckon you could do that early in the front end by marking the size identifier and then tracking assignments to that identifier.  That may have a slight runtime performance overhead since it may prevent even legitimate reordering.  I can't think of another alternative at the moment...
> >
> > Sid
>
Siddhesh Poyarekar Oct. 23, 2023, 11:27 a.m. UTC | #23
On 2023-10-23 03:57, Richard Biener wrote:
> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>
>>
>>
>>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>>
>>> On 2023-10-20 14:38, Qing Zhao wrote:
>>>> How about the following:
>>>>    Add one more parameter to __builtin_dynamic_object_size(), i.e
>>>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>>>> When we see the structure field has counted_by attribute.
>>>
>>> Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
>>
>> Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
>>
>> But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
>>
>> This might be the simplest solution?
> 
> If the dynamic object size is derived of a field then I think you need to
> put the "load" of that memory location at the point (as argument)
> of the __bos call right at parsing time.  I know that's awkward because
> you try to play tricks "discovering" that field only late, but that's not
> going to work.
> 
> A related issue is that assignment to the field and storage allocation
> are not tied together - if there's no use of the size data we might
> remove the store of it as dead.

Maybe the trick then is to treat the size data as volatile?  That ought 
to discourage reordering and also prevent elimination of the "dead" store?

Thanks,
Sid
Richard Biener Oct. 23, 2023, 12:34 p.m. UTC | #24
On Mon, Oct 23, 2023 at 1:27 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>
> On 2023-10-23 03:57, Richard Biener wrote:
> > On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
> >>
> >>
> >>
> >>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> >>>
> >>> On 2023-10-20 14:38, Qing Zhao wrote:
> >>>> How about the following:
> >>>>    Add one more parameter to __builtin_dynamic_object_size(), i.e
> >>>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> >>>> When we see the structure field has counted_by attribute.
> >>>
> >>> Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
> >>
> >> Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
> >>
> >> But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
> >>
> >> This might be the simplest solution?
> >
> > If the dynamic object size is derived of a field then I think you need to
> > put the "load" of that memory location at the point (as argument)
> > of the __bos call right at parsing time.  I know that's awkward because
> > you try to play tricks "discovering" that field only late, but that's not
> > going to work.
> >
> > A related issue is that assignment to the field and storage allocation
> > are not tied together - if there's no use of the size data we might
> > remove the store of it as dead.
>
> Maybe the trick then is to treat the size data as volatile?  That ought
> to discourage reordering and also prevent elimination of the "dead" store?

But we are an optimizing compiler, not a static analysis machine, so I
fail to see how this is a useful suggestion.

I think Martin's suggestion to approach this as a language extension
is more useful and would make it easier to handle this?

Richard.

> Thanks,
> Sid
Siddhesh Poyarekar Oct. 23, 2023, 1:23 p.m. UTC | #25
On 2023-10-23 08:34, Richard Biener wrote:
>>> A related issue is that assignment to the field and storage allocation
>>> are not tied together - if there's no use of the size data we might
>>> remove the store of it as dead.
>>
>> Maybe the trick then is to treat the size data as volatile?  That ought
>> to discourage reordering and also prevent elimination of the "dead" store?
> 
> But we are an optimizing compiler, not a static analysis machine, so I
> fail to see how this is a useful suggestion.

Sorry, I didn't mean to suggest doing this in the middle-end.

> I think Martins suggestion to approach this as a language extension
> is more useful and would make it easier to handle this?

I think handling for this (e.g. treating any storage allocated for the 
size member in the struct as volatile to prevent reordering or 
elimination) would have to be implemented in the front-end, regardless 
of whether it is a language extension or a GCC attribute.  How would 
making it a language extension vs a GCC attribute make it different?
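
As a user-level analogy only (not the front-end handling itself), this is what a volatile size member looks like in source. The struct and function names are hypothetical. Note that volatile accesses are only ordered with respect to other volatile accesses, so this does not by itself constrain ordinary loads and stores around them.

```c
#include <stddef.h>

/* Hypothetical variant of the annotated struct in which the size
   member is volatile, so stores to it cannot be removed as dead.  */
struct annotated_v {
  volatile size_t foo;
  char array[];
};

/* Every call performs a real store to foo.  */
void
annotated_v_set_count (struct annotated_v *p, size_t count)
{
  p->foo = count;
}

/* Every call performs a real load of foo at the point of the query.  */
size_t
annotated_v_array_bytes (const struct annotated_v *p)
{
  return p->foo * sizeof (p->array[0]);
}
```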

Thanks,
Sid
Qing Zhao Oct. 23, 2023, 2:56 p.m. UTC | #26
> On Oct 23, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>> 
>> 
>> 
>>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>> 
>>> On 2023-10-20 14:38, Qing Zhao wrote:
>>>> How about the following:
>>>>  Add one more parameter to __builtin_dynamic_object_size(), i.e
>>>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>>>> When we see the structure field has counted_by attribute.
>>> 
>>> Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
>> 
>> Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
>> 
>> But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
>> 
>> This might be the simplest solution?
> 
> If the dynamic object size is derived of a field then I think you need to
> put the "load" of that memory location at the point (as argument)
> of the __bos call right at parsing time.  I know that's awkward because
> you try to play tricks "discovering" that field only late, but that's not
> going to work.

Is it better to do this in the gimplification phase instead of the FE?

VLA decls are handled in the gimplification phase; the size calculation and the call to alloca are both generated during this phase (gimplify_vla_decl).

For __bdos calls, we can add an additional argument if the type of the call's first argument includes the counted_by attribute, i.e.

***During gimplification,
For a call to __builtin_dynamic_object_size (ptr, type)
Check whether the type of ptr includes the counted_by attribute; if so, change the call to
__builtin_dynamic_object_size (ptr, type, counted_by field)

Then the correct data dependence should be represented well in the IR.

***During the object size phase,

The call to __builtin_dynamic_object_size will become an expression that includes the counted_by field, or -1/0 when we cannot decide the size; the correct data dependence will be kept even after the call to __builtin_dynamic_object_size is gone.


> 
> A related issue is that assignment to the field and storage allocation
> are not tied together

Yes, this is different from a VLA, for which the size assignment and the storage allocation are generated and tied together by the compiler.

For the flexible array member, both the storage allocation and the size assignment are done by the user. So we need to clarify this requirement in the documentation to guide users to write correct code. We might also need to provide tools (warnings and a sanitizer option) to help users catch such coding errors.
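
To illustrate what "correct code" could look like on the user side, here is a minimal sketch of a helper that performs the allocation and the size assignment together, using the size formula from the cover letter, MAX (sizeof (struct A), offsetof (struct A, array[0]) + counted_by * sizeof (element)). The struct layout and the name annotated_alloc are assumptions for the example; the attribute itself is left out so that the code compiles with any C99 compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: foo is the field that counted_by would name.  */
struct annotated {
  size_t foo;
  char array[];
};

/* Hypothetical helper: allocate room for `count` elements and set the
   counter right away, so the two can never disagree.  Returns NULL on
   overflow or allocation failure.  */
struct annotated *
annotated_alloc (size_t count)
{
  size_t elem = sizeof (((struct annotated *) 0)->array[0]);
  size_t base = offsetof (struct annotated, array);

  /* Reject counts for which base + count * elem would overflow.  */
  if (count > (SIZE_MAX - base) / elem)
    return NULL;

  /* MAX (sizeof (struct annotated), base + count * elem).  */
  size_t size = base + count * elem;
  if (size < sizeof (struct annotated))
    size = sizeof (struct annotated);

  struct annotated *p = malloc (size);
  if (p != NULL)
    p->foo = count;
  return p;
}
```

Allocating more than this is permitted by the documentation in this patch; allocating less is the user error the planned warning and sanitizer options are meant to catch.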

> - if there's no use of the size data we might
> remove the store of it as dead.

Yes, when __bdos cannot decide the size, we need to remove the dead store to the field.
I guess that the compiler should be able to do this automatically?

thanks.

Qing
> 
> Of course I guess __bos then behaves like sizeof ().
> 
> Richard.
> 
>> 
>> Qing
>> 
>>> 
>>> It may not work for something like this though:
>>> 
>>> static size_t
>>> get_size_of (void *ptr)
>>> {
>>> return __bdos (ptr, 1);
>>> }
>>> 
>>> void
>>> foo (size_t sz)
>>> {
>>> array_annotated = __builtin_malloc (sz);
>>> array_annotated = sz;
>>> 
>>> ...
>>> __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
>>> ...
>>> }
>>> 
>>> because the call to get_size_of () may not have been inlined that early.
>>> 
>>> The more fool-proof alternative may be to put a compile time barrier right below the assignment to array_annotated->foo; I reckon you could do that early in the front end by marking the size identifier and then tracking assignments to that identifier.  That may have a slight runtime performance overhead since it may prevent even legitimate reordering.  I can't think of another alternative at the moment...
>>> 
>>> Sid
Qing Zhao Oct. 23, 2023, 3:14 p.m. UTC | #27
> On Oct 23, 2023, at 8:34 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Mon, Oct 23, 2023 at 1:27 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>> 
>> On 2023-10-23 03:57, Richard Biener wrote:
>>> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>>>> 
>>>>> On 2023-10-20 14:38, Qing Zhao wrote:
>>>>>> How about the following:
>>>>>>   Add one more parameter to __builtin_dynamic_object_size(), i.e
>>>>>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>>>>>> When we see the structure field has counted_by attribute.
>>>>> 
>>>>> Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
>>>> 
>>>> Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
>>>> 
>>>> But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
>>>> 
>>>> This might be the simplest solution?
>>> 
>>> If the dynamic object size is derived of a field then I think you need to
>>> put the "load" of that memory location at the point (as argument)
>>> of the __bos call right at parsing time.  I know that's awkward because
>>> you try to play tricks "discovering" that field only late, but that's not
>>> going to work.
>>> 
>>> A related issue is that assignment to the field and storage allocation
>>> are not tied together - if there's no use of the size data we might
>>> remove the store of it as dead.
>> 
>> Maybe the trick then is to treat the size data as volatile?  That ought
>> to discourage reordering and also prevent elimination of the "dead" store?
> 
> But we are an optimizing compiler, not a static analysis machine, so I
> fail to see how this is a useful suggestion.
> 
> I think Martins suggestion to approach this as a language extension
> is more useful and would make it easier to handle this?

I agree that making this a language extension is a better and cleaner approach.

As we discussed before, the major issues with the language extension approach are:
1. Harder to adopt in existing source code due to the potential ABI/API change.
2. Much more effort and a much longer time to be accepted.

In addition to the above issues, I guess the same issue exists even with a language extension,
since for a FAM, it’s the user (not the compiler) who allocates the storage. (Should we
also move this into the compiler for the language extension? Then existing source code would need to
be changed a lot to adopt the new language extension.)

As a result, the size and the storage allocation cannot be guaranteed to be tied together either.

Qing

> 
> Richard.
> 
>> Thanks,
>> Sid
Richard Biener Oct. 23, 2023, 3:57 p.m. UTC | #28
> On 23.10.2023 at 16:56, Qing Zhao <qing.zhao@oracle.com> wrote:
> 
> 
> 
>> On Oct 23, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>>> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>> 
>>> 
>>> 
>>>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>>> 
>>>> On 2023-10-20 14:38, Qing Zhao wrote:
>>>>> How about the following:
>>>>> Add one more parameter to __builtin_dynamic_object_size(), i.e
>>>>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>>>>> When we see the structure field has counted_by attribute.
>>>> 
>>>> Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
>>> 
>>> Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
>>> 
>>> But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
>>> 
>>> This might be the simplest solution?
>> 
>> If the dynamic object size is derived of a field then I think you need to
>> put the "load" of that memory location at the point (as argument)
>> of the __bos call right at parsing time.  I know that's awkward because
>> you try to play tricks "discovering" that field only late, but that's not
>> going to work.
> 
> Is it better to do this at gimplification phase instead of FE? 
> 
> VLA decls are handled in gimplification phase, the size calculation and call to alloca are all generated during this phase. (gimplify_vla_decl).
> 
> For __bdos calls, we can add an additional argument if the object’s first argument’s type include the counted_by attribute, i.e
> 
> ***During gimplification, 
> For a call to __builtin_dynamic_object_size (ptr, type)
> Check whether the type of ptr includes counted_by attribute, if so, change the call to
> __builtin_dynamic_object_size (ptr, type, counted_by field)
> 
> Then the correct data dependence should be represented well in the IR.
> 
> **During object size phase,
> 
> The call to __builtin_dynamic_object_size will become an expression includes the counted_by field or -1/0 when we cannot decide the size, the correct data dependence will be kept even the call to __builtin_dynamic_object_size is gone. 

But the whole point of the BOS pass is to derive information that is not available at parsing time, and those are the cases you are after.  The case where the connection to the field with the length is apparent during parsing is easy: you simply insert a load of the value before the BOS call.  For the late case there’s no way to invent a data flow dependence without inadvertently pessimizing optimization.

Richard 

> 
>> 
>> A related issue is that assignment to the field and storage allocation
>> are not tied together
> 
> Yes, this is different from VLA, in which, the size assignment and the storage allocation are generated and tied together by the compiler.
> 
> For the flexible array member, the storage allocation and the size assignment are all done by the user. So, We need to clarify such requirement  in the document to guide user to write correct code.  And also, we might need to provide tools (warnings and sanitizer option) to help users to catch such coding error.
> 
>> - if there's no use of the size data we might
>> remove the store of it as dead.
> 
> Yes, when __bdos cannot decide the size, we need to remove the dead store to the field.
> I guess that the compiler should be able to do this automatically?
> 
> thanks.
> 
> Qing
>> 
>> Of course I guess __bos then behaves like sizeof ().
>> 
>> Richard.
>> 
>>> 
>>> Qing
>>> 
>>>> 
>>>> It may not work for something like this though:
>>>> 
>>>> static size_t
>>>> get_size_of (void *ptr)
>>>> {
>>>> return __bdos (ptr, 1);
>>>> }
>>>> 
>>>> void
>>>> foo (size_t sz)
>>>> {
>>>> array_annotated = __builtin_malloc (sz);
>>>> array_annotated = sz;
>>>> 
>>>> ...
>>>> __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
>>>> ...
>>>> }
>>>> 
>>>> because the call to get_size_of () may not have been inlined that early.
>>>> 
>>>> The more fool-proof alternative may be to put a compile time barrier right below the assignment to array_annotated->foo; I reckon you could do that early in the front end by marking the size identifier and then tracking assignments to that identifier.  That may have a slight runtime performance overhead since it may prevent even legitimate reordering.  I can't think of another alternative at the moment...
>>>> 
>>>> Sid
>
Qing Zhao Oct. 23, 2023, 4:37 p.m. UTC | #29
> On Oct 23, 2023, at 11:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> 
> 
>> On 23.10.2023 at 16:56, Qing Zhao <qing.zhao@oracle.com> wrote:
>> 
>> 
>> 
>>> On Oct 23, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>>> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>>>> 
>>>>> On 2023-10-20 14:38, Qing Zhao wrote:
>>>>>> How about the following:
>>>>>> Add one more parameter to __builtin_dynamic_object_size(), i.e
>>>>>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>>>>>> When we see the structure field has counted_by attribute.
>>>>> 
>>>>> Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
>>>> 
>>>> Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
>>>> 
>>>> But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
>>>> 
>>>> This might be the simplest solution?
>>> 
>>> If the dynamic object size is derived of a field then I think you need to
>>> put the "load" of that memory location at the point (as argument)
>>> of the __bos call right at parsing time.  I know that's awkward because
>>> you try to play tricks "discovering" that field only late, but that's not
>>> going to work.
>> 
>> Is it better to do this at gimplification phase instead of FE? 
>> 
>> VLA decls are handled in gimplification phase, the size calculation and call to alloca are all generated during this phase. (gimplify_vla_decl).
>> 
>> For __bdos calls, we can add an additional argument if the object’s first argument’s type include the counted_by attribute, i.e
>> 
>> ***During gimplification, 
>> For a call to __builtin_dynamic_object_size (ptr, type)
>> Check whether the type of ptr includes counted_by attribute, if so, change the call to
>> __builtin_dynamic_object_size (ptr, type, counted_by field)
>> 
>> Then the correct data dependence should be represented well in the IR.
>> 
>> **During object size phase,
>> 
>> The call to __builtin_dynamic_object_size will become an expression includes the counted_by field or -1/0 when we cannot decide the size, the correct data dependence will be kept even the call to __builtin_dynamic_object_size is gone. 
> 
> But the whole point of the BOS pass is to derive information that is not available at parsing time, and that’s the cases you are after.  The case where the connection to the field with the length is apparent during parsing is easy - you simply insert a load of the value before the BOS call.

Yes, this is true.
I prefer to implement this in the gimplification phase since I am more familiar with the code there. (I think that implementing it in gimplification should be very similar to implementing it in the FE? Or am I missing anything here?)

Joseph, if this is implemented in the FE, where in the FE should I look?

Thanks a lot for the help.

Qing

>  For the late case there’s no way to invent data flow dependence without inadvertently pessimizing optimization.
> 
> Richard 
> 
>> 
>>> 
>>> A related issue is that assignment to the field and storage allocation
>>> are not tied together
>> 
>> Yes, this is different from VLA, in which, the size assignment and the storage allocation are generated and tied together by the compiler.
>> 
>> For the flexible array member, the storage allocation and the size assignment are all done by the user. So, We need to clarify such requirement  in the document to guide user to write correct code.  And also, we might need to provide tools (warnings and sanitizer option) to help users to catch such coding error.
>> 
>>> - if there's no use of the size data we might
>>> remove the store of it as dead.
>> 
>> Yes, when __bdos cannot decide the size, we need to remove the dead store to the field.
>> I guess that the compiler should be able to do this automatically?
>> 
>> thanks.
>> 
>> Qing
>>> 
>>> Of course I guess __bos then behaves like sizeof ().
>>> 
>>> Richard.
>>> 
>>>> 
>>>> Qing
>>>> 
>>>>> 
>>>>> It may not work for something like this though:
>>>>> 
>>>>> static size_t
>>>>> get_size_of (void *ptr)
>>>>> {
>>>>> return __bdos (ptr, 1);
>>>>> }
>>>>> 
>>>>> void
>>>>> foo (size_t sz)
>>>>> {
>>>>> array_annotated = __builtin_malloc (sz);
>>>>> array_annotated = sz;
>>>>> 
>>>>> ...
>>>>> __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
>>>>> ...
>>>>> }
>>>>> 
>>>>> because the call to get_size_of () may not have been inlined that early.
>>>>> 
>>>>> The more fool-proof alternative may be to put a compile time barrier right below the assignment to array_annotated->foo; I reckon you could do that early in the front end by marking the size identifier and then tracking assignments to that identifier.  That may have a slight runtime performance overhead since it may prevent even legitimate reordering.  I can't think of another alternative at the moment...
>>>>> 
>>>>> Sid
>>
Martin Uecker Oct. 23, 2023, 6:06 p.m. UTC | #30
On Monday, 23.10.2023 at 16:37 +0000, Qing Zhao wrote:
> 
> > On Oct 23, 2023, at 11:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> > 
> > 
> > 
> > > On 23.10.2023 at 16:56, Qing Zhao <qing.zhao@oracle.com> wrote:
> > > 
> > > 
> > > 
> > > > On Oct 23, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > 
> > > > > On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > > On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> > > > > > 
> > > > > > On 2023-10-20 14:38, Qing Zhao wrote:
> > > > > > > How about the following:
> > > > > > > Add one more parameter to __builtin_dynamic_object_size(), i.e
> > > > > > > __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> > > > > > > When we see the structure field has counted_by attribute.
> > > > > > 
> > > > > > Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
> > > > > 
> > > > > Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
> > > > > 
> > > > > But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
> > > > > 
> > > > > This might be the simplest solution?
> > > > 
> > > > If the dynamic object size is derived of a field then I think you need to
> > > > put the "load" of that memory location at the point (as argument)
> > > > of the __bos call right at parsing time.  I know that's awkward because
> > > > you try to play tricks "discovering" that field only late, but that's not
> > > > going to work.
> > > 
> > > Is it better to do this at gimplification phase instead of FE? 
> > > 
> > > VLA decls are handled in gimplification phase, the size calculation and call to alloca are all generated during this phase. (gimplify_vla_decl).
> > > 
> > > For __bdos calls, we can add an additional argument if the object’s first argument’s type include the counted_by attribute, i.e
> > > 
> > > ***During gimplification, 
> > > For a call to __builtin_dynamic_object_size (ptr, type)
> > > Check whether the type of ptr includes counted_by attribute, if so, change the call to
> > > __builtin_dynamic_object_size (ptr, type, counted_by field)
> > > 
> > > Then the correct data dependence should be represented well in the IR.
> > > 
> > > **During object size phase,
> > > 
> > > The call to __builtin_dynamic_object_size will become an expression includes the counted_by field or -1/0 when we cannot decide the size, the correct data dependence will be kept even the call to __builtin_dynamic_object_size is gone. 
> > 
> > But the whole point of the BOS pass is to derive information that is not available at parsing time, and that’s the cases you are after.  The case where the connection to the field with the length is apparent during parsing is easy - you simply insert a load of the value before the BOS call.
> 
> Yes, this is true. 
> I prefer to implement this in gimplification phase since I am more familiar with the code there.. (I think that implementing it in gimplification should be very similar as implementing it in FE? Or do I miss anything here?)
> 
> Joseph, if implement this in FE, where in the FE I should look at? 
> 

We should aim for a good integration with the BDOS pass, so
that it can propagate the information further, e.g. the 
following should work:

struct { int L; char buf[] __counted_by(L); } *x = ...;
x->L = N;
/* fill x->buf ... */
char *p = x->buf;
__bdos(p) -> N

So we need to be smart on how we provide the size
information for x->buf to the backend. 

This would also be desirable for the language extension. 

Martin


> Thanks a lot for the help.
> 
> Qing
> 
> >  For the late case there’s no way to invent data flow dependence without inadvertently pessimizing optimization.
> > 
> > Richard 
> > 
> > > 
> > > > 
> > > > A related issue is that assignment to the field and storage allocation
> > > > are not tied together
> > > 
> > > Yes, this is different from VLA, in which, the size assignment and the storage allocation are generated and tied together by the compiler.
> > > 
> > > For the flexible array member, the storage allocation and the size assignment are all done by the user. So, We need to clarify such requirement  in the document to guide user to write correct code.  And also, we might need to provide tools (warnings and sanitizer option) to help users to catch such coding error.
> > > 
> > > > - if there's no use of the size data we might
> > > > remove the store of it as dead.
> > > 
> > > Yes, when __bdos cannot decide the size, we need to remove the dead store to the field.
> > > I guess that the compiler should be able to do this automatically?
> > > 
> > > thanks.
> > > 
> > > Qing
> > > > 
> > > > Of course I guess __bos then behaves like sizeof ().
> > > > 
> > > > Richard.
> > > > 
> > > > > 
> > > > > Qing
> > > > > 
> > > > > > 
> > > > > > It may not work for something like this though:
> > > > > > 
> > > > > > static size_t
> > > > > > get_size_of (void *ptr)
> > > > > > {
> > > > > > return __bdos (ptr, 1);
> > > > > > }
> > > > > > 
> > > > > > void
> > > > > > foo (size_t sz)
> > > > > > {
> > > > > > array_annotated = __builtin_malloc (sz);
> > > > > > array_annotated = sz;
> > > > > > 
> > > > > > ...
> > > > > > __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
> > > > > > ...
> > > > > > }
> > > > > > 
> > > > > > because the call to get_size_of () may not have been inlined that early.
> > > > > > 
> > > > > > The more fool-proof alternative may be to put a compile time barrier right below the assignment to array_annotated->foo; I reckon you could do that early in the front end by marking the size identifier and then tracking assignments to that identifier.  That may have a slight runtime performance overhead since it may prevent even legitimate reordering.  I can't think of another alternative at the moment...
> > > > > > 
> > > > > > Sid
> > > 
>
Joseph Myers Oct. 23, 2023, 6:10 p.m. UTC | #31
On Mon, 23 Oct 2023, Qing Zhao wrote:

> I prefer to implement this in gimplification phase since I am more 
> familiar with the code there.. (I think that implementing it in 
> gimplification should be very similar as implementing it in FE? Or do I 
> miss anything here?)
> 
> Joseph, if implement this in FE, where in the FE I should look at? 

I tend to think that gimplification time is appropriate for adding this 
dependency, but if you wish to rewrite a built-in function call in the 
front end before then, it could be done in build_function_call_vec.
Qing Zhao Oct. 23, 2023, 6:17 p.m. UTC | #32
> On Oct 20, 2023, at 3:54 PM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> Am Freitag, dem 20.10.2023 um 18:48 +0000 schrieb Qing Zhao:
>> 
>>> On Oct 20, 2023, at 2:34 PM, Kees Cook <keescook@chromium.org> wrote:
>>> 
>>> On Fri, Oct 20, 2023 at 11:50:11AM +0200, Martin Uecker wrote:
>>>> Am Donnerstag, dem 19.10.2023 um 16:33 -0700 schrieb Kees Cook:
>>>>> On Wed, Oct 18, 2023 at 09:11:43PM +0000, Qing Zhao wrote:
>>>>>> As I replied to Martin in another email, I plan to do the following to resolve this issue:
>>>>>> 
>>>>>> 1. No specification for signed or unsigned for counted_by field.
>>>>>> 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when the size of the counted-by is not positive.
>>>>> 
>>>>> I don't understand why this needs to be a runtime sanitizer. The
>>>>> signedness is known at compile time, so I would expect a -W option.
>>>> 
>>>> The signedness of the type but not of the value.
>>>> 
>>>> But I would not want to have a warning for signed 
>>>> counter  types by default because I would prefer
>>>> to use signed types (for various reasons including
>>>> better overflow detection).
>>>> 
>>>>> Or
>>>>> do you mean you'd split up -fsanitize=bounds between unsigned and signed
>>>>> indexes? I'd find that kind of awkward for the kernel... but I feel like
>>>>> I've misunderstood something. :)
>>>>> 
>>>>> -Kees
>>>> 
>>>> The idea would be to detect at run-time the case
>>>> if  x->buf  is used at a time where   x->counter 
>>>> is negative and also when x->counter * sizeof(x->buf[0])
>>>> overflows or is too big.
>>>> 
>>>> This would be similar to
>>>> 
>>>> int a[n];
>>>> 
>>>> where it is detected at run-time if n is not-positive.
>>> 
>>> Right. I guess what I mean to say is that I would expect this case to
>>> already be caught by -fsanitize=bounds -- I don't see a reason to add an
>>> additional sanitizer option.
>>> 
>>> struct foo {
>>> 	int count;
>>> 	int array[] __counted_by(count);
>>> };
>>> 
>>> 	foo->count = 5;
>>> 	foo->array[0] = 1;	// ok
>>> 	foo->array[10] = 1;	// -fsanitize=bounds will catch this
>>> 	foo->array[-10] = 1;	// -fsanitize=bounds will catch this too
>>> 
>>> 
>> 
>> just checked this testing case with my GCC, and YES, -fsanitize=bounds indeed caught this error:
>> 
>> ttt_1.c:31:12: runtime error: index 10 out of bounds for type 'char [*]'
>> ttt_1.c:32:12: runtime error: index -10 out of bounds for type 'char [*]'
>> 
> 
> Yes, but I thought we were discussing the case where count is
> set to a negative value:
> 
> foo->count = -1;
> int x = foo->array[3]; // UBSan should diagnose this
> 
> And also the case when foo->array becomes too big.

Oops, yes, you are right. 

Thanks.

Qing
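
The cases named in this exchange (an index outside the count, a negative count, and a count whose byte size overflows) can be spelled out as an explicit check. This is only a sketch of the conditions; counted_access_ok is a hypothetical helper, not a GCC interface and not a description of what -fsanitize=bounds emits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when reading or writing element
   `index` is consistent with the counter value `count`. */
static bool counted_access_ok(long long count, size_t elem_size,
                              long long index)
{
  if (count < 0)
    return false;               /* counter is negative */
  if (elem_size != 0
      && (unsigned long long)count > SIZE_MAX / elem_size)
    return false;               /* count * elem_size overflows size_t */
  return index >= 0 && index < count;
}
```

With count = 5 the indexes 10 and -10 are rejected, as in the example above; with count = -1 every index is rejected, which is the case Martin points out.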
> 
> Martin
Martin Uecker Oct. 23, 2023, 6:31 p.m. UTC | #33
Am Montag, dem 23.10.2023 um 20:06 +0200 schrieb Martin Uecker:
> Am Montag, dem 23.10.2023 um 16:37 +0000 schrieb Qing Zhao:
> > 
> > > On Oct 23, 2023, at 11:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> > > 
> > > 
> > > 
> > > > Am 23.10.2023 um 16:56 schrieb Qing Zhao <qing.zhao@oracle.com>:
> > > > 
> > > > 
> > > > 
> > > > > On Oct 23, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > 
> > > > > > On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> > > > > > > 
> > > > > > > On 2023-10-20 14:38, Qing Zhao wrote:
> > > > > > > > How about the following:
> > > > > > > > Add one more parameter to __builtin_dynamic_object_size(), i.e
> > > > > > > > __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> > > > > > > > When we see the structure field has counted_by attribute.
> > > > > > > 
> > > > > > > Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
> > > > > > 
> > > > > > Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
> > > > > > 
> > > > > > But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
> > > > > > 
> > > > > > This might be the simplest solution?
> > > > > 
> > > > > If the dynamic object size is derived of a field then I think you need to
> > > > > put the "load" of that memory location at the point (as argument)
> > > > > of the __bos call right at parsing time.  I know that's awkward because
> > > > > you try to play tricks "discovering" that field only late, but that's not
> > > > > going to work.
> > > > 
> > > > Is it better to do this at gimplification phase instead of FE? 
> > > > 
> > > > VLA decls are handled in gimplification phase, the size calculation and call to alloca are all generated during this phase. (gimplify_vla_decl).
> > > > 
> > > > For __bdos calls, we can add an additional argument if the object’s first argument’s type include the counted_by attribute, i.e
> > > > 
> > > > ***During gimplification, 
> > > > For a call to __builtin_dynamic_object_size (ptr, type)
> > > > Check whether the type of ptr includes counted_by attribute, if so, change the call to
> > > > __builtin_dynamic_object_size (ptr, type, counted_by field)
> > > > 
> > > > Then the correct data dependence should be represented well in the IR.
> > > > 
> > > > **During object size phase,
> > > > 
> > > > The call to __builtin_dynamic_object_size will become an expression includes the counted_by field or -1/0 when we cannot decide the size, the correct data dependence will be kept even the call to __builtin_dynamic_object_size is gone. 
> > > 
> > > But the whole point of the BOS pass is to derive information that is not available at parsing time, and that’s the cases you are after.  The case where the connection to the field with the length is apparent during parsing is easy - you simply insert a load of the value before the BOS call.
> > 
> > Yes, this is true. 
> > I prefer to implement this in gimplification phase since I am more familiar with the code there.. (I think that implementing it in gimplification should be very similar as implementing it in FE? Or do I miss anything here?)
> > 
> > Joseph, if implement this in FE, where in the FE I should look at? 
> > 
> 
> We should aim for a good integration with the BDOS pass, so
> that it can propagate the information further, e.g. the 
> following should work:
> 
> struct { int L; char buf[] __counted_by(L) } x;
> x.L = N;
> x.buf = ...;
> char *p = &x->f;
> __bdos(p) -> N
> 
> So we need to be smart on how we provide the size
> information for x->f to the backend. 

To follow up on this. I do not think we should change the
builtin in the FE or gimplification. Instead, we want 
to change the field access and compute the size there. 

In my toy patch I then made this have a VLA type that 
encodes the size.  Here, this would need to be done 
differently.

But still, what we are missing in both cases
is a proper way to pass the information down to BDOS.

For VLAs this works because BDOS can see the size of
the definition.  For calls to allocation functions
it is read from an attribute. 

But I am not sure what would be the best way to encode
this information so that BDOS can later access it.
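
The two existing sources of size information mentioned above can be illustrated with a short sketch. my_alloc is a hypothetical wrapper carrying the alloc_size attribute, and BDOS is a macro invented for this example that falls back to "unknown" when the builtin is missing. Whether the builtin actually returns n depends on the compiler and on the optimization level.

```c
#include <stddef.h>
#include <stdlib.h>

#if defined(__has_builtin)
# if __has_builtin(__builtin_dynamic_object_size)
#  define BDOS(p, t) __builtin_dynamic_object_size((p), (t))
# endif
#endif
#ifndef BDOS
# define BDOS(p, t) ((void)(p), (size_t)-1) /* builtin unavailable: unknown */
#endif

/* Hypothetical wrapper: alloc_size(1) states that the returned block
   is as large as the first argument. */
__attribute__((malloc, alloc_size(1)))
static void *my_alloc(size_t n)
{
  return malloc(n);
}

/* Size taken from the attribute on the allocation function. */
static size_t alloc_size_seen(size_t n)
{
  char *p = my_alloc(n);
  size_t s = BDOS(p, 0);
  free(p);
  return s;
}

/* Size taken from the VLA definition; n must be greater than zero. */
static size_t vla_size_seen(size_t n)
{
  char a[n];
  a[0] = 0;
  return BDOS(a, 0);
}
```

In both functions the size expression is attached to the point where the object is created. A counted_by field has no such point, which is the gap described above.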

Martin




Qing Zhao Oct. 23, 2023, 6:33 p.m. UTC | #34
> On Oct 23, 2023, at 2:06 PM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> Am Montag, dem 23.10.2023 um 16:37 +0000 schrieb Qing Zhao:
>> 
>>> On Oct 23, 2023, at 11:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>> 
>>> 
>>>> Am 23.10.2023 um 16:56 schrieb Qing Zhao <qing.zhao@oracle.com>:
>>>> 
>>>> 
>>>> 
>>>>> On Oct 23, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>> 
>>>>>> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>>>>>> 
>>>>>>> On 2023-10-20 14:38, Qing Zhao wrote:
>>>>>>>> How about the following:
>>>>>>>> Add one more parameter to __builtin_dynamic_object_size(), i.e
>>>>>>>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>>>>>>>> When we see the structure field has counted_by attribute.
>>>>>>> 
>>>>>>> Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
>>>>>> 
>>>>>> Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
>>>>>> 
>>>>>> But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
>>>>>> 
>>>>>> This might be the simplest solution?
>>>>> 
>>>>> If the dynamic object size is derived of a field then I think you need to
>>>>> put the "load" of that memory location at the point (as argument)
>>>>> of the __bos call right at parsing time.  I know that's awkward because
>>>>> you try to play tricks "discovering" that field only late, but that's not
>>>>> going to work.
>>>> 
>>>> Is it better to do this at gimplification phase instead of FE? 
>>>> 
>>>> VLA decls are handled in gimplification phase, the size calculation and call to alloca are all generated during this phase. (gimplify_vla_decl).
>>>> 
>>>> For __bdos calls, we can add an additional argument if the object’s first argument’s type include the counted_by attribute, i.e
>>>> 
>>>> ***During gimplification, 
>>>> For a call to __builtin_dynamic_object_size (ptr, type)
>>>> Check whether the type of ptr includes counted_by attribute, if so, change the call to
>>>> __builtin_dynamic_object_size (ptr, type, counted_by field)
>>>> 
>>>> Then the correct data dependence should be represented well in the IR.
>>>> 
>>>> **During object size phase,
>>>> 
>>>> The call to __builtin_dynamic_object_size will become an expression includes the counted_by field or -1/0 when we cannot decide the size, the correct data dependence will be kept even the call to __builtin_dynamic_object_size is gone. 
>>> 
>>> But the whole point of the BOS pass is to derive information that is not available at parsing time, and that’s the cases you are after.  The case where the connection to the field with the length is apparent during parsing is easy - you simply insert a load of the value before the BOS call.
>> 
>> Yes, this is true. 
>> I prefer to implement this in gimplification phase since I am more familiar with the code there.. (I think that implementing it in gimplification should be very similar as implementing it in FE? Or do I miss anything here?)
>> 
>> Joseph, if implement this in FE, where in the FE I should look at? 
>> 
> 
> We should aim for a good integration with the BDOS pass, so
> that it can propagate the information further, e.g. the 
> following should work:
> 
> struct { int L; char buf[] __counted_by(L) } x;
> x.L = N;
> x.buf = ...;
> char *p = &x->f;
Should the above line be:
char *p = &x.buf
?
> __bdos(p) -> N
> 
> So we need to be smart on how we provide the size
> information for x->f to the backend. 

Do you have any other suggestion here?

(Right now, what we’d like to do is add one more argument to the function __bdos, as in
 __bdos (p, type, x.L))
> 
> This would also be desirable for the language extension. 

Yes.

Qing
Siddhesh Poyarekar Oct. 23, 2023, 6:43 p.m. UTC | #35
On 2023-10-23 14:06, Martin Uecker wrote:
> We should aim for a good integration with the BDOS pass, so
> that it can propagate the information further, e.g. the
> following should work:
> 
> struct { int L; char buf[] __counted_by(L) } x;
> x.L = N;
> x.buf = ...;
> char *p = &x->f;
> __bdos(p) -> N
> 
> So we need to be smart on how we provide the size
> information for x->f to the backend.
> 
> This would also be desirable for the language extension.

This is essentially why there need to be frontend rules constraining 
reordering and reachability semantics of x.L, thus restricting DSE and 
reordering for it.  This is not really a __bdos/__bos question, because 
that bit is trivial; if the structure is visible, the value is simply 
x.L.  This is also why adding a reference to x.L in __bos/__bdos is not 
sufficient or even possible in, e.g. the above case you note.

Thanks,
Sid
Martin Uecker Oct. 23, 2023, 6:55 p.m. UTC | #36
Am Montag, dem 23.10.2023 um 14:43 -0400 schrieb Siddhesh Poyarekar:
> On 2023-10-23 14:06, Martin Uecker wrote:
> > We should aim for a good integration with the BDOS pass, so
> > that it can propagate the information further, e.g. the
> > following should work:
> > 
> > struct { int L; char buf[] __counted_by(L) } x;
> > x.L = N;
> > x.buf = ...;
> > char *p = &x->f;
> > __bdos(p) -> N
> > 
> > So we need to be smart on how we provide the size
> > information for x->f to the backend.
> > 
> > This would also be desirable for the language extension.
> 
> This is essentially why there need to be frontend rules constraining 
> reordering and reachability semantics of x.L, thus restricting DSE and 
> reordering for it. 

Yes, this too.

>  This is not really a __bdos/__bos question, because 
> that bit is trivial; if the structure is visible, the value is simply 
> x.L.  This is also why adding a reference to x.L in __bos/__bdos is not 
> sufficient or even possible in, e.g. the above case you note.

The value x.L may change in time. I would argue that it needs
to be the value of x.L at the time where x.buf (not x->f, sorry) 
is accessed.  So the FE needs to evaluate x.L when x.buf is
accessed and store the value somewhere where __bdos can find
it later.  Storing it in the type information would make sense.

But I am not sure how best to do this so that the
information is not removed later when it is not used explicitly
before __bdos tries to look at it.
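
The "value at the time of access" semantics can be modelled by hand with a pointer-plus-bound pair. This is only an illustration of the intended behaviour, not a proposal for how the front end should encode it; struct bounded_ptr and access_buf are names invented for this sketch, and treating a negative counter as size zero is an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

struct counted { int L; char buf[]; };

/* Hypothetical pair: the bound is captured when buf is accessed. */
struct bounded_ptr { char *ptr; size_t size; };

static struct counted *counted_new(int n)
{
  if (n < 0)
    return NULL;
  struct counted *x = malloc(sizeof *x + (size_t)n);
  if (x)
    x->L = n;
  return x;
}

/* Evaluate x->L now, at the access to x->buf, and keep the result
   next to the pointer so that a later size query can find it. */
static struct bounded_ptr access_buf(struct counted *x)
{
  struct bounded_ptr bp;
  bp.ptr = x->buf;
  bp.size = x->L > 0 ? (size_t)x->L : 0;
  return bp;
}
```

If x->L is 8 when access_buf is called and is set to 2 afterwards, the pair obtained earlier still records 8, while a fresh access records 2. A compiler-internal encoding would need to preserve the same distinction.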

Martin
Qing Zhao Oct. 23, 2023, 7 p.m. UTC | #37
> On Oct 23, 2023, at 2:31 PM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> Am Montag, dem 23.10.2023 um 20:06 +0200 schrieb Martin Uecker:
>> Am Montag, dem 23.10.2023 um 16:37 +0000 schrieb Qing Zhao:
>>> 
>>>> On Oct 23, 2023, at 11:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> Am 23.10.2023 um 16:56 schrieb Qing Zhao <qing.zhao@oracle.com>:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Oct 23, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>>> 
>>>>>>> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>>>>>>> 
>>>>>>>> On 2023-10-20 14:38, Qing Zhao wrote:
>>>>>>>>> How about the following:
>>>>>>>>> Add one more parameter to __builtin_dynamic_object_size(), i.e
>>>>>>>>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>>>>>>>>> When we see the structure field has counted_by attribute.
>>>>>>>> 
>>>>>>>> Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
>>>>>>> 
>>>>>>> Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
>>>>>>> 
>>>>>>> But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
>>>>>>> 
>>>>>>> This might be the simplest solution?
>>>>>> 
>>>>>> If the dynamic object size is derived of a field then I think you need to
>>>>>> put the "load" of that memory location at the point (as argument)
>>>>>> of the __bos call right at parsing time.  I know that's awkward because
>>>>>> you try to play tricks "discovering" that field only late, but that's not
>>>>>> going to work.
>>>>> 
>>>>> Is it better to do this at gimplification phase instead of FE? 
>>>>> 
>>>>> VLA decls are handled in gimplification phase, the size calculation and call to alloca are all generated during this phase. (gimplify_vla_decl).
>>>>> 
>>>>> For __bdos calls, we can add an additional argument if the object’s first argument’s type include the counted_by attribute, i.e
>>>>> 
>>>>> ***During gimplification, 
>>>>> For a call to __builtin_dynamic_object_size (ptr, type)
>>>>> Check whether the type of ptr includes counted_by attribute, if so, change the call to
>>>>> __builtin_dynamic_object_size (ptr, type, counted_by field)
>>>>> 
>>>>> Then the correct data dependence should be represented well in the IR.
>>>>> 
>>>>> **During object size phase,
>>>>> 
>>>>> The call to __builtin_dynamic_object_size will become an expression includes the counted_by field or -1/0 when we cannot decide the size, the correct data dependence will be kept even the call to __builtin_dynamic_object_size is gone. 
>>>> 
>>>> But the whole point of the BOS pass is to derive information that is not available at parsing time, and that’s the cases you are after.  The case where the connection to the field with the length is apparent during parsing is easy - you simply insert a load of the value before the BOS call.
>>> 
>>> Yes, this is true. 
>>> I prefer to implement this in gimplification phase since I am more familiar with the code there.. (I think that implementing it in gimplification should be very similar as implementing it in FE? Or do I miss anything here?)
>>> 
>>> Joseph, if implement this in FE, where in the FE I should look at? 
>>> 
>> 
>> We should aim for a good integration with the BDOS pass, so
>> that it can propagate the information further, e.g. the 
>> following should work:
>> 
>> struct { int L; char buf[] __counted_by(L) } x;
>> x.L = N;
>> x.buf = ...;
>> char *p = &x->f;
>> __bdos(p) -> N
>> 
>> So we need to be smart on how we provide the size
>> information for x->f to the backend. 
> 
> To follow up on this. I do not think we should change the
> builtin in the FE or gimplification. Instead, we want 
> to change the field access and compute the size there. 
Could you please clarify this? What do you mean by “change the field access and compute the size there”?
> 
> In my toy patch I then made this have a VLA type that 
> encodes the size.  Here, this would need to be done 
> differently.
> 
> But still, what we are missing in both cases
> is a proper way to pass the information down to BDOS.

What’s the issue with adding a new argument (x.L) to the BDOS call? What is missing from this approach?

> 
> For VLAs this works because BDOS can see the size of
> the definition.  For calls to allocation functions
> it is read from an attribute. 

You mean that for VLA, BDOS sees the size of the definition from the attribute for the allocation function?
Yes, that’s the case for VLA.

For VLA, the size computation and the storage allocation are both done by the compiler (through “gimplify_vla_decl” in the gimplification phase),
so these two can be tied together by the compiler.

However, for a FAM with the counted_by attribute, the storage allocation and the counted_by assignment are both done by the user.
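
What "done by the user" means in practice can be sketched with the allocation-size formula from the cover letter of this series. struct A, alloc_size_for and a_new are names chosen for this example, and overflow of n * sizeof (int) is not checked here.

```c
#include <stddef.h>
#include <stdlib.h>

struct A {
  size_t count;
  char pad;
  int array[];   /* would carry counted_by (count) */
};

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* MAX (sizeof (struct A),
        offsetof (struct A, array[0]) + count * sizeof (element)) */
static size_t alloc_size_for(size_t n)
{
  return MAX(sizeof(struct A),
             offsetof(struct A, array[0]) + n * sizeof(int));
}

static struct A *a_new(size_t n)
{
  struct A *p = malloc(alloc_size_for(n));
  if (p)
    p->count = n;   /* the user must keep this in sync with the allocation */
  return p;
}
```

Nothing ties the malloc call to the store to p->count: if the user stores a larger value than was allocated, the compiler has no way to notice, unlike the VLA case where it generates both.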

Qing
Martin Uecker Oct. 23, 2023, 7:37 p.m. UTC | #38
Am Montag, dem 23.10.2023 um 19:00 +0000 schrieb Qing Zhao:
> 
> > On Oct 23, 2023, at 2:31 PM, Martin Uecker <uecker@tugraz.at> wrote:
> > 
> > Am Montag, dem 23.10.2023 um 20:06 +0200 schrieb Martin Uecker:
> > > Am Montag, dem 23.10.2023 um 16:37 +0000 schrieb Qing Zhao:
> > > > 
> > > > > On Oct 23, 2023, at 11:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > > Am 23.10.2023 um 16:56 schrieb Qing Zhao <qing.zhao@oracle.com>:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > On Oct 23, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > > 
> > > > > > > > On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> > > > > > > > > 
> > > > > > > > > On 2023-10-20 14:38, Qing Zhao wrote:
> > > > > > > > > > How about the following:
> > > > > > > > > > Add one more parameter to __builtin_dynamic_object_size(), i.e
> > > > > > > > > > __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> > > > > > > > > > When we see the structure field has counted_by attribute.
> > > > > > > > > 
> > > > > > > > > Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
> > > > > > > > 
> > > > > > > > Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
> > > > > > > > 
> > > > > > > > But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
> > > > > > > > 
> > > > > > > > This might be the simplest solution?
> > > > > > > 
> > > > > > > If the dynamic object size is derived of a field then I think you need to
> > > > > > > put the "load" of that memory location at the point (as argument)
> > > > > > > of the __bos call right at parsing time.  I know that's awkward because
> > > > > > > you try to play tricks "discovering" that field only late, but that's not
> > > > > > > going to work.
> > > > > > 
> > > > > > Is it better to do this at gimplification phase instead of FE? 
> > > > > > 
> > > > > > VLA decls are handled in gimplification phase, the size calculation and call to alloca are all generated during this phase. (gimplify_vla_decl).
> > > > > > 
> > > > > > For __bdos calls, we can add an additional argument if the object’s first argument’s type include the counted_by attribute, i.e
> > > > > > 
> > > > > > ***During gimplification, 
> > > > > > For a call to __builtin_dynamic_object_size (ptr, type)
> > > > > > Check whether the type of ptr includes counted_by attribute, if so, change the call to
> > > > > > __builtin_dynamic_object_size (ptr, type, counted_by field)
> > > > > > 
> > > > > > Then the correct data dependence should be represented well in the IR.
> > > > > > 
> > > > > > **During object size phase,
> > > > > > 
> > > > > > The call to __builtin_dynamic_object_size will become an expression includes the counted_by field or -1/0 when we cannot decide the size, the correct data dependence will be kept even the call to __builtin_dynamic_object_size is gone. 
> > > > > 
> > > > > But the whole point of the BOS pass is to derive information that is not available at parsing time, and that’s the cases you are after.  The case where the connection to the field with the length is apparent during parsing is easy - you simply insert a load of the value before the BOS call.
> > > > 
> > > > Yes, this is true. 
> > > > I prefer to implement this in gimplification phase since I am more familiar with the code there.. (I think that implementing it in gimplification should be very similar as implementing it in FE? Or do I miss anything here?)
> > > > 
> > > > Joseph, if implement this in FE, where in the FE I should look at? 
> > > > 
> > > 
> > > We should aim for a good integration with the BDOS pass, so
> > > that it can propagate the information further, e.g. the 
> > > following should work:
> > > 
> > > struct { int L; char buf[] __counted_by(L) } x;
> > > x.L = N;
> > > x.buf = ...;
> > > char *p = &x->f;
> > > __bdos(p) -> N
> > > 
> > > So we need to be smart on how we provide the size
> > > information for x->f to the backend. 
> > 
> > To follow up on this. I do not think we should change the
> > builtin in the FE or gimplification. Instead, we want 
> > to change the field access and compute the size there. 
> Could you please clarify on this? What do you mean by
> "change the field access and compute the size there”?

I think the FE should essentially give the
type

char [x.L]

to x.buf;

If the type (or its size) could be preserved
at this point so that it can be later
discovered by __bdos, then it could know 
the size and propagate it further.

For the attribute, this is not exactly what
the FE could do, because the semantic type
cannot change, but this is roughly the idea.
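
To make the "size lives in the type" idea concrete, here is a minimal sketch in plain C using a pointer to a variably modified array type. It is only an analogy for what the FE would do internally, not the attribute's implementation; the function name size_via_vla_type and its storage parameter are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: the bound n is part of the type of *p,
   so sizeof recovers it from the type alone, without consulting any
   separate length field.  */
size_t size_via_vla_type (size_t n, char *storage)
{
  assert (n > 0);               /* a VLA bound must be positive */
  assert (storage != NULL);     /* caller provides at least n bytes */

  char (*p)[n] = (char (*)[n]) storage;
  return sizeof *p;             /* computed at run time: n * sizeof (char) */
}
```

A pass that only sees p could still ask the type for the size. The attribute cannot literally do this because, as said above, the semantic type must not change.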


> > 
> > In my toy patch I then made this have a VLA type that 
> > encodes the size.  Here, this would need to be done 
> > differently.
> > 
> > But still, what we are missing in both cases
> > is a proper way to pass the information down to BDOS.
> 
> What’ s the issue with adding a new argument (x.L) to the BDOS call? What’s missing with this approach?
> 

See the example above. The BDOS call might come much
later, when the relationship of the pointer to the
field access is no longer there.

> > 
> > For VLAs this works because BDOS can see the size of
> > the definition.  For calls to allocation functions
> > it is read from an attribute. 
> 
> You mean for VLA, BDOS see the size of the definition
> from the attribute for the allocation function?
> Yes, that’s the case for VLA. 

Ok, I am wrong about how it works for VLAs. They
get transformed to an alloca.

But all calls marked with alloc_size and other
allocation functions are detected in BDOS.  


> 
> For VLA, the size computation and storage allocation are all done by the compiler (through “gimplify_vla_decl” in gimplification phase), 
> So these two can be tied together by the compiler. 
> 
> However, for FAM with counted_by attribute, the
> storage allocation and the counted_by assignment
> are done by the user.  

Yes.

Martin

> 
> Qing
> > 
> > But I am not sure what would be the best way to encode
> > this information so that BDOS can later access it.
> > 
> > Martin
> > 
> > 
> > 
> > 
> > > 
> > > This would also be desirable for the language extension. 
> > > 
> > > Martin
> > > 
> > > 
> > > > Thanks a lot for the help.
> > > > 
> > > > Qing
> > > > 
> > > > > For the late case there’s no way to invent data flow dependence without inadvertently pessimizing optimization.
> > > > > 
> > > > > Richard 
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > A related issue is that assignment to the field and storage allocation
> > > > > > > are not tied together
> > > > > > 
> > > > > > Yes, this is different from VLA, in which, the size assignment and the storage allocation are generated and tied together by the compiler.
> > > > > > 
> > > > > > For the flexible array member, the storage allocation and the size assignment are all done by the user. So, We need to clarify such requirement  in the document to guide user to write correct code.  And also, we might need to provide tools (warnings and sanitizer option) to help users to catch such coding error.
> > > > > > 
> > > > > > > - if there's no use of the size data we might
> > > > > > > remove the store of it as dead.
> > > > > > 
> > > > > > Yes, when __bdos cannot decide the size, we need to remove the dead store to the field.
> > > > > > I guess that the compiler should be able to do this automatically?
> > > > > > 
> > > > > > thanks.
> > > > > > 
> > > > > > Qing
> > > > > > > 
> > > > > > > Of course I guess __bos then behaves like sizeof ().
> > > > > > > 
> > > > > > > Richard.
> > > > > > > 
> > > > > > > > 
> > > > > > > > Qing
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > It may not work for something like this though:
> > > > > > > > > 
> > > > > > > > > static size_t
> > > > > > > > > get_size_of (void *ptr)
> > > > > > > > > {
> > > > > > > > > return __bdos (ptr, 1);
> > > > > > > > > }
> > > > > > > > > 
> > > > > > > > > void
> > > > > > > > > foo (size_t sz)
> > > > > > > > > {
> > > > > > > > > array_annotated = __builtin_malloc (sz);
> > > > > > > > > array_annotated = sz;
> > > > > > > > > 
> > > > > > > > > ...
> > > > > > > > > __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
> > > > > > > > > ...
> > > > > > > > > }
> > > > > > > > > 
> > > > > > > > > because the call to get_size_of () may not have been inlined that early.
> > > > > > > > > 
> > > > > > > > > The more fool-proof alternative may be to put a compile time barrier right below the assignment to array_annotated->foo; I reckon you could do that early in the front end by marking the size identifier and then tracking assignments to that identifier.  That may have a slight runtime performance overhead since it may prevent even legitimate reordering.  I can't think of another alternative at the moment...
> > > > > > > > > 
> > > > > > > > > Sid
>
Qing Zhao Oct. 23, 2023, 7:43 p.m. UTC | #39
> On Oct 23, 2023, at 2:43 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> 
> On 2023-10-23 14:06, Martin Uecker wrote:
>> We should aim for a good integration with the BDOS pass, so
>> that it can propagate the information further, e.g. the
>> following should work:
>> struct { int L; char buf[] __counted_by(L) } x;
>> x.L = N;
>> x.buf = ...;
>> char *p = &x->f;
>> __bdos(p) -> N
>> So we need to be smart on how we provide the size
>> information for x->f to the backend.
>> This would also be desirable for the language extension.
> 
> This is essentially why there need to be frontend rules constraining reordering and reachability semantics of x.L, thus restricting DSE and reordering for it.

My understanding is that restricting DSE and reordering should be done through proper data flow information: with a new argument added to the BDOS call, the correct data flow information would be maintained, and then neither DSE nor reordering would happen. 

I don't quite understand what kind of frontend rules should be added to constrain reordering and reachability semantics. Can you explain this a little more? Do you mean adding rules or requirements to the new attribute that users of the attribute should follow in the source code? 

>  This is not really a __bdos/__bos question, because that bit is trivial; if the structure is visible, the value is simply x.L.  This is also why adding a reference to x.L in __bos/__bdos is not sufficient or even possible in, e.g. the above case you note.

I am a little confused here. Are we discussing how to resolve the potential reordering issue in the following:

"
struct annotated {
  size_t foo;
  char array[] __attribute__((counted_by (foo)));
};

  p->foo = 10;
  size = __builtin_dynamic_object_size (p->array,1);
"?

Or a bigger issue?

Qing
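
For reference, the data-flow argument can be sketched in user-level C. This is only an analogy for the proposed extra argument, not the builtin itself; size_with_explicit_count and set_and_query are hypothetical names, and the struct is left unannotated so the sketch builds without counted_by support.

```c
#include <assert.h>
#include <stddef.h>

struct annotated {
  size_t foo;
  char array[];   /* counted_by (foo) in the proposed series */
};

/* Hypothetical stand-in for a size query that takes the count as an
   explicit operand instead of discovering it later.  */
size_t size_with_explicit_count (const char *array, size_t count)
{
  assert (array != NULL);
  return count * sizeof (array[0]);
}

size_t set_and_query (struct annotated *p, size_t n)
{
  assert (p != NULL);
  p->foo = n;
  /* The load of p->foo is an argument of the call, so it is an
     ordinary use of the store above: the store is not dead and the
     call cannot be moved ahead of it.  */
  return size_with_explicit_count (p->array, p->foo);
}
```

The dependence exists only because the field access is visible where the query is written, which is exactly the case in the snippet above.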

> 
> Thanks,
> Sid
Kees Cook Oct. 23, 2023, 7:52 p.m. UTC | #40
On Fri, Oct 20, 2023 at 09:54:05PM +0200, Martin Uecker wrote:
> Am Freitag, dem 20.10.2023 um 18:48 +0000 schrieb Qing Zhao:
> > 
> > > On Oct 20, 2023, at 2:34 PM, Kees Cook <keescook@chromium.org> wrote:
> > > 
> > > On Fri, Oct 20, 2023 at 11:50:11AM +0200, Martin Uecker wrote:
> > > > Am Donnerstag, dem 19.10.2023 um 16:33 -0700 schrieb Kees Cook:
> > > > > On Wed, Oct 18, 2023 at 09:11:43PM +0000, Qing Zhao wrote:
> > > > > > As I replied to Martin in another email, I plan to do the following to resolve this issue:
> > > > > > 
> > > > > > 1. No specification for signed or unsigned for counted_by field.
> > > > > > 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when the size of the counted-by is not positive.
> > > > > 
> > > > > I don't understand why this needs to be a runtime sanitizer. The
> > > > > signedness is known at compile time, so I would expect a -W option.
> > > > 
> > > > The signedness of the type but not of the value.
> > > > 
> > > > But I would not want to have a warning for signed 
> > > > counter  types by default because I would prefer
> > > > to use signed types (for various reasons including
> > > > better overflow detection).
> > > > 
> > > > > Or
> > > > > do you mean you'd split up -fsanitize=bounds between unsigned and signed
> > > > > indexes? I'd find that kind of awkward for the kernel... but I feel like
> > > > > I've misunderstood something. :)
> > > > > 
> > > > > -Kees
> > > > 
> > > > The idea would be to detect at run-time the case
> > > > if  x->buf  is used at a time where   x->counter 
> > > > is negative and also when x->counter * sizeof(x->buf[0])
> > > > overflows or is too big.
> > > > 
> > > > This would be similar to
> > > > 
> > > > int a[n];
> > > > 
> > > > where it is detected at run-time if n is not-positive.
> > > 
> > > Right. I guess what I mean to say is that I would expect this case to
> > > already be caught by -fsanitize=bounds -- I don't see a reason to add an
> > > additional sanitizer option.
> > > 
> > > struct foo {
> > > 	int count;
> > > 	int array[] __counted_by(count);
> > > };
> > > 
> > > 	foo->count = 5;
> > > 	foo->array[0] = 1;	// ok
> > > 	foo->array[10] = 1;	// -fsanitize=bounds will catch this
> > > 	foo->array[-10] = 1;	// -fsanitize=bounds will catch this too
> > > 
> > > 
> > 
> > just checked this testing case with my GCC, and YES, -fsanitize=bounds indeed caught this error:
> > 
> > ttt_1.c:31:12: runtime error: index 10 out of bounds for type 'char [*]'
> > ttt_1.c:32:12: runtime error: index -10 out of bounds for type 'char [*]’
> > 
> 
> Yes, but I thought we were discussing the case where count is
> set to a negative value:
> 
> foo->count = -1;
> int x = foo->array[3]; // UBSan should diagnose this

Oh right, I keep thinking about it backwards.

Yeah, we can't trap the "count" assignment, because it may be getting used
for other purposes. But yeah, access to "array" should trap if "count"
is negative.
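
Spelled out as a hand-written check, the condition would look roughly like the sketch below. This is not the sanitizer's actual instrumentation; index_in_bounds is a hypothetical helper, and the struct is left unannotated so it builds without counted_by support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct foo {
  int count;
  int array[];   /* would carry counted_by (count) with this series */
};

/* Hypothetical helper: true when array[index] lies within
   [0, count).  A negative count makes every index invalid, so the
   check fires at the access, not at the assignment to count.  */
bool index_in_bounds (const struct foo *f, long index)
{
  assert (f != NULL);
  if (f->count < 0)
    return false;
  return index >= 0 && index < (long) f->count;
}
```

With count = 5 this rejects indexes 10 and -10, and with count = -1 it rejects index 3 as well.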

> And also the case when foo->array becomes too big.

How do you mean?
Martin Uecker Oct. 23, 2023, 7:57 p.m. UTC | #41
Am Montag, dem 23.10.2023 um 12:52 -0700 schrieb Kees Cook:
> On Fri, Oct 20, 2023 at 09:54:05PM +0200, Martin Uecker wrote:
> > Am Freitag, dem 20.10.2023 um 18:48 +0000 schrieb Qing Zhao:
> > > 
> > > > On Oct 20, 2023, at 2:34 PM, Kees Cook <keescook@chromium.org> wrote:
> > > > 
> > > > On Fri, Oct 20, 2023 at 11:50:11AM +0200, Martin Uecker wrote:
> > > > > Am Donnerstag, dem 19.10.2023 um 16:33 -0700 schrieb Kees Cook:
> > > > > > On Wed, Oct 18, 2023 at 09:11:43PM +0000, Qing Zhao wrote:
> > > > > > > As I replied to Martin in another email, I plan to do the following to resolve this issue:
> > > > > > > 
> > > > > > > 1. No specification for signed or unsigned for counted_by field.
> > > > > > > 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when the size of the counted-by is not positive.
> > > > > > 
> > > > > > I don't understand why this needs to be a runtime sanitizer. The
> > > > > > signedness is known at compile time, so I would expect a -W option.
> > > > > 
> > > > > The signedness of the type but not of the value.
> > > > > 
> > > > > But I would not want to have a warning for signed 
> > > > > counter  types by default because I would prefer
> > > > > to use signed types (for various reasons including
> > > > > better overflow detection).
> > > > > 
> > > > > > Or
> > > > > > do you mean you'd split up -fsanitize=bounds between unsigned and signed
> > > > > > indexes? I'd find that kind of awkward for the kernel... but I feel like
> > > > > > I've misunderstood something. :)
> > > > > > 
> > > > > > -Kees
> > > > > 
> > > > > The idea would be to detect at run-time the case
> > > > > if  x->buf  is used at a time where   x->counter 
> > > > > is negative and also when x->counter * sizeof(x->buf[0])
> > > > > overflows or is too big.
> > > > > 
> > > > > This would be similar to
> > > > > 
> > > > > int a[n];
> > > > > 
> > > > > where it is detected at run-time if n is not-positive.
> > > > 
> > > > Right. I guess what I mean to say is that I would expect this case to
> > > > already be caught by -fsanitize=bounds -- I don't see a reason to add an
> > > > additional sanitizer option.
> > > > 
> > > > struct foo {
> > > > 	int count;
> > > > 	int array[] __counted_by(count);
> > > > };
> > > > 
> > > > 	foo->count = 5;
> > > > 	foo->array[0] = 1;	// ok
> > > > 	foo->array[10] = 1;	// -fsanitize=bounds will catch this
> > > > 	foo->array[-10] = 1;	// -fsanitize=bounds will catch this too
> > > > 
> > > > 
> > > 
> > > just checked this testing case with my GCC, and YES, -fsanitize=bounds indeed caught this error:
> > > 
> > > ttt_1.c:31:12: runtime error: index 10 out of bounds for type 'char [*]'
> > > ttt_1.c:32:12: runtime error: index -10 out of bounds for type 'char [*]’
> > > 
> > 
> > Yes, but I thought we were discussing the case where count is
> > set to a negative value:
> > 
> > foo->count = -1;
> > int x = foo->array[3]; // UBSan should diagnose this
> 
> Oh right, I keep thinking about it backwards.
> 
> Yeah, we can't trap the "count" assignment, because it may be getting used
> for other purposes. But yeah, access to "array" should trap if "count"
> is negative.
> 
> > And also the case when foo->array becomes too big.
> 
> How do you mean?

count * sizeof(member) could overflow or otherwise be
bigger than allowed.

Martin
Qing Zhao Oct. 23, 2023, 8:33 p.m. UTC | #42
> On Oct 23, 2023, at 3:37 PM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> Am Montag, dem 23.10.2023 um 19:00 +0000 schrieb Qing Zhao:
>> 
>>> On Oct 23, 2023, at 2:31 PM, Martin Uecker <uecker@tugraz.at> wrote:
>>> 
>>> Am Montag, dem 23.10.2023 um 20:06 +0200 schrieb Martin Uecker:
>>>> Am Montag, dem 23.10.2023 um 16:37 +0000 schrieb Qing Zhao:
>>>>> 
>>>>>> On Oct 23, 2023, at 11:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> Am 23.10.2023 um 16:56 schrieb Qing Zhao <qing.zhao@oracle.com>:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Oct 23, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>>>>>>>>> 
>>>>>>>>>> On 2023-10-20 14:38, Qing Zhao wrote:
>>>>>>>>>>> How about the following:
>>>>>>>>>>> Add one more parameter to __builtin_dynamic_object_size(), i.e
>>>>>>>>>>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>>>>>>>>>>> When we see the structure field has counted_by attribute.
>>>>>>>>>> 
>>>>>>>>>> Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think.
>>>>>>>>> 
>>>>>>>>> Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough?
>>>>>>>>> 
>>>>>>>>> But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
>>>>>>>>> 
>>>>>>>>> This might be the simplest solution?
>>>>>>>> 
>>>>>>>> If the dynamic object size is derived of a field then I think you need to
>>>>>>>> put the "load" of that memory location at the point (as argument)
>>>>>>>> of the __bos call right at parsing time.  I know that's awkward because
>>>>>>>> you try to play tricks "discovering" that field only late, but that's not
>>>>>>>> going to work.
>>>>>>> 
>>>>>>> Is it better to do this at gimplification phase instead of FE? 
>>>>>>> 
>>>>>>> VLA decls are handled in gimplification phase, the size calculation and call to alloca are all generated during this phase. (gimplify_vla_decl).
>>>>>>> 
>>>>>>> For __bdos calls, we can add an additional argument if the object’s first argument’s type include the counted_by attribute, i.e
>>>>>>> 
>>>>>>> ***During gimplification, 
>>>>>>> For a call to __builtin_dynamic_object_size (ptr, type)
>>>>>>> Check whether the type of ptr includes counted_by attribute, if so, change the call to
>>>>>>> __builtin_dynamic_object_size (ptr, type, counted_by field)
>>>>>>> 
>>>>>>> Then the correct data dependence should be represented well in the IR.
>>>>>>> 
>>>>>>> **During object size phase,
>>>>>>> 
>>>>>>> The call to __builtin_dynamic_object_size will become an expression includes the counted_by field or -1/0 when we cannot decide the size, the correct data dependence will be kept even the call to __builtin_dynamic_object_size is gone. 
>>>>>> 
>>>>>> But the whole point of the BOS pass is to derive information that is not available at parsing time, and that’s the cases you are after.  The case where the connection to the field with the length is apparent during parsing is easy - you simply insert a load of the value before the BOS call.
>>>>> 
>>>>> Yes, this is true. 
>>>>> I prefer to implement this in gimplification phase since I am more familiar with the code there.. (I think that implementing it in gimplification should be very similar as implementing it in FE? Or do I miss anything here?)
>>>>> 
>>>>> Joseph, if implement this in FE, where in the FE I should look at? 
>>>>> 
>>>> 
>>>> We should aim for a good integration with the BDOS pass, so
>>>> that it can propagate the information further, e.g. the 
>>>> following should work:
>>>> 
>>>> struct { int L; char buf[] __counted_by(L) } x;
>>>> x.L = N;
>>>> x.buf = ...;
>>>> char *p = &x->f;
>>>> __bdos(p) -> N
>>>> 
>>>> So we need to be smart on how we provide the size
>>>> information for x->f to the backend. 
>>> 
>>> To follow up on this. I do not think we should change the
>>> builtin in the FE or gimplification. Instead, we want 
>>> to change the field access and compute the size there. 
>> Could you please clarify on this? What do you mean by
>> "change the field access and compute the size there”?
> 
> I think the FE should essentially give the
> type
> 
> char [x.L]
> 
> to x.buf;
> 
> If the type (or its size) could be preserved
> at this point so that it can be later
> discovered by __bdos, then it could know 
> the size and propagate it further.

Currently, we already store the size info x.L of x.buf in the attribute list 
of the field_decl of "x.buf"; __bdos can readily use it without any issue. 

Putting "x.L" into the TYPE of x.buf is the other approach, which makes it a language extension. 

So, do you mean to implement the attribute similarly to the language extension now? 
I.e., convert the "attribute" info into the TYPE system in the FE, so that the middle end only uses the TYPE info 
and not the attribute anymore? 

> 
> For the attribute, this is not exactly what
> the FE could do because the semantic type
> can not change, but this is roughly the idea.

So, the attribute still cannot be put into the regular TYPE system in the FE, and we need to come up with a new field in the current TYPE system to carry such info? 

Then what is the benefit of this new field in the TYPE system over my current approach (the attribute list of the field_decl)? 

Can this new approach resolve the reordering issue? 
> 
> 
>>> 
>>> In my toy patch I then made this have a VLA type that 
>>> encodes the size.  Here, this would need to be done 
>>> differently.
>>> 
>>> But still, what we are missing in both cases
>>> is a proper way to pass the information down to BDOS.
>> 
>> What’ s the issue with adding a new argument (x.L) to the BDOS call? What’s missing with this approach?
>> 
> 
> See the example above. the BDOS call might come much
> later when the relationship of the pointer to the
> field access is no longer there.

Why is the relationship of the pointer to the field access no longer there at the __bdos call in the above example? 
My understanding is that the relationship is still there: it is recorded in the attribute list of the
field_decl of the structure TYPE, and the BDOS call can access that information without any issue.

I tried to come up with a small test case from your example above, but it failed with a compilation error.

#include <stdint.h>
#include <malloc.h>

struct annotated {
  size_t L;
  char buf[] __attribute__((counted_by (L)));
};

int main ()
{
  struct annotated x; 
  x.L = 10;
  x.buf = (char *) malloc (x.L * sizeof (char));
  char *p = &(x.buf);
  size_t size = __builtin_dynamic_object_size (p, 1);
  printf("the size of q is %lu \n", size); 
  return 0;
}
/home/opc/Install/latest-d/bin/gcc -O3   t4.c
t4.c: In function ‘main’:
t4.c:13:9: error: invalid use of flexible array member
   13 |   x.buf = (char *) malloc (x.L * sizeof (char));
      |         ^
t4.c:14:13: warning: initialization of ‘char *’ from incompatible pointer type ‘char (*)[]’ [-Wincompatible-pointer-types]
   14 |   char *p = &(x.buf);
      |             ^
Could you please provide me with a working test case for this?

On the other hand, the following small test case works without any issue with my GCC:
#include <stdint.h>
#include <stdio.h>
#include <malloc.h>

struct annotated {
  size_t foo;
  char array[] __attribute__((counted_by (foo)));
};

#define noinline __attribute__((__noinline__))

static struct annotated * noinline alloc_buf (int index)
{
  struct annotated *p;
  p = malloc(sizeof (*p) + (index) * sizeof (char));
  return p;
}

int main ()
{
  size_t size = 0;	 
  struct annotated *p = alloc_buf (10); 
  p->foo = 10;
  char *q = p->array;
  size = __builtin_dynamic_object_size (q, 1);
  printf("the size of q is %zu \n", size); 
  return 0;
}
[opc@qinzhao-ol8u3-x86 Sid]$ sh t
/home/opc/Install/latest-d/bin/gcc -O3 t3.c
the size of q is 10 
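
For comparison, the cover letter of this series gives the minimum allocation size as MAX (sizeof (struct A), offsetof (struct A, array[0]) + counted_by * sizeof(element)). A minimal sketch of that formula for the struct above follows; the COUNTED_BY macro and min_alloc_size are hypothetical names, and the macro expands to nothing on compilers that do not know the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard macro: use counted_by only where it exists.  */
#if defined (__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(m) __attribute__ ((counted_by (m)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(m)
#endif

struct annotated {
  size_t foo;
  char array[] COUNTED_BY (foo);
};

/* MAX (sizeof (struct annotated),
        offsetof (struct annotated, array) + count * sizeof (char))  */
size_t min_alloc_size (size_t count)
{
  /* Precondition: the sum below must not wrap around.  */
  assert (count <= SIZE_MAX - offsetof (struct annotated, array));

  size_t by_count = offsetof (struct annotated, array)
                    + count * sizeof (char);
  return by_count > sizeof (struct annotated)
         ? by_count : sizeof (struct annotated);
}
```

The MAX only makes a difference for structs whose array starts inside trailing padding. The test case above allocates sizeof (*p) + 10 bytes, which is at least this minimum for a count of 10.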


> 
>>> 
>>> For VLAs this works because BDOS can see the size of
>>> the definition.  For calls to allocation functions
>>> it is read from an attribute. 
>> 
>> You mean for VLA, BDOS see the size of the definition
>> from the attribute for the allocation function?
>> Yes, that’s the case for VLA. 
> 
> Ok, I am wrong about how it works for VLAs. They
> get transformed to an alloca.
> 
> But all calls marked with alloc_size and other
> allocations functions are detected in BDOS.  

Yes. 

Qing
> 
> 
>> 
>> For VLA, the size computation and storage allocation are all done by the compiler (through “gimplify_vla_decl” in gimplification phase), 
>> So these two can be tied together by the compiler. 
>> 
>> However, for FAM with counted_by attribute, the
>> storage allocation and the counted_by assignment
>> are done by the user.  
> 
> Yes.
> 
> Martin
> 
>> 
>> Qing
>>> 
>>> But I am not sure what would be the best way to encode
>>> this information so that BDOS can later access it.
>>> 
>>> Martin
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> This would also be desirable for the language extension. 
>>>> 
>>>> Martin
>>>> 
>>>> 
>>>>> Thanks a lot for the help.
>>>>> 
>>>>> Qing
>>>>> 
>>>>>> For the late case there’s no way to invent data flow dependence without inadvertently pessimizing optimization.
>>>>>> 
>>>>>> Richard 
>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> A related issue is that assignment to the field and storage allocation
>>>>>>>> are not tied together
>>>>>>> 
>>>>>>> Yes, this is different from VLA, in which, the size assignment and the storage allocation are generated and tied together by the compiler.
>>>>>>> 
>>>>>>> For the flexible array member, the storage allocation and the size assignment are all done by the user. So, We need to clarify such requirement  in the document to guide user to write correct code.  And also, we might need to provide tools (warnings and sanitizer option) to help users to catch such coding error.
>>>>>>> 
>>>>>>>> - if there's no use of the size data we might
>>>>>>>> remove the store of it as dead.
>>>>>>> 
>>>>>>> Yes, when __bdos cannot decide the size, we need to remove the dead store to the field.
>>>>>>> I guess that the compiler should be able to do this automatically?
>>>>>>> 
>>>>>>> thanks.
>>>>>>> 
>>>>>>> Qing
>>>>>>>> 
>>>>>>>> Of course I guess __bos then behaves like sizeof ().
>>>>>>>> 
>>>>>>>> Richard.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Qing
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> It may not work for something like this though:
>>>>>>>>>> 
>>>>>>>>>> static size_t
>>>>>>>>>> get_size_of (void *ptr)
>>>>>>>>>> {
>>>>>>>>>> return __bdos (ptr, 1);
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> void
>>>>>>>>>> foo (size_t sz)
>>>>>>>>>> {
>>>>>>>>>> array_annotated = __builtin_malloc (sz);
>>>>>>>>>> array_annotated = sz;
>>>>>>>>>> 
>>>>>>>>>> ...
>>>>>>>>>> __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
>>>>>>>>>> ...
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> because the call to get_size_of () may not have been inlined that early.
>>>>>>>>>> 
>>>>>>>>>> The more fool-proof alternative may be to put a compile time barrier right below the assignment to array_annotated->foo; I reckon you could do that early in the front end by marking the size identifier and then tracking assignments to that identifier.  That may have a slight runtime performance overhead since it may prevent even legitimate reordering.  I can't think of another alternative at the moment...
>>>>>>>>>> 
>>>>>>>>>> Sid
>> 
>
Kees Cook Oct. 23, 2023, 10:03 p.m. UTC | #43
On Mon, Oct 23, 2023 at 09:57:45PM +0200, Martin Uecker wrote:
> Am Montag, dem 23.10.2023 um 12:52 -0700 schrieb Kees Cook:
> > On Fri, Oct 20, 2023 at 09:54:05PM +0200, Martin Uecker wrote:
> > > Am Freitag, dem 20.10.2023 um 18:48 +0000 schrieb Qing Zhao:
> > > > 
> > > > > On Oct 20, 2023, at 2:34 PM, Kees Cook <keescook@chromium.org> wrote:
> > > > > 
> > > > > On Fri, Oct 20, 2023 at 11:50:11AM +0200, Martin Uecker wrote:
> > > > > > Am Donnerstag, dem 19.10.2023 um 16:33 -0700 schrieb Kees Cook:
> > > > > > > On Wed, Oct 18, 2023 at 09:11:43PM +0000, Qing Zhao wrote:
> > > > > > > > As I replied to Martin in another email, I plan to do the following to resolve this issue:
> > > > > > > > 
> > > > > > > > 1. No specification for signed or unsigned for counted_by field.
> > > > > > > > 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when the size of the counted-by is not positive.
> > > > > > > 
> > > > > > > I don't understand why this needs to be a runtime sanitizer. The
> > > > > > > signedness is known at compile time, so I would expect a -W option.
> > > > > > 
> > > > > > The signedness of the type but not of the value.
> > > > > > 
> > > > > > But I would not want to have a warning for signed 
> > > > > > counter  types by default because I would prefer
> > > > > > to use signed types (for various reasons including
> > > > > > better overflow detection).
> > > > > > 
> > > > > > > Or
> > > > > > > do you mean you'd split up -fsanitize=bounds between unsigned and signed
> > > > > > > indexes? I'd find that kind of awkward for the kernel... but I feel like
> > > > > > > I've misunderstood something. :)
> > > > > > > 
> > > > > > > -Kees
> > > > > > 
> > > > > > The idea would be to detect at run-time the case
> > > > > > if  x->buf  is used at a time where   x->counter 
> > > > > > is negative and also when x->counter * sizeof(x->buf[0])
> > > > > > overflows or is too big.
> > > > > > 
> > > > > > This would be similar to
> > > > > > 
> > > > > > int a[n];
> > > > > > 
> > > > > > where it is detected at run-time if n is not-positive.
> > > > > 
> > > > > Right. I guess what I mean to say is that I would expect this case to
> > > > > already be caught by -fsanitize=bounds -- I don't see a reason to add an
> > > > > additional sanitizer option.
> > > > > 
> > > > > struct foo {
> > > > > 	int count;
> > > > > 	int array[] __counted_by(count);
> > > > > };
> > > > > 
> > > > > 	foo->count = 5;
> > > > > 	foo->array[0] = 1;	// ok
> > > > > 	foo->array[10] = 1;	// -fsanitize=bounds will catch this
> > > > > 	foo->array[-10] = 1;	// -fsanitize=bounds will catch this too
> > > > > 
> > > > > 
> > > > 
> > > > just checked this testing case with my GCC, and YES, -fsanitize=bounds indeed caught this error:
> > > > 
> > > > ttt_1.c:31:12: runtime error: index 10 out of bounds for type 'char [*]'
> > > > ttt_1.c:32:12: runtime error: index -10 out of bounds for type 'char [*]’
> > > > 
> > > 
> > > Yes, but I thought we were discussing the case where count is
> > > set to a negative value:
> > > 
> > > foo->count = -1;
> > > int x = foo->array[3]; // UBSan should diagnose this
> > 
> > Oh right, I keep thinking about it backwards.
> > 
> > Yeah, we can't trap the "count" assignment, because it may be getting used
> > for other purposes. But yeah, access to "array" should trap if "count"
> > is negative.
> > 
> > > And also the case when foo->array becomes too big.
> > 
> > How do you mean?
> 
> count * sizeof(member) could overflow or otherwise be
> bigger than allowed.

Ah! Yes.

foo->count = SIZE_MAX;
foo->array[0]; // UBSan diagnose:
               // SIZE_MAX * sizeof(int) is larger than can be represented
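
As a hand-written check this could be sketched as follows. It is not the sanitizer's actual instrumentation; array_bytes_ok is a hypothetical helper, and the PTRDIFF_MAX limit is an assumption about what "bigger than allowed" means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true and stores the byte size when
   count elements of elem_size bytes give a representable object
   size; returns false when the product wraps or is too big.  */
bool array_bytes_ok (size_t count, size_t elem_size, size_t *bytes)
{
  size_t total;

  assert (bytes != NULL);
  if (__builtin_mul_overflow (count, elem_size, &total))
    return false;                     /* count * elem_size wrapped */
  if (total > (size_t) PTRDIFF_MAX)
    return false;                     /* assumed upper limit for objects */
  *bytes = total;
  return true;
}
```

With count = SIZE_MAX and sizeof(int) as the element size the multiplication wraps, so the helper returns false. That is the case a run-time diagnostic at the access to foo->array would have to catch.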

> 
> Martin
> 
>
Siddhesh Poyarekar Oct. 23, 2023, 10:48 p.m. UTC | #44
On 2023-10-23 15:43, Qing Zhao wrote:
> 
> 
>> On Oct 23, 2023, at 2:43 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>
>> On 2023-10-23 14:06, Martin Uecker wrote:
>>> We should aim for a good integration with the BDOS pass, so
>>> that it can propagate the information further, e.g. the
>>> following should work:
>>> struct { int L; char buf[] __counted_by(L) } x;
>>> x.L = N;
>>> x.buf = ...;
>>> char *p = &x->f;
>>> __bdos(p) -> N
>>> So we need to be smart on how we provide the size
>>> information for x->f to the backend.
>>> This would also be desirable for the language extension.
>>
>> This is essentially why there need to be frontend rules constraining reordering and reachability semantics of x.L, thus restricting DSE and reordering for it.
> 
> My understanding is that Restricting DSE and reordering should be done by the proper data flow information, with a new argument added to the BDOS call, this correct data flow information could be maintained, and then the DSE and reordering will not happen.
> 
> I don’t quite understand what kind of frontend rules should be added to constrain reordering and reachability semantics? Can you explain this a little bit more? Do you mean to add some rules or requirment to the new attribute that the users of the attribute should follow in the source code?

Yes, but let me try and summarize the issues and the potential solutions 
at the end:

> 
>>   This is not really a __bdos/__bos question, because that bit is trivial; if the structure is visible, the value is simply x.L.  This is also why adding a reference to x.L in __bos/__bdos is not sufficient or even possible in, e.g. the above case you note.
> 
> I am a little confused here, are we discussing how to resolve the potential reordering issue of the following:
> 
> "
> struct annotated {
>    size_t foo;
>    char array[] __attribute__((counted_by (foo)));
> };
> 
>    p->foo = 10;
>    size = __builtin_dynamic_object_size (p->array,1);
> “?
> 
> Or a bigger issue?

Right, so the problem we're trying to solve is the reordering of __bdos 
w.r.t. initialization of the size parameter but to also account for DSE 
of the assignment, we can abstract this problem to that of DFA being 
unable to see implicit use of the size parameter.  __bdos is the one 
such implicit user of the size parameter and you're proposing to solve 
this by encoding the relationship between buffer and size at the __bdos 
call site.  But what about the case when the instantiation of the object 
is not at the same place as the __bdos call site, i.e. the DFA is unable 
to make that relationship?

The example Martin showed where the subobject gets "hidden" behind a 
pointer was a trivial one where DFA *may* actually work in practice 
(because the object-size pass can thread through these assignments) but 
think about this one:

struct A
{
   size_t size;
   char buf[] __attribute__((counted_by(size)));
};

static size_t
get_size_of (void *ptr)
{
   return __bdos (ptr, 1);
}

void
foo (size_t sz)
{
   struct A *obj = __builtin_malloc (sz);
   obj->size = sz;

   ...
   __builtin_printf ("%zu\n", get_size_of (obj->buf));
   ...
}

Until get_size_of is inlined, no DFA can see the __bdos call in the same 
place as the point where obj is allocated.  As a result, the assignment 
to obj->size could get reordered (or the store eliminated) w.r.t. the 
__bdos call until the inlining happens.

As a result, the relationship between buf and size established by the 
attribute needs to be encoded into the type somehow.  There are two options:

Option 1: Encode the relationship in the type of buf

This is kinda what you end up doing with component_ref_has_counted_by 
and it does show the relationship if one is looking (through that call), 
but nothing more that can be used to, e.g. prevent reordering or tell 
the optimizer that the reference to the buf member may imply a reference 
to the size member as well.  This could be remedied by somehow encoding 
the USES relationship for size into the type of buf that the 
optimization passes can see.  I feel like this may be a bit convoluted 
to specify in a future language extension in a way that will actually be 
well understood by developers, but it will likely generate faster 
runtime code.  This will also likely require a bigger change across passes.

Option 2: Encode the relationship in the type of size

The other option is to enhance the type of size somehow so that it 
discourages reordering and store elimination, basically pessimizing 
code.  I think volatile semantics might be the way to do this and may 
even be straightforward to specify in the future language extension 
given that it builds on a known language construct and is thematically 
related.  However it does pessimize output for code that implements 
__counted_by__.
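
To make Option 2 concrete, here is a minimal sketch of what it would look like if a user approximated it by hand, by declaring the counter volatile. It assumes nothing about how the attribute itself would be implemented; struct A_v and set_then_read are made-up names, and counted_by is left out so that the code builds with released compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct A_v
{
  volatile size_t size;   /* every store to and load from size must be performed */
  char buf[];
};

/* Allocates an object with room for sz bytes in buf, records sz in the
   volatile counter, and returns the value read back from it.
   Returns 0 if the allocation fails.  */
size_t
set_then_read (size_t sz)
{
  struct A_v *obj = malloc (sizeof (struct A_v) + sz);
  if (obj == NULL)
    return 0;
  obj->size = sz;            /* volatile store: not removable as a dead store */
  size_t seen = obj->size;   /* volatile load: not foldable to sz */
  free (obj);
  return seen;
}
```

The cost mentioned above is visible here: the compiler has to emit both the store and the load even though the value is known.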

Thanks,
Sid
Qing Zhao Oct. 24, 2023, 8:30 p.m. UTC | #45
Hi, Sid,

I really appreciate your example and detailed explanation; they are very helpful.
I think this is an excellent example that shows (almost) all the issues we need to consider.

I slightly modified the example to make it compilable and runnable, as follows:
(I still cannot make the incorrect reordering or DSE happen, but the potential for reordering is there.)

#include <malloc.h>
struct A
{
 size_t size;
 char buf[] __attribute__((counted_by(size)));
};

static size_t
get_size_from (void *ptr)
{
 return __builtin_dynamic_object_size (ptr, 1);
}

void
foo (size_t sz)
{
 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 obj->size = sz;
 obj->buf[0] = 2;
 /* %d is kept so that the output matches the runs shown below;
    %zu would be the matching specifier for size_t.  */
 __builtin_printf ("%d\n", get_size_from (obj->buf));
 return;
}

int main ()
{
 foo (20);
 return 0;
}

With my GCC, it compiled and ran:
[opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
[opc@qinzhao-ol8u3-x86 ]$ ./a.out
20
Situation 1: At -O1 and above, the routine “get_size_from” is inlined into “foo”. The call to __bdos is therefore in the same routine as the instantiation of the object, and the TYPE information, together with the counted_by attribute attached to the TYPE of the object, can be USED by the __bdos call to compute the final object size.

[opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
[opc@qinzhao-ol8u3-x86 ]$ ./a.out
-1
Situation 2: At -O0, the routine “get_size_from” is NOT inlined into “foo”. The call to __bdos is therefore NOT in the same routine as the instantiation of the object. As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call.

Keeping the above two situations in mind, I will refer to them below:

1. First, the problem we are trying to resolve is:

(Your description):

>  the reordering of __bdos w.r.t. initialization of the size parameter but to also account for DSE of the assignment, we can abstract this problem to that of DFA being unable to see implicit use of the size parameter in the __bdos call.

This is basically correct, with the following exception:

The implicit use of the size parameter in the __bdos call is not always there. It ONLY exists WHEN __bdos can be evaluated to an expression of the size parameter in the “objsz” phase, i.e., “Situation 1” of the above example.
In “Situation 2”, when __bdos does not see the TYPE of the real object, it does not see the counted_by information from the TYPE and therefore cannot evaluate the size of the object through it. As a result, the implicit use of the size parameter in the __bdos call does NOT exist at all. The optimizer can freely reorder the initialization of the size parameter with the __bdos call since there is no data flow dependency between the two.

With this exception in mind, your proposed “option 2” (making the type of size “volatile”) is too conservative: it will disable many optimizations unnecessarily, even though it is safe and simple to implement.

As someone who has worked on compiler optimization for many years, I really don’t want to take this approach at this moment.  -:)

2. Some facts I’d like to mention:

A.  The potential for incorrect reordering (or DSE) ONLY exists in the TREE optimization stage. By the RTL stage, the __bdos call has already been replaced by an expression of the size parameter or by a constant, so the data dependency is already explicit in the IR. I believe the data flow analysis in the RTL stage should pick up the data dependency correctly; no special handling is needed in RTL.

B. If the __bdos call cannot see the real object, it has no way to get the “counted_by” field from the TYPE of the real object. So, if we try to add the implicit use of the “counted_by” field to the __bdos call, the object instantiation must be in the same routine as the __bdos call. Both the FE and the gimplification phase are too early to do this work.

3. Then, what’s the best approach to resolve this problem:

There were several suggestions so far:

A.  Add an additional argument, the size parameter,  to __bdos, 
      A.1, during FE;
      A.2, during gimplification phase;
B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
C.  Encode the implicit USE  in the type of buf, then update the optimization passes to use this implicit USE encoded in the type of buf.

As I explained above,
** Approach A (both A.1 and A.2) does not work;
** Approach B will have a big performance impact, so I’d prefer not to take this approach at this moment;
** Approach C will require a lot of changes in GCC, and is not really necessary since the ONLY implicit use of the size parameter is in the __bdos call when __bdos can see the real object.

So none of the proposed approaches, A, B, or C, is very good.

Then, maybe the following might work better?

In the tree optimization stage,
    * After the inlining transformation is applied,
    * Before the data-flow related optimizations happen,
    * When the data flow analysis is constructed,

For each call to __bdos, add the implicit use of the size parameter.

Is this doable?

Otherwise, we might need to take the “volatile” approach.

Let me know your suggestions and comments.

Thanks a lot.

Qing


Martin Uecker Oct. 24, 2023, 8:38 p.m. UTC | #46
On Tuesday, 24.10.2023 at 20:30 +0000, Qing Zhao wrote:
> Then, maybe the following might work better?
> 
> In the tree optimization stage,
>     * After the inlining transformation is applied,
>     * Before the data-flow related optimizations happen,
>     * When the data flow analysis is constructed,
> 
> For each call to __bdos, add the implicit use of size parameter. 
> 
> Is this doable? 

Here is another proposal:  Add a new builtin function

__builtin_with_size(x, size)

that returns x but behaves similarly to an allocation
function, in that BDOS can look at the size argument
to discover the size.

The FE inserts this function when the field is accessed:

__builtin_with_size(x.buf, x.L);


Martin



Siddhesh Poyarekar Oct. 24, 2023, 9:03 p.m. UTC | #47
On 2023-10-24 16:30, Qing Zhao wrote:
> Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, therefore, the call to __bdos is Not in the same routine as the instantiation of the object, As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call.
> 

But __bos/__bdos are barely useful without optimization; you need a 
minimum of -O1.  You're right that if the call is never inlined then we 
don't care because the __bdos call does not get expanded to obj->size.

However, the point of situation 2 is that the TYPE info cannot be used 
by the __bdos call *only for a while* (i.e. until the call gets inlined) 
and that window is an opportunity for the reordering/DSE to break things.

Thanks.
Sid
Siddhesh Poyarekar Oct. 24, 2023, 9:09 p.m. UTC | #48
On 2023-10-24 16:38, Martin Uecker wrote:
> Here is another proposal:  Add a new builtin function
> 
> __builtin_with_size(x, size)
> 
> that returns x but behaves similarly to an allocation
> function, in that BDOS can look at the size argument
> to discover the size.
> 
> The FE inserts this function when the field is accessed:
> 
> __builtin_with_size(x.buf, x.L);
> 

In fact if we do this at the allocation site for x, it may also help 
with future warnings, where the compiler could flag a warning or error 
when it encounters this builtin but does not see an assignment to x.L.

Thanks,
Sid
Qing Zhao Oct. 24, 2023, 10:41 p.m. UTC | #49
> On Oct 24, 2023, at 5:03 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> 
> On 2023-10-24 16:30, Qing Zhao wrote:
>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, therefore, the call to __bdos is Not in the same routine as the instantiation of the object, As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call.
> 
> But __bos/__bdos are barely useful without optimization; you need a minimum of -O1.  You're right that if the call is never inlined then we don't care because the __bdos call does not get expanded to obj->size.
> 
> However, the point of situation 2 is that the TYPE info cannot be used by the __bdos call *only for a while* (i.e. until the call gets inlined) and that window is an opportunity for the reordering/DSE to break things.

The main point I tried to make with Situation 2 is that there are situations where obj->size is not used at all by __bdos; marking it as volatile is too conservative and unnecessarily prevents useful optimizations from happening.  -:)

Qing
> 
> Thanks.
> Sid
Qing Zhao Oct. 24, 2023, 10:51 p.m. UTC | #50
> On Oct 24, 2023, at 4:38 PM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> On Tuesday, 24.10.2023 at 20:30 +0000, Qing Zhao wrote:
>> Then, maybe the following might work better?
>> 
>> In the tree optimization stage,
>>    * After the inlining transformation is applied,
>>    * Before the data-flow related optimizations happen,
>>    * When the data flow analysis is constructed,
>> 
>> For each call to __bdos, add the implicit use of size parameter. 
>> 
>> Is this doable? 
> 
> Here is another proposal:  Add a new builtin function
> 
> __builtin_with_size(x, size)
> 
> that returns x but behaves similarly to an allocation
> function, in that BDOS can look at the size argument
> to discover the size.
> 
> The FE inserts this function when the field is accessed:
> 
> __builtin_with_size(x.buf, x.L);

Thanks for the proposal!

So what you suggested is:

For every x.buf, change it to __builtin_with_size(x.buf, x.L) in the FE; then the call to __bdos (x.buf, 1) will
become:

   __bdos(__builtin_with_size(x.buf, x.L), 1)?

Then the implicit use of x.L in __bdos(x.buf, 1) will become explicit?

This looks like a very promising solution.

I will study this a little bit more.

Qing
Siddhesh Poyarekar Oct. 24, 2023, 11:51 p.m. UTC | #51
On 2023-10-24 18:41, Qing Zhao wrote:
> 
> 
>> On Oct 24, 2023, at 5:03 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>
>> On 2023-10-24 16:30, Qing Zhao wrote:
>>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, therefore, the call to __bdos is Not in the same routine as the instantiation of the object, As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call.
>>
>> But __bos/__bdos are barely useful without optimization; you need a minimum of -O1.  You're right that if the call is never inlined then we don't care because the __bdos call does not get expanded to obj->size.
>>
>> However, the point of situation 2 is that the TYPE info cannot be used by the __bdos call *only for a while* (i.e. until the call gets inlined) and that window is an opportunity for the reordering/DSE to break things.
> 
> The main point of situation 2 I tried made: there are situations where obj->size is not used at all by the __bdos, marking it as volatile is too conservative, unnecessarily prevent useful optimizations from happening.  -:)

Yes, that's the tradeoff.  However, maybe this is the point where Kees 
jumps in and says the kernel doesn't really care as much or something 
like that :)

Sid
Siddhesh Poyarekar Oct. 24, 2023, 11:56 p.m. UTC | #52
On 2023-10-24 18:51, Qing Zhao wrote:
> Thanks for the proposal!
> 
> So what you suggested is:
> 
> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the FE, then the call to the _bdos (x.buf, 1) will
> Become:
> 
>     _bdos(__builtin_with_size(x.buf, x.L), 1)?
> 
> Then the implicit use of x.L in _bdos(x.buf.1) will become explicit?

Oops, I think Martin and I fell off-list in a subthread.  I clarified 
there that any such annotation at the object reference is probably too 
late and hence not the right place for it; basically it has the same 
problems as option A in your comment.  A better place to reinforce such 
a relationship would be the allocation+initialization site instead.
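
As an illustration only (not part of the patch set), tying the two
together at the allocation+initialization site could look like the
sketch below.  The helper names alloc_size_for and alloc_A and the
COUNTED_BY macro are hypothetical; the size computation follows the
formula from the cover letter, and the attribute is applied only when
the compiler reports support for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use counted_by only where the compiler knows it (assumption: the
   attribute is spelled as in this patch set).  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(m) __attribute__ ((counted_by (m)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(m)
#endif

struct A
{
  size_t size;
  char buf[] COUNTED_BY (size);
};

/* MAX (sizeof (struct A), offsetof (struct A, buf) + count * sizeof (char)),
   the allocation-size formula from the cover letter.  */
static size_t
alloc_size_for (size_t count)
{
  /* Keep the sketch honest about overflow.  */
  assert (count <= (size_t) -1 - offsetof (struct A, buf));
  size_t with_array = offsetof (struct A, buf) + count * sizeof (char);
  return with_array > sizeof (struct A) ? with_array : sizeof (struct A);
}

/* Hypothetical helper: allocate and set the counter in one place, so
   obj->size is initialized before obj->buf can be referenced.  */
static struct A *
alloc_A (size_t count)
{
  struct A *obj = malloc (alloc_size_for (count));
  if (obj != NULL)
    obj->size = count;
  return obj;
}
```

A caller would use alloc_A (20) and never see an object whose size
field is still unset.  This does not by itself stop the optimizer from
moving the store; it only shows where the relationship is established.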

Thanks,
Sid
Martin Uecker Oct. 25, 2023, 5:26 a.m. UTC | #53
Am Dienstag, dem 24.10.2023 um 22:51 +0000 schrieb Qing Zhao:
> 
> > On Oct 24, 2023, at 4:38 PM, Martin Uecker <uecker@tugraz.at> wrote:
> > 
> > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
> > > Hi, Sid,
> > > 
> > > Really appreciate for your example and detailed explanation. Very helpful.
> > > I think that this example is an excellent example to show (almost) all the issues we need to consider.
> > > 
> > > I slightly modified this example to make it to be compilable and run-able, as following: 
> > > (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
> > > 
> > >  1 #include <malloc.h>
> > >  2 struct A
> > >  3 {
> > >  4  size_t size;
> > >  5  char buf[] __attribute__((counted_by(size)));
> > >  6 };
> > >  7 
> > >  8 static size_t
> > >  9 get_size_from (void *ptr)
> > > 10 {
> > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > 12 }
> > > 13 
> > > 14 void
> > > 15 foo (size_t sz)
> > > 16 {
> > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > 18  obj->size = sz;
> > > 19  obj->buf[0] = 2;
> > > 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
> > > 21  return;
> > > 22 }
> > > 23 
> > > 24 int main ()
> > > 25 {
> > > 26  foo (20);
> > > 27  return 0;
> > > 28 }
> > > 
> > > With my GCC, it was compiled and worked:
> > > [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
> > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > 20
> > > Situation 1: With O1 and above, the routine “get_size_from” was inlined into “foo”, therefore, the call to __bdos is in the same routine as the instantiation of the object, and the TYPE information and the attached counted_by attribute information in the TYPE of the object can be USED by the __bdos call to compute the final object size. 
> > > 
> > > [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
> > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > -1
> > > Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, therefore, the call to __bdos is Not in the same routine as the instantiation of the object, As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call. 
> > > 
> > > Keep in mind of the above 2 situations, we will refer them in below:
> > > 
> > > 1. First,  the problem we are trying to resolve is:
> > > 
> > > (Your description):
> > > 
> > > > the reordering of __bdos w.r.t. initialization of the size parameter but to also account for DSE of the assignment, we can abstract this problem to that of DFA being unable to see implicit use of the size parameter in the __bdos call.
> > > 
> > > basically is correct.  However, with the following exception:
> > > 
> > > The implicit use of the size parameter in the __bdos call is not always there, it ONLY exists WHEN the __bdos is able to evaluated to an expression of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the above example. 
> > > In the “Situation 2”, when the __bdos does not see the TYPE of the real object,  it does not see the counted_by information from the TYPE, therefore,  it is not able to evaluate the size of the object through the counted_by information.  As a result, the implicit use of the size parameter in the __bdos call does NOT exist at all.  The optimizer can freely reorder the initialization of the size parameter with the __bdos call since there is no data flow dependency between these two. 
> > > 
> > > With this exception in mind, we can see that your proposed “option 2” (making the type of size “volatile”) is too conservative, it will  disable many optimizations  unnecessarily, even though it’s safe and simple to implement. 
> > > 
> > > As a compiler optimization person for many many years, I really don’t want to take this approach at this moment.  -:)
> > > 
> > > 2. Some facts I’d like to mention:
> > > 
> > > A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE optimization stage. During RTL stage,  the __bdos call has already been replaced by an expression of the size parameter or a constant, the data dependency is explicitly in the IR already.  I believe that the data analysis in RTL stage should pick up the data dependency correctly, No special handling is needed in RTL.
> > > 
> > > B. If the __bdos call cannot see the real object , it has no way to get the “counted_by” field from the TYPE of the real object. So, if we try to add the implicit use of the “counted_by” field to the __bdos call, the object instantiation should be in the same routine as the __bdos call.  Both the FE and the gimplification phase are too early to do this work. 
> > > 
> > > 2. Then, what’s the best approach to resolve this problem:
> > > 
> > > There were several suggestions so far:
> > > 
> > > A.  Add an additional argument, the size parameter,  to __bdos, 
> > >      A.1, during FE;
> > >      A.2, during gimplification phase;
> > > B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
> > > C.  Encode the implicit USE  in the type of buf, then update the optimization passes to use this implicit USE encoded in the type of buf.
> > > 
> > > As I explained in the above, 
> > > ** Approach A (both A.1 and A.2) does not work;
> > > ** Approach B will have big performance impact, I’d prefer not to take this approach at this moment.
> > > ** Approach C will be a lot of change in GCC, and also not very necessary since the ONLY implicit use of the size parameter is in the __bdos call when __bdos can see the real object.
> > > 
> > > So, all the above proposed approaches, A, B, C, are not very good. 
> > > 
> > > Then, maybe the following might work better?
> > > 
> > > In the tree optimization stage, 
> > >    * After the inlining transformation applied,  
> > > +  * Before the data-flow related optimization happens, 
> > > +  * when the data flow analysis is constructed, 
> > > 
> > > For each call to __bdos, add the implicit use of size parameter. 
> > > 
> > > Is this doable? 
> > 
> > Here is another proposal:  Add a new builtin function
> > 
> > __builtin_with_size(x, size)
> > 
> > that return x but behaves similar to an allocation
> > function in that BDOS can look at the size argument
> > to discover the size.
> > 
> > The FE inserts this function when the field is accessed:
> > 
> > __builtin_with_size(x.buf, x.L);
> 
> Thanks for the proposal!
> 
> So what you suggested is:
> 
> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the FE, then the call to the _bdos (x.buf, 1) will
> Become:
> 
>    _bdos(__builtin_with_size(x.buf, x.L), 1)?
> 
> Then the implicit use of x.L in _bdos(x.buf.1) will become explicit?
> 
> This looks like a very promising solution.
> 
> Will study this a. Little bit more.

Yes, the load will be created explicitly in the FE
at the right position.   The BDOS pass can then
later propagate the size to where it is used:

x = &__builtin_with_size(x.buf, x.L)

...other stuff is happening...

__bdos(x, 1).

See for a working example: https://godbolt.org/z/Ej3s1GToa
which shows that reordering is still possible.


I think this should be easy to implement
because it is similar to how BDOS works with
builtin allocation functions.

And this feature seems generally useful. 

I am not sure whether the builtin should take a pointer
argument or the object itself.

Note that the builtin does nothing. It just ensures
that the size argument is evaluated at the right
point in time.

Maybe it should have arguments for min / max 
subobject etc..   Not sure.


Martin

> 
> Qing
> > 
> > 
> > Martin
> > 
> > 
> > 
> > > 
> > > Otherwise, we might need to take the “volatile” approach. 
> > > 
> > > Let me know your suggestion and comment.
> > > 
> > > Thanks a lot.
> > > 
> > > Qing
> > > 
> > > 
> > > > __bdos is the one such implicit user of the size parameter and you're proposing to solve this by encoding the relationship between buffer and size at the __bdos call site.  But what about the case when the instantiation of the object is not at the same place as the __bdos call site, i.e. the DFA is unable to make that relationship?
> > > > 
> > > > The example Martin showed where the subobject gets "hidden" behind a pointer was a trivial one where DFA *may* actually work in practice (because the object-size pass can thread through these assignments) but think about this one:
> > > > 
> > > > struct A
> > > > {
> > > > size_t size;
> > > > char buf[] __attribute__((counted_by(size)));
> > > > }
> > > > 
> > > > static size_t
> > > > get_size_of (void *ptr)
> > > > {
> > > > return __bdos (ptr, 1);
> > > > }
> > > > 
> > > > void
> > > > foo (size_t sz)
> > > > {
> > > > struct A *obj = __builtin_malloc (sz);
> > > > obj->size = sz;
> > > > 
> > > > ...
> > > > __builtin_printf ("%zu\n", get_size_of (obj->array));
> > > > ...
> > > > }
> > > > 
> > > > Until get_size_of is inlined, no DFA can see the __bdos call in the same place as the point where obj is allocated.  As a result, the assignment to obj->size could get reordered (or the store eliminated) w.r.t. the __bdos call until the inlining happens.
> > > > 
> > > > As a result, the relationship between buf and size established by the attribute needs to be encoded into the type somehow.  There are two options:
> > > > 
> > > > Option 1: Encode the relationship in the type of buf
> > > > 
> > > > This is kinda what you end up doing with component_ref_has_counted_by and it does show the relationship if one is looking (through that call), but nothing more that can be used to, e.g. prevent reordering or tell the optimizer that the reference to the buf member may imply a reference to the size member as well.  This could be remedied by somehow encoding the USES relationship for size into the type of buf that the optimization passes can see.  I feel like this may be a bit convoluted to specify in a future language extension in a way that will actually be well understood by developers, but it will likely generate faster runtime code.  This will also likely require a bigger change across passes.
> > > > 
> > > > Option 2: Encode the relationship in the type of size
> > > > 
> > > > The other option is to enhance the type of size somehow so that it discourages reordering and store elimination, basically pessimizing code.  I think volatile semantics might be the way to do this and may even be straightforward to specify in the future language extension given that it builds on a known language construct and is thematically related.  However it does pessimize output for code that implements __counted_by__.
> > > > 
> > > > Thanks,
> > > > Sid
>
Richard Biener Oct. 25, 2023, 6:43 a.m. UTC | #54
> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
> 
> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>> Hi, Sid,
>> 
>> Really appreciate for your example and detailed explanation. Very helpful.
>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>> 
>> I slightly modified this example to make it to be compilable and run-able, as following: 
>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>> 
>>  1 #include <malloc.h>
>>  2 struct A
>>  3 {
>>  4  size_t size;
>>  5  char buf[] __attribute__((counted_by(size)));
>>  6 };
>>  7 
>>  8 static size_t
>>  9 get_size_from (void *ptr)
>> 10 {
>> 11  return __builtin_dynamic_object_size (ptr, 1);
>> 12 }
>> 13 
>> 14 void
>> 15 foo (size_t sz)
>> 16 {
>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> 18  obj->size = sz;
>> 19  obj->buf[0] = 2;
>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>> 21  return;
>> 22 }
>> 23 
>> 24 int main ()
>> 25 {
>> 26  foo (20);
>> 27  return 0;
>> 28 }
>> 
>> With my GCC, it was compiled and worked:
>> [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>> 20
>> Situation 1: With O1 and above, the routine “get_size_from” was inlined into “foo”, therefore, the call to __bdos is in the same routine as the instantiation of the object, and the TYPE information and the attached counted_by attribute information in the TYPE of the object can be USED by the __bdos call to compute the final object size. 
>> 
>> [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>> -1
>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, therefore, the call to __bdos is Not in the same routine as the instantiation of the object, As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call. 
>> 
>> Keep in mind of the above 2 situations, we will refer them in below:
>> 
>> 1. First,  the problem we are trying to resolve is:
>> 
>> (Your description):
>> 
>>> the reordering of __bdos w.r.t. initialization of the size parameter but to also account for DSE of the assignment, we can abstract this problem to that of DFA being unable to see implicit use of the size parameter in the __bdos call.
>> 
>> basically is correct.  However, with the following exception:
>> 
>> The implicit use of the size parameter in the __bdos call is not always there, it ONLY exists WHEN the __bdos is able to evaluated to an expression of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the above example. 
>> In the “Situation 2”, when the __bdos does not see the TYPE of the real object,  it does not see the counted_by information from the TYPE, therefore,  it is not able to evaluate the size of the object through the counted_by information.  As a result, the implicit use of the size parameter in the __bdos call does NOT exist at all.  The optimizer can freely reorder the initialization of the size parameter with the __bdos call since there is no data flow dependency between these two. 
>> 
>> With this exception in mind, we can see that your proposed “option 2” (making the type of size “volatile”) is too conservative, it will  disable many optimizations  unnecessarily, even though it’s safe and simple to implement. 
>> 
>> As a compiler optimization person for many many years, I really don’t want to take this approach at this moment.  -:)
>> 
>> 2. Some facts I’d like to mention:
>> 
>> A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE optimization stage. During RTL stage,  the __bdos call has already been replaced by an expression of the size parameter or a constant, the data dependency is explicitly in the IR already.  I believe that the data analysis in RTL stage should pick up the data dependency correctly, No special handling is needed in RTL.
>> 
>> B. If the __bdos call cannot see the real object , it has no way to get the “counted_by” field from the TYPE of the real object. So, if we try to add the implicit use of the “counted_by” field to the __bdos call, the object instantiation should be in the same routine as the __bdos call.  Both the FE and the gimplification phase are too early to do this work. 
>> 
>> 2. Then, what’s the best approach to resolve this problem:
>> 
>> There were several suggestions so far:
>> 
>> A.  Add an additional argument, the size parameter,  to __bdos, 
>>      A.1, during FE;
>>      A.2, during gimplification phase;
>> B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
>> C.  Encode the implicit USE  in the type of buf, then update the optimization passes to use this implicit USE encoded in the type of buf.
>> 
>> As I explained in the above, 
>> ** Approach A (both A.1 and A.2) does not work;
>> ** Approach B will have big performance impact, I’d prefer not to take this approach at this moment.
>> ** Approach C will be a lot of change in GCC, and also not very necessary since the ONLY implicit use of the size parameter is in the __bdos call when __bdos can see the real object.
>> 
>> So, all the above proposed approaches, A, B, C, are not very good. 
>> 
>> Then, maybe the following might work better?
>> 
>> In the tree optimization stage, 
>>    * After the inlining transformation applied,  
>> +  * Before the data-flow related optimization happens, 
>> +  * when the data flow analysis is constructed, 
>> 
>> For each call to __bdos, add the implicit use of size parameter. 
>> 
>> Is this doable? 
> 
> Here is another proposal:  Add a new builtin function
> 
> __builtin_with_size(x, size)
> 
> that return x but behaves similar to an allocation
> function in that BDOS can look at the size argument
> to discover the size.
> 
> The FE inserts this function when the field is accessed:

When it’s set I suppose.  Turn

x.l = n;

into

x.l = __builtin_with_size (x.buf, n);

And indeed we need something like a fat pointer to reliably solve all the issues.
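
For readers unfamiliar with the term, a fat pointer carries the bound
together with the address, so the two cannot be reordered apart.  The
following is a minimal sketch in plain C; struct fat_ptr and the
fat_* helpers are hypothetical names for illustration, not anything
GCC provides.  For the struct in this thread one would build it from
x.buf and the counter field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer: address and element count travel as one
   value.  */
struct fat_ptr
{
  char *ptr;
  size_t len;
};

static struct fat_ptr
fat_make (char *ptr, size_t len)
{
  assert (ptr != NULL || len == 0);
  struct fat_ptr fp = { ptr, len };
  return fp;
}

/* The size query needs no separate load from the enclosing object.  */
static size_t
fat_size (struct fat_ptr fp)
{
  return fp.len;
}

/* Bounds-checked store: returns 0 on success, -1 if IDX is out of
   bounds.  */
static int
fat_store (struct fat_ptr fp, size_t idx, char value)
{
  if (idx >= fp.len)
    return -1;
  fp.ptr[idx] = value;
  return 0;
}
```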

Richard 

> __builtin_with_size(x.buf, x.L);
> 
> 
> Martin
> 
> 
> 
>> 
>> Otherwise, we might need to take the “volatile” approach. 
>> 
>> Let me know your suggestion and comment.
>> 
>> Thanks a lot.
>> 
>> Qing
>> 
>> 
>>> __bdos is the one such implicit user of the size parameter and you're proposing to solve this by encoding the relationship between buffer and size at the __bdos call site.  But what about the case when the instantiation of the object is not at the same place as the __bdos call site, i.e. the DFA is unable to make that relationship?
>>> 
>>> The example Martin showed where the subobject gets "hidden" behind a pointer was a trivial one where DFA *may* actually work in practice (because the object-size pass can thread through these assignments) but think about this one:
>>> 
>>> struct A
>>> {
>>> size_t size;
>>> char buf[] __attribute__((counted_by(size)));
>>> }
>>> 
>>> static size_t
>>> get_size_of (void *ptr)
>>> {
>>> return __bdos (ptr, 1);
>>> }
>>> 
>>> void
>>> foo (size_t sz)
>>> {
>>> struct A *obj = __builtin_malloc (sz);
>>> obj->size = sz;
>>> 
>>> ...
>>> __builtin_printf ("%zu\n", get_size_of (obj->array));
>>> ...
>>> }
>>> 
>>> Until get_size_of is inlined, no DFA can see the __bdos call in the same place as the point where obj is allocated.  As a result, the assignment to obj->size could get reordered (or the store eliminated) w.r.t. the __bdos call until the inlining happens.
>>> 
>>> As a result, the relationship between buf and size established by the attribute needs to be encoded into the type somehow.  There are two options:
>>> 
>>> Option 1: Encode the relationship in the type of buf
>>> 
>>> This is kinda what you end up doing with component_ref_has_counted_by and it does show the relationship if one is looking (through that call), but nothing more that can be used to, e.g. prevent reordering or tell the optimizer that the reference to the buf member may imply a reference to the size member as well.  This could be remedied by somehow encoding the USES relationship for size into the type of buf that the optimization passes can see.  I feel like this may be a bit convoluted to specify in a future language extension in a way that will actually be well understood by developers, but it will likely generate faster runtime code.  This will also likely require a bigger change across passes.
>>> 
>>> Option 2: Encode the relationship in the type of size
>>> 
>>> The other option is to enhance the type of size somehow so that it discourages reordering and store elimination, basically pessimizing code.  I think volatile semantics might be the way to do this and may even be straightforward to specify in the future language extension given that it builds on a known language construct and is thematically related.  However it does pessimize output for code that implements __counted_by__.
>>> 
>>> Thanks,
>>> Sid
>> 
>
Martin Uecker Oct. 25, 2023, 8:16 a.m. UTC | #55
Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> 
> > Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
> > 
> > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
> > > Hi, Sid,
> > > 
> > > Really appreciate for your example and detailed explanation. Very helpful.
> > > I think that this example is an excellent example to show (almost) all the issues we need to consider.
> > > 
> > > I slightly modified this example to make it to be compilable and run-able, as following: 
> > > (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
> > > 
> > >  1 #include <malloc.h>
> > >  2 struct A
> > >  3 {
> > >  4  size_t size;
> > >  5  char buf[] __attribute__((counted_by(size)));
> > >  6 };
> > >  7 
> > >  8 static size_t
> > >  9 get_size_from (void *ptr)
> > > 10 {
> > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > 12 }
> > > 13 
> > > 14 void
> > > 15 foo (size_t sz)
> > > 16 {
> > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > 18  obj->size = sz;
> > > 19  obj->buf[0] = 2;
> > > 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
> > > 21  return;
> > > 22 }
> > > 23 
> > > 24 int main ()
> > > 25 {
> > > 26  foo (20);
> > > 27  return 0;
> > > 28 }
> > > 
> > > With my GCC, it was compiled and worked:
> > > [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
> > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > 20
> > > Situation 1: With O1 and above, the routine “get_size_from” was inlined into “foo”, therefore, the call to __bdos is in the same routine as the instantiation of the object, and the TYPE information and the attached counted_by attribute information in the TYPE of the object can be USED by the __bdos call to compute the final object size. 
> > > 
> > > [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
> > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > -1
> > > Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, therefore, the call to __bdos is Not in the same routine as the instantiation of the object, As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call. 
> > > 
> > > Keep in mind of the above 2 situations, we will refer them in below:
> > > 
> > > 1. First,  the problem we are trying to resolve is:
> > > 
> > > (Your description):
> > > 
> > > > the reordering of __bdos w.r.t. initialization of the size parameter but to also account for DSE of the assignment, we can abstract this problem to that of DFA being unable to see implicit use of the size parameter in the __bdos call.
> > > 
> > > basically is correct.  However, with the following exception:
> > > 
> > > The implicit use of the size parameter in the __bdos call is not always there, it ONLY exists WHEN the __bdos is able to evaluated to an expression of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the above example. 
> > > In the “Situation 2”, when the __bdos does not see the TYPE of the real object,  it does not see the counted_by information from the TYPE, therefore,  it is not able to evaluate the size of the object through the counted_by information.  As a result, the implicit use of the size parameter in the __bdos call does NOT exist at all.  The optimizer can freely reorder the initialization of the size parameter with the __bdos call since there is no data flow dependency between these two. 
> > > 
> > > With this exception in mind, we can see that your proposed “option 2” (making the type of size “volatile”) is too conservative, it will  disable many optimizations  unnecessarily, even though it’s safe and simple to implement. 
> > > 
> > > As a compiler optimization person for many many years, I really don’t want to take this approach at this moment.  -:)
> > > 
> > > 2. Some facts I’d like to mention:
> > > 
> > > A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE optimization stage. During RTL stage,  the __bdos call has already been replaced by an expression of the size parameter or a constant, the data dependency is explicitly in the IR already.  I believe that the data analysis in RTL stage should pick up the data dependency correctly, No special handling is needed in RTL.
> > > 
> > > B. If the __bdos call cannot see the real object , it has no way to get the “counted_by” field from the TYPE of the real object. So, if we try to add the implicit use of the “counted_by” field to the __bdos call, the object instantiation should be in the same routine as the __bdos call.  Both the FE and the gimplification phase are too early to do this work. 
> > > 
> > > 2. Then, what’s the best approach to resolve this problem:
> > > 
> > > There were several suggestions so far:
> > > 
> > > A.  Add an additional argument, the size parameter,  to __bdos, 
> > >      A.1, during FE;
> > >      A.2, during gimplification phase;
> > > B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
> > > C.  Encode the implicit USE  in the type of buf, then update the optimization passes to use this implicit USE encoded in the type of buf.
> > > 
> > > As I explained in the above, 
> > > ** Approach A (both A.1 and A.2) does not work;
> > > ** Approach B will have big performance impact, I’d prefer not to take this approach at this moment.
> > > ** Approach C will be a lot of change in GCC, and also not very necessary since the ONLY implicit use of the size parameter is in the __bdos call when __bdos can see the real object.
> > > 
> > > So, all the above proposed approaches, A, B, C, are not very good. 
> > > 
> > > Then, maybe the following might work better?
> > > 
> > > In the tree optimization stage, 
> > >    * After the inlining transformation applied,  
> > > +  * Before the data-flow related optimization happens, 
> > > +  * when the data flow analysis is constructed, 
> > > 
> > > For each call to __bdos, add the implicit use of size parameter. 
> > > 
> > > Is this doable? 
> > 
> > Here is another proposal:  Add a new builtin function
> > 
> > __builtin_with_size(x, size)
> > 
> > that return x but behaves similar to an allocation
> > function in that BDOS can look at the size argument
> > to discover the size.
> > 
> > The FE inserts this function when the field is accessed:
> 
> When it’s set I suppose.  Turn
> 
> X.l = n;
> 
> Into
> 
> X.l = __builtin_with_size (x.buf, n);

It would turn 

some_variable = (&) x.buf

into 

some_variable = __builtin_with_size ( (&) x.buf, x.len)


So it is the later access to x.buf that gets wrapped, and
not the initialization of a member of the struct (which is
too early).

> 
> And indeed we need sth like a fat pointer to reliably solve all the issues.

What do other languages such as Fortran and Ada
do?  Are those pointers lowered in the FE?

To me it seems there are two sound ways to introduce
such information:

- either by using the type system.  This works in
the FE in C using variably modified types

char buf[n];
__auto_type p = &buf;

... = sizeof (*p);

But if I understand Jakob's comment to some PR 
correctly the size information in the TREE_TYPE
is not processed correctly anymore in the
middle-end. 


- or one injects the information via some
tree node or builtin at certain points in
time as suggested here, and the compiler
derives the information from these points 
as tree-object-size does.  
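
To make the first option concrete, here is a small self-contained
version of the variably modified type fragment above.  The function
name vm_type_size is made up for the example, and __auto_type is a GNU
C extension, so this is a sketch for GCC and compatible compilers in C
mode only.

```c
#include <assert.h>
#include <stddef.h>

/* The size N is part of the type of *p, so sizeof (*p) is computed at
   run time from the type alone, with no attribute involved.  */
static size_t
vm_type_size (size_t n)
{
  assert (n > 0);
  char buf[n];
  __auto_type p = &buf;   /* p has type char (*)[n] */
  buf[0] = 0;
  return sizeof (*p);
}
```

Here the bound is carried by the type system in the FE; the concern
raised above is about what survives of that information later in the
middle-end.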


The use of attributes seems fragile and, looking
at the access attribute, also overly complex.  We
also support this only for function types and not
elsewhere, and the information then gets lost
during inlining.  So for all this stuff (nonnull,
access, counted_by) I think a better approach is
needed.


Martin


> 
> Richard 




> 
> > __builtin_with_size(x.buf, x.L);
> > 
> > 
> > Martin
> > 
> > 
> > 
> > > 
> > > Otherwise, we might need to take the “volatile” approach. 
> > > 
> > > Let me know your suggestion and comment.
> > > 
> > > Thanks a lot.
> > > 
> > > Qing
> > > 
> > > 
> > > > __bdos is the one such implicit user of the size parameter and you're proposing to solve this by encoding the relationship between buffer and size at the __bdos call site.  But what about the case when the instantiation of the object is not at the same place as the __bdos call site, i.e. the DFA is unable to make that relationship?
> > > > 
> > > > The example Martin showed where the subobject gets "hidden" behind a pointer was a trivial one where DFA *may* actually work in practice (because the object-size pass can thread through these assignments) but think about this one:
> > > > 
> > > > struct A
> > > > {
> > > > size_t size;
> > > > char buf[] __attribute__((counted_by(size)));
> > > > }
> > > > 
> > > > static size_t
> > > > get_size_of (void *ptr)
> > > > {
> > > > return __bdos (ptr, 1);
> > > > }
> > > > 
> > > > void
> > > > foo (size_t sz)
> > > > {
> > > > struct A *obj = __builtin_malloc (sz);
> > > > obj->size = sz;
> > > > 
> > > > ...
> > > > __builtin_printf ("%zu\n", get_size_of (obj->array));
> > > > ...
> > > > }
> > > > 
> > > > Until get_size_of is inlined, no DFA can see the __bdos call in the same place as the point where obj is allocated.  As a result, the assignment to obj->size could get reordered (or the store eliminated) w.r.t. the __bdos call until the inlining happens.
> > > > 
> > > > As a result, the relationship between buf and size established by the attribute needs to be encoded into the type somehow.  There are two options:
> > > > 
> > > > Option 1: Encode the relationship in the type of buf
> > > > 
> > > > This is kinda what you end up doing with component_ref_has_counted_by and it does show the relationship if one is looking (through that call), but nothing more that can be used to, e.g. prevent reordering or tell the optimizer that the reference to the buf member may imply a reference to the size member as well.  This could be remedied by somehow encoding the USES relationship for size into the type of buf that the optimization passes can see.  I feel like this may be a bit convoluted to specify in a future language extension in a way that will actually be well understood by developers, but it will likely generate faster runtime code.  This will also likely require a bigger change across passes.
> > > > 
> > > > Option 2: Encode the relationship in the type of size
> > > > 
> > > > The other option is to enhance the type of size somehow so that it discourages reordering and store elimination, basically pessimizing code.  I think volatile semantics might be the way to do this and may even be straightforward to specify in the future language extension given that it builds on a known language construct and is thematically related.  However it does pessimize output for code that implements __counted_by__.
> > > > 
> > > > Thanks,
> > > > Sid
> > > 
> >
Siddhesh Poyarekar Oct. 25, 2023, 10:25 a.m. UTC | #56
On 2023-10-25 04:16, Martin Uecker wrote:
> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>
>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
>>>
>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>>>> Hi, Sid,
>>>>
>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>>>>
>>>> I slightly modified this example to make it to be compilable and run-able, as following:
>>>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>>>>
>>>>   1 #include <malloc.h>
>>>>   2 struct A
>>>>   3 {
>>>>   4  size_t size;
>>>>   5  char buf[] __attribute__((counted_by(size)));
>>>>   6 };
>>>>   7
>>>>   8 static size_t
>>>>   9 get_size_from (void *ptr)
>>>> 10 {
>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>> 12 }
>>>> 13
>>>> 14 void
>>>> 15 foo (size_t sz)
>>>> 16 {
>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>> 18  obj->size = sz;
>>>> 19  obj->buf[0] = 2;
>>>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>>>> 21  return;
>>>> 22 }
>>>> 23
>>>> 24 int main ()
>>>> 25 {
>>>> 26  foo (20);
>>>> 27  return 0;
>>>> 28 }
>>>>

<snip>

>> When it’s set I suppose.  Turn
>>
>> X.l = n;
>>
>> Into
>>
>> X.l = __builtin_with_size (x.buf, n);
> 
> It would turn
> 
> some_variable = (&) x.buf
> 
> into
> 
> some_variable = __builtin_with_size ( (&) x.buf. x.len)
> 
> 
> So the later access to x.buf and not the initialization
> of a member of the struct (which is too early).
> 

Hmm, so with Qing's example above, are you suggesting the transformation 
be to foo like so:

14 void
15 foo (size_t sz)
16 {
16.5  void * _1;
17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
18  obj->size = sz;
19  obj->buf[0] = 2;
19.5  _1 = __builtin_with_size (obj->buf, obj->size);
20  __builtin_printf ("%d\n", get_size_from (_1));
21  return;
22 }

If yes then this could indeed work.  I think I got thrown off by the 
reference to __bdos.
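
For illustration only, the transformed foo can be emulated in plain C. This is a rough sketch under stated assumptions: __builtin_with_size does not exist, so the hypothetical helper with_size and the hypothetical struct sized_ptr stand in for it, and get_size_from simply reads the bound that travels with the pointer instead of calling __bdos.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pointer that carries its own bound.  */
struct sized_ptr
{
  void *ptr;
  size_t size;
};

struct A
{
  size_t size;
  char buf[];	/* would carry counted_by (size) in the proposal */
};

/* Emulates the proposed __builtin_with_size (x, size): hands back x
   together with the size value read at this point.  */
static struct sized_ptr
with_size (void *x, size_t size)
{
  return (struct sized_ptr) { x, size };
}

/* The callee never sees struct A; it only sees the bound that was
   attached at the access to buf.  */
static size_t
get_size_from (struct sized_ptr p)
{
  return p.size;
}

static size_t
foo (size_t sz)
{
  struct A *obj = malloc (sizeof (struct A) + sz * sizeof (char));
  assert (obj != NULL);
  obj->size = sz;
  obj->buf[0] = 2;
  /* Corresponds to line 19.5 above.  */
  struct sized_ptr p1 = with_size (obj->buf, obj->size);
  size_t n = get_size_from (p1);
  free (obj);
  return n;
}
```

With this emulation foo (20) returns 20 whether or not get_size_from is inlined, because the read of obj->size is an explicit operand at the access to buf.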

Thanks,
Sid
Richard Biener Oct. 25, 2023, 10:25 a.m. UTC | #57
> Am 25.10.2023 um 10:16 schrieb Martin Uecker <uecker@tugraz.at>:
> 
> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>> 
>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
>>> 
>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>>>> Hi, Sid,
>>>> 
>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>>>> 
>>>> I slightly modified this example to make it to be compilable and run-able, as following: 
>>>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>>>> 
>>>> 1 #include <malloc.h>
>>>> 2 struct A
>>>> 3 {
>>>> 4  size_t size;
>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>> 6 };
>>>> 7 
>>>> 8 static size_t
>>>> 9 get_size_from (void *ptr)
>>>> 10 {
>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>> 12 }
>>>> 13 
>>>> 14 void
>>>> 15 foo (size_t sz)
>>>> 16 {
>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>> 18  obj->size = sz;
>>>> 19  obj->buf[0] = 2;
>>>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>>>> 21  return;
>>>> 22 }
>>>> 23 
>>>> 24 int main ()
>>>> 25 {
>>>> 26  foo (20);
>>>> 27  return 0;
>>>> 28 }
>>>> 
>>>> With my GCC, it was compiled and worked:
>>>> [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
>>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>>>> 20
>>>> Situation 1: With O1 and above, the routine “get_size_from” was inlined into “foo”, therefore, the call to __bdos is in the same routine as the instantiation of the object, and the TYPE information and the attached counted_by attribute information in the TYPE of the object can be USED by the __bdos call to compute the final object size. 
>>>> 
>>>> [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
>>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>>>> -1
>>>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, therefore, the call to __bdos is Not in the same routine as the instantiation of the object, As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call. 
>>>> 
>>>> Keep in mind of the above 2 situations, we will refer them in below:
>>>> 
>>>> 1. First,  the problem we are trying to resolve is:
>>>> 
>>>> (Your description):
>>>> 
>>>>> the reordering of __bdos w.r.t. initialization of the size parameter but to also account for DSE of the assignment, we can abstract this problem to that of DFA being unable to see implicit use of the size parameter in the __bdos call.
>>>> 
>>>> basically is correct.  However, with the following exception:
>>>> 
>>>> The implicit use of the size parameter in the __bdos call is not always there, it ONLY exists WHEN the __bdos is able to evaluated to an expression of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the above example. 
>>>> In the “Situation 2”, when the __bdos does not see the TYPE of the real object,  it does not see the counted_by information from the TYPE, therefore,  it is not able to evaluate the size of the object through the counted_by information.  As a result, the implicit use of the size parameter in the __bdos call does NOT exist at all.  The optimizer can freely reorder the initialization of the size parameter with the __bdos call since there is no data flow dependency between these two. 
>>>> 
>>>> With this exception in mind, we can see that your proposed “option 2” (making the type of size “volatile”) is too conservative, it will  disable many optimizations  unnecessarily, even though it’s safe and simple to implement. 
>>>> 
>>>> As a compiler optimization person for many many years, I really don’t want to take this approach at this moment.  -:)
>>>> 
>>>> 2. Some facts I’d like to mention:
>>>> 
>>>> A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE optimization stage. During RTL stage,  the __bdos call has already been replaced by an expression of the size parameter or a constant, the data dependency is explicitly in the IR already.  I believe that the data analysis in RTL stage should pick up the data dependency correctly, No special handling is needed in RTL.
>>>> 
>>>> B. If the __bdos call cannot see the real object , it has no way to get the “counted_by” field from the TYPE of the real object. So, if we try to add the implicit use of the “counted_by” field to the __bdos call, the object instantiation should be in the same routine as the __bdos call.  Both the FE and the gimplification phase are too early to do this work. 
>>>> 
>>>> 2. Then, what’s the best approach to resolve this problem:
>>>> 
>>>> There were several suggestions so far:
>>>> 
>>>> A.  Add an additional argument, the size parameter,  to __bdos, 
>>>>     A.1, during FE;
>>>>     A.2, during gimplification phase;
>>>> B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
>>>> C.  Encode the implicit USE  in the type of buf, then update the optimization passes to use this implicit USE encoded in the type of buf.
>>>> 
>>>> As I explained in the above, 
>>>> ** Approach A (both A.1 and A.2) does not work;
>>>> ** Approach B will have big performance impact, I’d prefer not to take this approach at this moment.
>>>> ** Approach C will be a lot of change in GCC, and also not very necessary since the ONLY implicit use of the size parameter is in the __bdos call when __bdos can see the real object.
>>>> 
>>>> So, all the above proposed approaches, A, B, C, are not very good. 
>>>> 
>>>> Then, maybe the following might work better?
>>>> 
>>>> In the tree optimization stage, 
>>>>   * After the inlining transformation applied,  
>>>> +  * Before the data-flow related optimization happens, 
>>>> +  * when the data flow analysis is constructed, 
>>>> 
>>>> For each call to __bdos, add the implicit use of size parameter. 
>>>> 
>>>> Is this doable? 
>>> 
>>> Here is another proposal:  Add a new builtin function
>>> 
>>> __builtin_with_size(x, size)
>>> 
>>> that return x but behaves similar to an allocation
>>> function in that BDOS can look at the size argument
>>> to discover the size.
>>> 
>>> The FE inserts this function when the field is accessed:
>> 
>> When it’s set I suppose.  Turn
>> 
>> X.l = n;
>> 
>> Into
>> 
>> X.l = __builtin_with_size (x.buf, n);
> 
> It would turn 
> 
> some_variable = (&) x.buf
> 
> into 
> 
> some_variable = __builtin_with_size ( (&) x.buf. x.len)

Unless you use the address of x.len, this will not work when len is initialized after buf.  And the address will not have a meaningful data dependence.
> 
> So the later access to x.buf and not the initialization
> of a member of the struct (which is too early).

>> 
>> And indeed we need sth like a fat pointer to reliably solve all the issues.
> 
> What happens for other languages such as FORTRAN 
> and ADA do?  Are those pointers lowered in the FE?

Yes

> To me it seems there are two sound ways to introduce
> such information:
> 
> - either by using the type system.  This works in
> the FE in C using variably modified types
> 
> char buf[n];
> __auto_type p = &buf;
> 
> ... = sizeof (*p);
> 
> But if I understand Jakob's comment to some PR 
> correctly the size information in the TREE_TYPE
> is not processed correctly anymore in the
> middle-end. 

The type-based info is lowered during gimplification, and for pointer types in particular the middle-end quickly loses track of the original type.
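
For reference, the front-end behaviour Martin describes can be checked with a small sketch. It assumes a compiler with VLA support and spells out the pointer type instead of using the GNU __auto_type extension; the function name size_via_vm_type is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* The variably modified type char (*)[n] records n, so sizeof (*p)
   is computed at run time from the type alone.  */
static size_t
size_via_vm_type (size_t n)
{
  assert (n > 0);		/* a zero-length VLA is undefined */
  char buf[n];
  char (*p)[n] = &buf;
  return sizeof (*p);
}
```

This only shows what the C front end sees; it says nothing about how much of that type information is still available to later passes.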

Richard 

> 
> - or one injects the information via some
> tree node or builtin at certain points in
> time as suggested here, and the compiler
> derives the information from these points 
> as tree-object-size does.  
> 
> 
> The use of attributes seems fragile and - looking
> at the access attribute also overly complex.  And 
> we somehow support this only for function types
> and not elsewhere and also this then gets lost
> during  inlining.   So I think for all this stuff
> (nonnull, access, counted_by) I think a better
> approach is needed.
> 
> 
> Martin
> 
> 
>> 
>> Richard 
> 
> 
> 
> 
>> 
>>> __builtin_with_size(x.buf, x.L);
>>> 
>>> 
>>> Martin
>>> 
>>> 
>>> 
>>>> 
>>>> Otherwise, we might need to take the “volatile” approach. 
>>>> 
>>>> Let me know your suggestion and comment.
>>>> 
>>>> Thanks a lot.
>>>> 
>>>> Qing
>>>> 
>>>> 
>>>>> __bdos is the one such implicit user of the size parameter and you're proposing to solve this by encoding the relationship between buffer and size at the __bdos call site.  But what about the case when the instantiation of the object is not at the same place as the __bdos call site, i.e. the DFA is unable to make that relationship?
>>>>> 
>>>>> The example Martin showed where the subobject gets "hidden" behind a pointer was a trivial one where DFA *may* actually work in practice (because the object-size pass can thread through these assignments) but think about this one:
>>>>> 
>>>>> struct A
>>>>> {
>>>>> size_t size;
>>>>> char buf[] __attribute__((counted_by(size)));
>>>>> };
>>>>> 
>>>>> static size_t
>>>>> get_size_of (void *ptr)
>>>>> {
>>>>> return __bdos (ptr, 1);
>>>>> }
>>>>> 
>>>>> void
>>>>> foo (size_t sz)
>>>>> {
>>>>> struct A *obj = __builtin_malloc (sz);
>>>>> obj->size = sz;
>>>>> 
>>>>> ...
>>>>> __builtin_printf ("%zu\n", get_size_of (obj->buf));
>>>>> ...
>>>>> }
>>>>> 
>>>>> Until get_size_of is inlined, no DFA can see the __bdos call in the same place as the point where obj is allocated.  As a result, the assignment to obj->size could get reordered (or the store eliminated) w.r.t. the __bdos call until the inlining happens.
>>>>> 
>>>>> As a result, the relationship between buf and size established by the attribute needs to be encoded into the type somehow.  There are two options:
>>>>> 
>>>>> Option 1: Encode the relationship in the type of buf
>>>>> 
>>>>> This is kinda what you end up doing with component_ref_has_counted_by and it does show the relationship if one is looking (through that call), but nothing more that can be used to, e.g. prevent reordering or tell the optimizer that the reference to the buf member may imply a reference to the size member as well.  This could be remedied by somehow encoding the USES relationship for size into the type of buf that the optimization passes can see.  I feel like this may be a bit convoluted to specify in a future language extension in a way that will actually be well understood by developers, but it will likely generate faster runtime code.  This will also likely require a bigger change across passes.
>>>>> 
>>>>> Option 2: Encode the relationship in the type of size
>>>>> 
>>>>> The other option is to enhance the type of size somehow so that it discourages reordering and store elimination, basically pessimizing code.  I think volatile semantics might be the way to do this and may even be straightforward to specify in the future language extension given that it builds on a known language construct and is thematically related.  However it does pessimize output for code that implements __counted_by__.
>>>>> 
>>>>> Thanks,
>>>>> Sid
>>>> 
>>> 
> 
> -- 
> Univ.-Prof. Dr. rer. nat. Martin Uecker
> Graz University of Technology
> Institute of Biomedical Imaging
> 
>
Martin Uecker Oct. 25, 2023, 10:39 a.m. UTC | #58
Am Mittwoch, dem 25.10.2023 um 12:25 +0200 schrieb Richard Biener:
> 
> > Am 25.10.2023 um 10:16 schrieb Martin Uecker <uecker@tugraz.at>:
> > 
> > Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> > > 
> > > > > Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
> > > > 
> > > > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
> > > > > Hi, Sid,
> > > > > 
> > > > > Really appreciate for your example and detailed explanation. Very helpful.
> > > > > I think that this example is an excellent example to show (almost) all the issues we need to consider.
> > > > > 
> > > > > I slightly modified this example to make it to be compilable and run-able, as following: 
> > > > > (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
> > > > > 
> > > > > 1 #include <malloc.h>
> > > > > 2 struct A
> > > > > 3 {
> > > > > 4  size_t size;
> > > > > 5  char buf[] __attribute__((counted_by(size)));
> > > > > 6 };
> > > > > 7 
> > > > > 8 static size_t
> > > > > 9 get_size_from (void *ptr)
> > > > > 10 {
> > > > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > > > 12 }
> > > > > 13 
> > > > > 14 void
> > > > > 15 foo (size_t sz)
> > > > > 16 {
> > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > > 18  obj->size = sz;
> > > > > 19  obj->buf[0] = 2;
> > > > > 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
> > > > > 21  return;
> > > > > 22 }
> > > > > 23 
> > > > > 24 int main ()
> > > > > 25 {
> > > > > 26  foo (20);
> > > > > 27  return 0;
> > > > > 28 }
> > > > > 
> > > > > With my GCC, it was compiled and worked:
> > > > > [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
> > > > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > > > 20
> > > > > Situation 1: With O1 and above, the routine “get_size_from” was inlined into “foo”, therefore, the call to __bdos is in the same routine as the instantiation of the object, and the TYPE information and the attached counted_by attribute information in the TYPE of the object can be USED by the __bdos call to compute the final object size. 
> > > > > 
> > > > > [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
> > > > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > > > -1
> > > > > Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, therefore, the call to __bdos is Not in the same routine as the instantiation of the object, As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call. 
> > > > > 
> > > > > Keep in mind of the above 2 situations, we will refer them in below:
> > > > > 
> > > > > 1. First,  the problem we are trying to resolve is:
> > > > > 
> > > > > (Your description):
> > > > > 
> > > > > > the reordering of __bdos w.r.t. initialization of the size parameter but to also account for DSE of the assignment, we can abstract this problem to that of DFA being unable to see implicit use of the size parameter in the __bdos call.
> > > > > 
> > > > > basically is correct.  However, with the following exception:
> > > > > 
> > > > > The implicit use of the size parameter in the __bdos call is not always there, it ONLY exists WHEN the __bdos is able to evaluated to an expression of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the above example. 
> > > > > In the “Situation 2”, when the __bdos does not see the TYPE of the real object,  it does not see the counted_by information from the TYPE, therefore,  it is not able to evaluate the size of the object through the counted_by information.  As a result, the implicit use of the size parameter in the __bdos call does NOT exist at all.  The optimizer can freely reorder the initialization of the size parameter with the __bdos call since there is no data flow dependency between these two. 
> > > > > 
> > > > > With this exception in mind, we can see that your proposed “option 2” (making the type of size “volatile”) is too conservative, it will  disable many optimizations  unnecessarily, even though it’s safe and simple to implement. 
> > > > > 
> > > > > As a compiler optimization person for many many years, I really don’t want to take this approach at this moment.  -:)
> > > > > 
> > > > > 2. Some facts I’d like to mention:
> > > > > 
> > > > > A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE optimization stage. During RTL stage,  the __bdos call has already been replaced by an expression of the size parameter or a constant, the data dependency is explicitly in the IR already.  I believe that the data analysis in RTL stage should pick up the data dependency correctly, No special handling is needed in RTL.
> > > > > 
> > > > > B. If the __bdos call cannot see the real object , it has no way to get the “counted_by” field from the TYPE of the real object. So, if we try to add the implicit use of the “counted_by” field to the __bdos call, the object instantiation should be in the same routine as the __bdos call.  Both the FE and the gimplification phase are too early to do this work. 
> > > > > 
> > > > > 2. Then, what’s the best approach to resolve this problem:
> > > > > 
> > > > > There were several suggestions so far:
> > > > > 
> > > > > A.  Add an additional argument, the size parameter,  to __bdos, 
> > > > >     A.1, during FE;
> > > > >     A.2, during gimplification phase;
> > > > > B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
> > > > > C.  Encode the implicit USE  in the type of buf, then update the optimization passes to use this implicit USE encoded in the type of buf.
> > > > > 
> > > > > As I explained in the above, 
> > > > > ** Approach A (both A.1 and A.2) does not work;
> > > > > ** Approach B will have big performance impact, I’d prefer not to take this approach at this moment.
> > > > > ** Approach C will be a lot of change in GCC, and also not very necessary since the ONLY implicit use of the size parameter is in the __bdos call when __bdos can see the real object.
> > > > > 
> > > > > So, all the above proposed approaches, A, B, C, are not very good. 
> > > > > 
> > > > > Then, maybe the following might work better?
> > > > > 
> > > > > In the tree optimization stage, 
> > > > >   * After the inlining transformation applied,  
> > > > > +  * Before the data-flow related optimization happens, 
> > > > > +  * when the data flow analysis is constructed, 
> > > > > 
> > > > > For each call to __bdos, add the implicit use of size parameter. 
> > > > > 
> > > > > Is this doable? 
> > > > 
> > > > Here is another proposal:  Add a new builtin function
> > > > 
> > > > __builtin_with_size(x, size)
> > > > 
> > > > that return x but behaves similar to an allocation
> > > > function in that BDOS can look at the size argument
> > > > to discover the size.
> > > > 
> > > > The FE inserts this function when the field is accessed:
> > > 
> > > When it’s set I suppose.  Turn
> > > 
> > > X.l = n;
> > > 
> > > Into
> > > 
> > > X.l = __builtin_with_size (x.buf, n);
> > 
> > It would turn 
> > 
> > some_variable = (&) x.buf
> > 
> > into 
> > 
> > some_variable = __builtin_with_size ( (&) x.buf. x.len)
> 
> Unless you use the address of x.Len this will not work when len is initialized after buf.  And the address will not have a meaningful data dependence.
> > 

It would be a semantic requirement for this feature that
x.len needs to be initialized before x.buf is accessed.  

Otherwise, I am not sure how to define the time point 
at which x.len should be evaluated. 
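
A small sketch of what this requirement means in practice. It emulates the evaluation at the access to buf with a hypothetical helper access_buf and a hypothetical struct bounded; neither is part of the patch or of any GCC interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct B
{
  size_t len;
  char buf[];	/* would carry counted_by (len) in the proposal */
};

/* Hypothetical pointer-plus-bound pair produced at the access.  */
struct bounded
{
  char *ptr;
  size_t len;
};

/* Emulates evaluating x.len at the point where x.buf is accessed.  */
static struct bounded
access_buf (struct B *x)
{
  return (struct bounded) { x->buf, x->len };
}

/* len is set first, so the access records the intended bound.  */
static size_t
len_set_before_access (size_t n)
{
  struct B *x = calloc (1, sizeof (struct B) + n);
  assert (x != NULL);
  x->len = n;
  struct bounded b = access_buf (x);
  size_t seen = b.len;
  free (x);
  return seen;
}

/* len is set too late: the access has already recorded the old value
   (zero here, because calloc cleared the object).  */
static size_t
len_set_after_access (size_t n)
{
  struct B *x = calloc (1, sizeof (struct B) + n);
  assert (x != NULL);
  struct bounded b = access_buf (x);
  x->len = n;
  size_t seen = b.len;
  free (x);
  return seen;
}
```

In this emulation len_set_before_access (20) yields 20, while len_set_after_access (20) yields 0, which is the user error the semantic requirement would rule out.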

> > So the later access to x.buf and not the initialization
> > of a member of the struct (which is too early).
> 
> > > 
> > > And indeed we need sth like a fat pointer to reliably solve all the issues.
> > 
> > What happens for other languages such as FORTRAN 
> > and ADA do?  Are those pointers lowered in the FE?
> 
> Yes
> 
> > To me it seems there are two sound ways to introduce
> > such information:
> > 
> > - either by using the type system.  This works in
> > the FE in C using variably modified types
> > 
> > char buf[n];
> > __auto_type p = &buf;
> > 
> > ... = sizeof (*p);
> > 
> > But if I understand Jakob's comment to some PR 
> > correctly the size information in the TREE_TYPE
> > is not processed correctly anymore in the
> > middle-end. 
> 
> The type based info is lowered during gimplification and in particular for pointer types the middle-end quickly loses track of the original type.
> 

Would it work if we make sure that we find a suitable
type? Or in other words, are the (non-constant) size 
expressions inside it still useful in later passes? 

Martin


> Richard 
> 
> > 
> > - or one injects the information via some
> > tree node or builtin at certain points in
> > time as suggested here, and the compiler
> > derives the information from these points 
> > as tree-object-size does.  
> > 
> > 
> > The use of attributes seems fragile and - looking
> > at the access attribute also overly complex.  And 
> > we somehow support this only for function types
> > and not elsewhere and also this then gets lost
> > during  inlining.   So I think for all this stuff
> > (nonnull, access, counted_by) I think a better
> > approach is needed.
> > 
> > 
> > Martin
> > 
> > 
> > > 
> > > Richard 
> > 
> > 
> > 
> > 
> > > 
> > > > __builtin_with_size(x.buf, x.L);
> > > > 
> > > > 
> > > > Martin
> > > > 
> > > > 
> > > > 
> > > > > 
> > > > > Otherwise, we might need to take the “volatile” approach. 
> > > > > 
> > > > > Let me know your suggestion and comment.
> > > > > 
> > > > > Thanks a lot.
> > > > > 
> > > > > Qing
> > > > > 
> > > > > 
> > > > > > __bdos is the one such implicit user of the size parameter and you're proposing to solve this by encoding the relationship between buffer and size at the __bdos call site.  But what about the case when the instantiation of the object is not at the same place as the __bdos call site, i.e. the DFA is unable to make that relationship?
> > > > > > 
> > > > > > The example Martin showed where the subobject gets "hidden" behind a pointer was a trivial one where DFA *may* actually work in practice (because the object-size pass can thread through these assignments) but think about this one:
> > > > > > 
> > > > > > struct A
> > > > > > {
> > > > > > size_t size;
> > > > > > char buf[] __attribute__((counted_by(size)));
> > > > > > }
> > > > > > 
> > > > > > static size_t
> > > > > > get_size_of (void *ptr)
> > > > > > {
> > > > > > return __bdos (ptr, 1);
> > > > > > }
> > > > > > 
> > > > > > void
> > > > > > foo (size_t sz)
> > > > > > {
> > > > > > struct A *obj = __builtin_malloc (sz);
> > > > > > obj->size = sz;
> > > > > > 
> > > > > > ...
> > > > > > > __builtin_printf ("%zu\n", get_size_of (obj->buf));
> > > > > > ...
> > > > > > }
> > > > > > 
> > > > > > Until get_size_of is inlined, no DFA can see the __bdos call in the same place as the point where obj is allocated.  As a result, the assignment to obj->size could get reordered (or the store eliminated) w.r.t. the __bdos call until the inlining happens.
> > > > > > 
> > > > > > As a result, the relationship between buf and size established by the attribute needs to be encoded into the type somehow.  There are two options:
> > > > > > 
> > > > > > Option 1: Encode the relationship in the type of buf
> > > > > > 
> > > > > > This is kinda what you end up doing with component_ref_has_counted_by and it does show the relationship if one is looking (through that call), but nothing more that can be used to, e.g. prevent reordering or tell the optimizer that the reference to the buf member may imply a reference to the size member as well.  This could be remedied by somehow encoding the USES relationship for size into the type of buf that the optimization passes can see.  I feel like this may be a bit convoluted to specify in a future language extension in a way that will actually be well understood by developers, but it will likely generate faster runtime code.  This will also likely require a bigger change across passes.
> > > > > > 
> > > > > > Option 2: Encode the relationship in the type of size
> > > > > > 
> > > > > > The other option is to enhance the type of size somehow so that it discourages reordering and store elimination, basically pessimizing code.  I think volatile semantics might be the way to do this and may even be straightforward to specify in the future language extension given that it builds on a known language construct and is thematically related.  However it does pessimize output for code that implements __counted_by__.
> > > > > > 
> > > > > > Thanks,
> > > > > > Sid
> > > > > 
> > > > 
> > 
> > -- 
> > Univ.-Prof. Dr. rer. nat. Martin Uecker
> > Graz University of Technology
> > Institute of Biomedical Imaging
> > 
> >
Martin Uecker Oct. 25, 2023, 10:47 a.m. UTC | #59
Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
> On 2023-10-25 04:16, Martin Uecker wrote:
> > Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> > > 
> > > > Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
> > > > 
> > > > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
> > > > > Hi, Sid,
> > > > > 
> > > > > Really appreciate for your example and detailed explanation. Very helpful.
> > > > > I think that this example is an excellent example to show (almost) all the issues we need to consider.
> > > > > 
> > > > > I slightly modified this example to make it to be compilable and run-able, as following:
> > > > > (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
> > > > > 
> > > > >   1 #include <malloc.h>
> > > > >   2 struct A
> > > > >   3 {
> > > > >   4  size_t size;
> > > > >   5  char buf[] __attribute__((counted_by(size)));
> > > > >   6 };
> > > > >   7
> > > > >   8 static size_t
> > > > >   9 get_size_from (void *ptr)
> > > > > 10 {
> > > > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > > > 12 }
> > > > > 13
> > > > > 14 void
> > > > > 15 foo (size_t sz)
> > > > > 16 {
> > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > > 18  obj->size = sz;
> > > > > 19  obj->buf[0] = 2;
> > > > > 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
> > > > > 21  return;
> > > > > 22 }
> > > > > 23
> > > > > 24 int main ()
> > > > > 25 {
> > > > > 26  foo (20);
> > > > > 27  return 0;
> > > > > 28 }
> > > > > 
> 
> <snip>
> 
> > > When it’s set I suppose.  Turn
> > > 
> > > X.l = n;
> > > 
> > > Into
> > > 
> > > X.l = __builtin_with_size (x.buf, n);
> > 
> > It would turn
> > 
> > some_variable = (&) x.buf
> > 
> > into
> > 
> > some_variable = __builtin_with_size ( (&) x.buf. x.len)
> > 
> > 
> > So the later access to x.buf and not the initialization
> > of a member of the struct (which is too early).
> > 
> 
> Hmm, so with Qing's example above, are you suggesting the transformation 
> be to foo like so:
> 
> 14 void
> 15 foo (size_t sz)
> 16 {
> 16.5  void * _1;
> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 18  obj->size = sz;
> 19  obj->buf[0] = 2;
> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
> 20  __builtin_printf ("%d\n", get_size_from (_1));
> 21  return;
> 22 }
> 
> If yes then this could indeed work.  I think I got thrown off by the 
> reference to __bdos.

Yes. I think it is important to evaluate the size at the
access to buf and not at the allocation, because the point is to 
recover it from the size member even when the compiler can't 
see the original allocation.

Evaluating at this point requires that the size is correctly set
before the access to the FAM and the user has to make sure 
this is the case. But to me this requirement would make sense.

Semantically, it could also make sense to evaluate the size at a
later time.  But then the reordering becomes problematic again.

Also I think this would make this feature generally more useful.
For example, it could also work for other pointers in the struct
and not just for FAMs.  In this case, the struct may already be
freed when BDOS is called, so it might also not be possible to
access the size member at a later time.
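
To make the pointer-member case concrete, here is a sketch under stated assumptions: the struct vec, the helper take_data and the pair struct bounded_ints are hypothetical, and the bound is carried by hand where the proposed builtin would attach it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct vec
{
  size_t count;
  int *data;	/* a generalized feature might allow counted_by (count) here */
};

/* Hypothetical pointer-plus-bound pair produced at the access.  */
struct bounded_ints
{
  int *ptr;
  size_t count;
};

/* Emulates evaluating the count at the access to the data member.  */
static struct bounded_ints
take_data (struct vec *v)
{
  return (struct bounded_ints) { v->data, v->count };
}

static size_t
bytes_after_container_freed (size_t n)
{
  assert (n > 0);
  struct vec *v = malloc (sizeof (struct vec));
  assert (v != NULL);
  v->count = n;
  v->data = malloc (n * sizeof (int));
  assert (v->data != NULL);

  struct bounded_ints b = take_data (v);
  free (v);	/* the container is gone; v->count can no longer be read */

  /* The bound recorded at the access is still usable here.  */
  size_t bytes = b.count * sizeof (int);
  free (b.ptr);
  return bytes;
}
```

Had the count been read only at the point of the size query, the read would have touched freed memory; recording it at the access avoids that.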

Martin


>
Richard Biener Oct. 25, 2023, 11:13 a.m. UTC | #60
> Am 25.10.2023 um 12:47 schrieb Martin Uecker <uecker@tugraz.at>:
> 
> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>> 
>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>> 
>>>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>>>>>> Hi, Sid,
>>>>>> 
>>>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>>>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>>>>>> 
>>>>>> I slightly modified this example to make it to be compilable and run-able, as following:
>>>>>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>>>>>> 
>>>>>>  1 #include <malloc.h>
>>>>>>  2 struct A
>>>>>>  3 {
>>>>>>  4  size_t size;
>>>>>>  5  char buf[] __attribute__((counted_by(size)));
>>>>>>  6 };
>>>>>>  7
>>>>>>  8 static size_t
>>>>>>  9 get_size_from (void *ptr)
>>>>>> 10 {
>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>> 12 }
>>>>>> 13
>>>>>> 14 void
>>>>>> 15 foo (size_t sz)
>>>>>> 16 {
>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>> 18  obj->size = sz;
>>>>>> 19  obj->buf[0] = 2;
>>>>>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>>>>>> 21  return;
>>>>>> 22 }
>>>>>> 23
>>>>>> 24 int main ()
>>>>>> 25 {
>>>>>> 26  foo (20);
>>>>>> 27  return 0;
>>>>>> 28 }
>>>>>> 
>> 
>> <snip>
>> 
>>>> When it’s set I suppose.  Turn
>>>> 
>>>> X.l = n;
>>>> 
>>>> Into
>>>> 
>>>> X.l = __builtin_with_size (x.buf, n);
>>> 
>>> It would turn
>>> 
>>> some_variable = (&) x.buf
>>> 
>>> into
>>> 
>>> some_variable = __builtin_with_size ( (&) x.buf. x.len)
>>> 
>>> 
>>> So the later access to x.buf and not the initialization
>>> of a member of the struct (which is too early).
>>> 
>> 
>> Hmm, so with Qing's example above, are you suggesting the transformation 
>> be to foo like so:
>> 
>> 14 void
>> 15 foo (size_t sz)
>> 16 {
>> 16.5  void * _1;
>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> 18  obj->size = sz;
>> 19  obj->buf[0] = 2;
>> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
>> 20  __builtin_printf ("%d\n", get_size_from (_1));
>> 21  return;
>> 22 }
>> 
>> If yes then this could indeed work.  I think I got thrown off by the 
>> reference to __bdos.
> 
> Yes. I think it is important to evaluate the size at the
> access to buf and not the allocation, because the point is to 
> recover it from the size member even when the compiler can't 
> see the original allocation.

But if the access is through a pointer without the attribute visible, even the front end cannot recover the size?  We’d need to force type correctness and give up on indirecting through an int * when it can refer to two different container types.  The best we can do, I think, is mark allocation sites and hope for some basic code hygiene (not clobbering the size or array pointer through pointers without the appropriately attributed type).

> Evaluating at this point requires that the size is correctly set
> before the access to the FAM and the user has to make sure 
> this is the case. But to me this requirement would make sense.
> 
> Semantically, it could also make sense to evaluate the size at a
> later time.  But then the reordering becomes problematic again.
> 
> Also I think this would make this feature generally more useful.
> For example, it could also work for other pointers in the struct
> and not just for FAMs.  In this case, the struct may already be
> freed when BDOS is called, so it might also not be possible to
> access the size member at a later time.
> 
> Martin
> 
> 
>> 
>
Qing Zhao Oct. 25, 2023, 1:27 p.m. UTC | #61
> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> 
> On 2023-10-24 18:51, Qing Zhao wrote:
>> Thanks for the proposal!
>> So what you suggested is:
>> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the FE, then the call to the _bdos (x.buf, 1) will
>> Become:
>>    _bdos(__builtin_with_size(x.buf, x.L), 1)?
>> Then the implicit use of x.L in _bdos(x.buf.1) will become explicit?
> 
> Oops, I think Martin and I fell off-list in a subthread.  I clarified that my comment was that any such annotation at object reference is probably too late and hence not the right place for it; basically it has the same problems as the option A in your comment.  A better place to reinforce such a relationship would be the allocation+initialization site instead.

I think Martin’s proposal might work; it is different from option A:

A.  Add an additional argument, the size parameter,  to __bdos, 
     A.1, during FE;
     A.2, during gimplification phase;

Option A targets the __bdos call and tries to encode the implicit use in the call itself.  This will not work when the real object has not been instantiated at the call site.

However, Martin’s proposal targets the FAM itself: it naturally enhances each FAM access with the size information.  Such a FAM access with size info will be propagated to the __bdos site later through inlining, etc., and tree-object-size can then use the size information at that point.  At the same time, the implicit use of the size is recorded correctly.

So, I think that this proposal is natural and reasonable.
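
To make the data flow concrete, here is a rough user-level emulation in plain C.  It is only a sketch: __builtin_with_size does not exist, so the hypothetical helper buf_with_size and the struct sized_ref stand in for what the FE would emit at each access to obj->buf, and get_size_from stands in for a __bdos consumer that never sees struct A.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct A
{
  size_t size;
  char buf[];   /* would carry __attribute__((counted_by(size))) */
};

/* Hypothetical stand-in for the value __builtin_with_size would yield:
   the pointer paired with the size read at the point of access.  */
struct sized_ref
{
  char *ptr;
  size_t size;
};

/* Emulates what the FE would emit for a use of obj->buf, i.e.
   __builtin_with_size (obj->buf, obj->size).  The read of obj->size
   is now an explicit use that data-flow analysis can see.  */
static struct sized_ref
buf_with_size (struct A *obj)
{
  struct sized_ref r = { obj->buf, obj->size };
  return r;
}

/* Emulates the __bdos consumer: it has no view of struct A and only
   gets what travelled along with the pointer.  */
static size_t
get_size_from (struct sized_ref r)
{
  return r.size;
}

/* Assumes sz > 0, as in the example discussed in this thread.  */
static size_t
foo (size_t sz)
{
  struct A *obj = malloc (sizeof (struct A) + sz * sizeof (char));
  assert (obj != NULL);
  obj->size = sz;     /* must be set before buf is accessed */
  obj->buf[0] = 2;
  size_t n = get_size_from (buf_with_size (obj));
  free (obj);
  return n;
}
```

Calling foo (20) returns 20 whether or not get_size_from is inlined, because the size is captured at the access to buf instead of being looked up from the type at the __bdos site.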

Qing
> 
> Thanks,
> Sid
Siddhesh Poyarekar Oct. 25, 2023, 2:50 p.m. UTC | #62
On 2023-10-25 09:27, Qing Zhao wrote:
> 
> 
>> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>
>> On 2023-10-24 18:51, Qing Zhao wrote:
>>> Thanks for the proposal!
>>> So what you suggested is:
>>> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the FE, then the call to the _bdos (x.buf, 1) will
>>> Become:
>>>     _bdos(__builtin_with_size(x.buf, x.L), 1)?
>>> Then the implicit use of x.L in _bdos(x.buf.1) will become explicit?
>>
>> Oops, I think Martin and I fell off-list in a subthread.  I clarified that my comment was that any such annotation at object reference is probably too late and hence not the right place for it; basically it has the same problems as the option A in your comment.  A better place to reinforce such a relationship would be the allocation+initialization site instead.
> 
> I think Martin’s proposal might work, it’s different than the option A:
> 
> A.  Add an additional argument, the size parameter,  to __bdos,
>       A.1, during FE;
>       A.2, during gimplification phase;
> 
> Option A targets on the __bdos call, try to encode the implicit use to the call, this will not work when the real object has not been instantiation at the call site.
> 
> However, Martin’s proposal targets on the FMA array itself, it will enhance the FAM access naturally with the size information. And such FAM access with size info will propagated to the __bdos site later through inlining, etc. and then tree-object-size can use the size information at that point. At the same time, the implicit use of the size is recorded correctly.
> 
> So, I think that this proposal is natural and reasonable.

Ack, we discussed this later in the thread and I agree[1].  Richard 
still has concerns[2] that I think may be addressed by putting 
__builtin_with_size at the point where the reference to x.buf escapes, 
but I'm not very sure about that.

Oh, and Martin suggested using __builtin_with_size more generally[3] in 
bugzilla to address attribute inlining issues, and we have high-level 
consensus for a __builtin_with_access instead, which associates an access 
type in addition to the size with the target object.  For the purposes of 
counted_by, the access type could simply be -1.

Thanks,
Sid


[1] 
https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9908@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039

[2] 
https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9908@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3

[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6
Richard Biener Oct. 25, 2023, 3:38 p.m. UTC | #63
> On 25.10.2023 at 16:50, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> 
> On 2023-10-25 09:27, Qing Zhao wrote:
>>>> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>> 
>>> On 2023-10-24 18:51, Qing Zhao wrote:
>>>> Thanks for the proposal!
>>>> So what you suggested is:
>>>> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the FE, then the call to the _bdos (x.buf, 1) will
>>>> Become:
>>>>    _bdos(__builtin_with_size(x.buf, x.L), 1)?
>>>> Then the implicit use of x.L in _bdos(x.buf.1) will become explicit?
>>> 
>>> Oops, I think Martin and I fell off-list in a subthread.  I clarified that my comment was that any such annotation at object reference is probably too late and hence not the right place for it; basically it has the same problems as the option A in your comment.  A better place to reinforce such a relationship would be the allocation+initialization site instead.
>> I think Martin’s proposal might work, it’s different than the option A:
>> A.  Add an additional argument, the size parameter,  to __bdos,
>>      A.1, during FE;
>>      A.2, during gimplification phase;
>> Option A targets on the __bdos call, try to encode the implicit use to the call, this will not work when the real object has not been instantiation at the call site.
>> However, Martin’s proposal targets on the FMA array itself, it will enhance the FAM access naturally with the size information. And such FAM access with size info will propagated to the __bdos site later through inlining, etc. and then tree-object-size can use the size information at that point. At the same time, the implicit use of the size is recorded correctly.
>> So, I think that this proposal is natural and reasonable.
> 
> Ack, we discussed this later in the thread and I agree[1].  Richard still has concerns[2] that I think may be addressed by putting __builtin_with_size at the point where the reference to x.buf escapes, but I'm not very sure about that.
> 
> Oh, and Martin suggested using __builtin_with_size more generally[3] in bugzilla to address attribute inlining issues and we have high level consensus for a __builtin_with_access instead, which associates access type in addition to size with the target object.  For the purposes of counted_by, access type could simply be -1.

Btw, I’d like to see some hard numbers on the number of extra false positives this will cause, as well as the effect on generated code, before putting this in mainline and effectively needing to support it forever.

Richard 

> Thanks,
> Sid
> 
> 
> [1] https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9908@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039
> 
> [2] https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9908@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3
> 
> [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6
Qing Zhao Oct. 25, 2023, 6:06 p.m. UTC | #64
> On Oct 25, 2023, at 6:39 AM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> On Wednesday, 25.10.2023 at 12:25 +0200, Richard Biener wrote:
>> 
>>> On 25.10.2023 at 10:16, Martin Uecker <uecker@tugraz.at> wrote:
>>> 
>>> On Wednesday, 25.10.2023 at 08:43 +0200, Richard Biener wrote:
>>>> 
>>>>>> On 24.10.2023 at 22:38, Martin Uecker <uecker@tugraz.at> wrote:
>>>>> 
>>>>> On Tuesday, 24.10.2023 at 20:30 +0000, Qing Zhao wrote:
>>>>>> Hi, Sid,
>>>>>> 
>>>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>>>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>>>>>> 
>>>>>> I slightly modified this example to make it to be compilable and run-able, as following: 
>>>>>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>>>>>> 
>>>>>> 1 #include <malloc.h>
>>>>>> 2 struct A
>>>>>> 3 {
>>>>>> 4  size_t size;
>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>> 6 };
>>>>>> 7 
>>>>>> 8 static size_t
>>>>>> 9 get_size_from (void *ptr)
>>>>>> 10 {
>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>> 12 }
>>>>>> 13 
>>>>>> 14 void
>>>>>> 15 foo (size_t sz)
>>>>>> 16 {
>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>> 18  obj->size = sz;
>>>>>> 19  obj->buf[0] = 2;
>>>>>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>>>>>> 21  return;
>>>>>> 22 }
>>>>>> 23 
>>>>>> 24 int main ()
>>>>>> 25 {
>>>>>> 26  foo (20);
>>>>>> 27  return 0;
>>>>>> 28 }
>>>>>> 
>>>>>> With my GCC, it was compiled and worked:
>>>>>> [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
>>>>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>>>>>> 20
>>>>>> Situation 1: With O1 and above, the routine “get_size_from” was inlined into “foo”, therefore, the call to __bdos is in the same routine as the instantiation of the object, and the TYPE information and the attached counted_by attribute information in the TYPE of the object can be USED by the __bdos call to compute the final object size. 
>>>>>> 
>>>>>> [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
>>>>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>>>>>> -1
>>>>>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, therefore, the call to __bdos is Not in the same routine as the instantiation of the object, As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call. 
>>>>>> 
>>>>>> Keep in mind of the above 2 situations, we will refer them in below:
>>>>>> 
>>>>>> 1. First,  the problem we are trying to resolve is:
>>>>>> 
>>>>>> (Your description):
>>>>>> 
>>>>>>> the reordering of __bdos w.r.t. initialization of the size parameter but to also account for DSE of the assignment, we can abstract this problem to that of DFA being unable to see implicit use of the size parameter in the __bdos call.
>>>>>> 
>>>>>> basically is correct.  However, with the following exception:
>>>>>> 
>>>>>> The implicit use of the size parameter in the __bdos call is not always there, it ONLY exists WHEN the __bdos is able to evaluated to an expression of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the above example. 
>>>>>> In the “Situation 2”, when the __bdos does not see the TYPE of the real object,  it does not see the counted_by information from the TYPE, therefore,  it is not able to evaluate the size of the object through the counted_by information.  As a result, the implicit use of the size parameter in the __bdos call does NOT exist at all.  The optimizer can freely reorder the initialization of the size parameter with the __bdos call since there is no data flow dependency between these two. 
>>>>>> 
>>>>>> With this exception in mind, we can see that your proposed “option 2” (making the type of size “volatile”) is too conservative, it will  disable many optimizations  unnecessarily, even though it’s safe and simple to implement. 
>>>>>> 
>>>>>> As a compiler optimization person for many many years, I really don’t want to take this approach at this moment.  -:)
>>>>>> 
>>>>>> 2. Some facts I’d like to mention:
>>>>>> 
>>>>>> A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE optimization stage. During RTL stage,  the __bdos call has already been replaced by an expression of the size parameter or a constant, the data dependency is explicitly in the IR already.  I believe that the data analysis in RTL stage should pick up the data dependency correctly, No special handling is needed in RTL.
>>>>>> 
>>>>>> B. If the __bdos call cannot see the real object , it has no way to get the “counted_by” field from the TYPE of the real object. So, if we try to add the implicit use of the “counted_by” field to the __bdos call, the object instantiation should be in the same routine as the __bdos call.  Both the FE and the gimplification phase are too early to do this work. 
>>>>>> 
>>>>>> 3. Then, what’s the best approach to resolve this problem:
>>>>>> 
>>>>>> There were several suggestions so far:
>>>>>> 
>>>>>> A.  Add an additional argument, the size parameter,  to __bdos, 
>>>>>>    A.1, during FE;
>>>>>>    A.2, during gimplification phase;
>>>>>> B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
>>>>>> C.  Encode the implicit USE  in the type of buf, then update the optimization passes to use this implicit USE encoded in the type of buf.
>>>>>> 
>>>>>> As I explained in the above, 
>>>>>> ** Approach A (both A.1 and A.2) does not work;
>>>>>> ** Approach B will have big performance impact, I’d prefer not to take this approach at this moment.
>>>>>> ** Approach C will be a lot of change in GCC, and also not very necessary since the ONLY implicit use of the size parameter is in the __bdos call when __bdos can see the real object.
>>>>>> 
>>>>>> So, all the above proposed approaches, A, B, C, are not very good. 
>>>>>> 
>>>>>> Then, maybe the following might work better?
>>>>>> 
>>>>>> In the tree optimization stage, 
>>>>>>  * After the inlining transformation applied,  
>>>>>>  * Before the data-flow related optimization happens, 
>>>>>>  * when the data flow analysis is constructed, 
>>>>>> 
>>>>>> For each call to __bdos, add the implicit use of size parameter. 
>>>>>> 
>>>>>> Is this doable? 
>>>>> 
>>>>> Here is another proposal:  Add a new builtin function
>>>>> 
>>>>> __builtin_with_size(x, size)
>>>>> 
>>>>> that return x but behaves similar to an allocation
>>>>> function in that BDOS can look at the size argument
>>>>> to discover the size.
>>>>> 
>>>>> The FE insers this function when the field is accessed:
>>>> 
>>>> When it’s set I suppose.  Turn
>>>> 
>>>> X.l = n;
>>>> 
>>>> Into
>>>> 
>>>> X.l = __builtin_with_size (x.buf, n);
>>> 
>>> It would turn 
>>> 
>>> some_variable = (&) x.buf
>>> 
>>> into 
>>> 
>>> some_variable = __builtin_with_size ( (&) x.buf. x.len)
>> 
>> Unless you use the address of x.Len this will not work when len is initialized after buf.  And the address will not have a meaningful data dependence.
>>> 
> 
> It would be a semantic requirement for this feature that
> x.len needs to be initialized before x.buf is accessed.  

Yes, that’s right; we might need to clarify this in the documentation of counted_by. 
It should be a user error if the source code violates this rule.
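
As an illustration of the rule, a minimal sketch of an allocation helper that sets the counter before the first access to the FAM.  The COUNTED_BY macro and the helper name alloc_a are made up for this example; the macro expands to nothing on compilers that do not (yet) know the attribute, which is an assumption about how user code might guard it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use counted_by only where the compiler knows it; otherwise expand
   to nothing.  COUNTED_BY is a hypothetical helper macro.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif
#if __has_attribute (counted_by)
# define COUNTED_BY(m) __attribute__ ((counted_by (m)))
#else
# define COUNTED_BY(m)
#endif

struct A
{
  size_t size;
  char buf[] COUNTED_BY (size);
};

/* Allocate a struct A with room for n elements.  The counter is
   stored first, so every later access to buf sees a valid size.  */
static struct A *
alloc_a (size_t n)
{
  struct A *obj = malloc (sizeof (struct A) + n * sizeof (char));
  if (obj == NULL)
    return NULL;
  obj->size = n;                  /* 1. initialize the counter */
  for (size_t i = 0; i < n; i++)  /* 2. only then touch the FAM */
    obj->buf[i] = 0;
  return obj;
}
```

Swapping steps 1 and 2 would be the user error described above: the accesses to buf would then be evaluated against an uninitialized size.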

Qing
> 
> Otherwise, I am not sure how to define the time point 
> at which x.len should be evaluated. 
> 
>>> So the later access to x.buf and not the initialization
>>> of a member of the struct (which is too early).
>> 
>>>> 
>>>> And indeed we need sth like a fat pointer to reliably solve all the issues.
>>> 
>>> What happens for other languages such as FORTRAN 
>>> and ADA do?  Are those pointers lowered in the FE?
>> 
>> Yes
>> 
>>> To me it seems there are two sound ways to introduce
>>> such information:
>>> 
>>> - either by using the type system.  This works in
>>> the FE in C using variably modified types
>>> 
>>> char buf[n];
>>> __auto_type p = &buf;
>>> 
>>> ... = sizeof (*p);
>>> 
>>> But if I understand Jakob's comment to some PR 
>>> correctly the size information in the TREE_TYPE
>>> is not processed correctly anymore in the
>>> middle-end. 
>> 
>> The type based info is lowered during gimplification and in particular for pointer types the middle-end quickly loses track of the original type.
>> 
> 
> Would it work if we make sure that we find a suitable
> type? Or in other words, are the (non-constant) size 
> expressions inside it still useful in later passes? 
> 
> Martin
> 
> 
>> Richard 
>> 
>>> 
>>> - or one injects the information via some
>>> tree node or builtin at certain points in
>>> time as suggested here, and the compiler
>>> derives the information from these points 
>>> as tree-object-size does.  
>>> 
>>> 
>>> The use of attributes seems fragile and - looking
>>> at the access attribute also overly complex.  And 
>>> we somehow support this only for function types
>>> and not elsewhere and also this then gets lost
>>> during  inlining.   So I think for all this stuff
>>> (nonnull, access, counted_by) I think a better
>>> approach is needed.
>>> 
>>> 
>>> Martin
>>> 
>>> 
>>>> 
>>>> Richard 
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>>> __builtin_with_size(x.buf, x.L);
>>>>> 
>>>>> 
>>>>> Martin
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Otherwise, we might need to take the “volatile” approach. 
>>>>>> 
>>>>>> Let me know your suggestion and comment.
>>>>>> 
>>>>>> Thanks a lot.
>>>>>> 
>>>>>> Qing
>>>>>> 
>>>>>> 
>>>>>>> __bdos is the one such implicit user of the size parameter and you're proposing to solve this by encoding the relationship between buffer and size at the __bdos call site.  But what about the case when the instantiation of the object is not at the same place as the __bdos call site, i.e. the DFA is unable to make that relationship?
>>>>>>> 
>>>>>>> The example Martin showed where the subobject gets "hidden" behind a pointer was a trivial one where DFA *may* actually work in practice (because the object-size pass can thread through these assignments) but think about this one:
>>>>>>> 
>>>>>>> struct A
>>>>>>> {
>>>>>>> size_t size;
>>>>>>> char buf[] __attribute__((counted_by(size)));
>>>>>>> };
>>>>>>> 
>>>>>>> static size_t
>>>>>>> get_size_of (void *ptr)
>>>>>>> {
>>>>>>> return __bdos (ptr, 1);
>>>>>>> }
>>>>>>> 
>>>>>>> void
>>>>>>> foo (size_t sz)
>>>>>>> {
>>>>>>> struct A *obj = __builtin_malloc (sz);
>>>>>>> obj->size = sz;
>>>>>>> 
>>>>>>> ...
>>>>>>> __builtin_printf ("%zu\n", get_size_of (obj->buf));
>>>>>>> ...
>>>>>>> }
>>>>>>> 
>>>>>>> Until get_size_of is inlined, no DFA can see the __bdos call in the same place as the point where obj is allocated.  As a result, the assignment to obj->size could get reordered (or the store eliminated) w.r.t. the __bdos call until the inlining happens.
>>>>>>> 
>>>>>>> As a result, the relationship between buf and size established by the attribute needs to be encoded into the type somehow.  There are two options:
>>>>>>> 
>>>>>>> Option 1: Encode the relationship in the type of buf
>>>>>>> 
>>>>>>> This is kinda what you end up doing with component_ref_has_counted_by and it does show the relationship if one is looking (through that call), but nothing more that can be used to, e.g. prevent reordering or tell the optimizer that the reference to the buf member may imply a reference to the size member as well.  This could be remedied by somehow encoding the USES relationship for size into the type of buf that the optimization passes can see.  I feel like this may be a bit convoluted to specify in a future language extension in a way that will actually be well understood by developers, but it will likely generate faster runtime code.  This will also likely require a bigger change across passes.
>>>>>>> 
>>>>>>> Option 2: Encode the relationship in the type of size
>>>>>>> 
>>>>>>> The other option is to enhance the type of size somehow so that it discourages reordering and store elimination, basically pessimizing code.  I think volatile semantics might be the way to do this and may even be straightforward to specify in the future language extension given that it builds on a known language construct and is thematically related.  However it does pessimize output for code that implements __counted_by__.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Sid
>>>>>> 
>>>>> 
>>> 
>>> -- 
>>> Univ.-Prof. Dr. rer. nat. Martin Uecker
>>> Graz University of Technology
>>> Institute of Biomedical Imaging
>>> 
>>> 
> 
> -- 
> Univ.-Prof. Dr. rer. nat. Martin Uecker
> Graz University of Technology
> Institute of Biomedical Imaging
> 
>
Martin Uecker Oct. 25, 2023, 6:16 p.m. UTC | #65
On Wednesday, 25.10.2023 at 13:13 +0200, Richard Biener wrote:
> 
> > On 25.10.2023 at 12:47, Martin Uecker <uecker@tugraz.at> wrote:
> > 
> > On Wednesday, 25.10.2023 at 06:25 -0400, Siddhesh Poyarekar wrote:
> > > > On 2023-10-25 04:16, Martin Uecker wrote:
> > > > On Wednesday, 25.10.2023 at 08:43 +0200, Richard Biener wrote:
> > > > > 
> > > > > > On 24.10.2023 at 22:38, Martin Uecker <uecker@tugraz.at> wrote:
> > > > > > 
> > > > > > On Tuesday, 24.10.2023 at 20:30 +0000, Qing Zhao wrote:
> > > > > > > Hi, Sid,
> > > > > > > 
> > > > > > > Really appreciate for your example and detailed explanation. Very helpful.
> > > > > > > I think that this example is an excellent example to show (almost) all the issues we need to consider.
> > > > > > > 
> > > > > > > I slightly modified this example to make it to be compilable and run-able, as following:
> > > > > > > (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
> > > > > > > 
> > > > > > >  1 #include <malloc.h>
> > > > > > >  2 struct A
> > > > > > >  3 {
> > > > > > >  4  size_t size;
> > > > > > >  5  char buf[] __attribute__((counted_by(size)));
> > > > > > >  6 };
> > > > > > >  7
> > > > > > >  8 static size_t
> > > > > > >  9 get_size_from (void *ptr)
> > > > > > > 10 {
> > > > > > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > > > > > 12 }
> > > > > > > 13
> > > > > > > 14 void
> > > > > > > 15 foo (size_t sz)
> > > > > > > 16 {
> > > > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > > > > 18  obj->size = sz;
> > > > > > > 19  obj->buf[0] = 2;
> > > > > > > 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
> > > > > > > 21  return;
> > > > > > > 22 }
> > > > > > > 23
> > > > > > > 24 int main ()
> > > > > > > 25 {
> > > > > > > 26  foo (20);
> > > > > > > 27  return 0;
> > > > > > > 28 }
> > > > > > > 
> > > 
> > > <snip>
> > > 
> > > > > When it’s set I suppose.  Turn
> > > > > 
> > > > > X.l = n;
> > > > > 
> > > > > Into
> > > > > 
> > > > > X.l = __builtin_with_size (x.buf, n);
> > > > 
> > > > It would turn
> > > > 
> > > > some_variable = (&) x.buf
> > > > 
> > > > into
> > > > 
> > > > some_variable = __builtin_with_size ( (&) x.buf. x.len)
> > > > 
> > > > 
> > > > So the later access to x.buf and not the initialization
> > > > of a member of the struct (which is too early).
> > > > 
> > > 
> > > Hmm, so with Qing's example above, are you suggesting the transformation 
> > > be to foo like so:
> > > 
> > > 14 void
> > > 15 foo (size_t sz)
> > > 16 {
> > > 16.5  void * _1;
> > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > 18  obj->size = sz;
> > > 19  obj->buf[0] = 2;
> > > 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
> > > 20  __builtin_printf ("%d\n", get_size_from (_1));
> > > 21  return;
> > > 22 }
> > > 
> > > If yes then this could indeed work.  I think I got thrown off by the 
> > > reference to __bdos.
> > 
> > Yes. I think it is important to evaluate the size at the
> > access to buf and not the allocation, because the point is to 
> > recover it from the size member even when the compiler can't 
> > see the original allocation.
> 
> But if the access is through a pointer without the attribute visible
> even the Frontend cannot recover?  

Yes, if the access is using a struct-with-FAM without the attribute,
the FE would not insert the builtin.  BDOS could potentially
still see the original allocation, but if it doesn't, then there is
no information.

> We’d need to force type correctness and give up on indirecting
> through an int * when it can refer to two different container types. 
> The best we can do I think is mark allocation sites and hope for
> some basic code hygiene (not clobbering size or array pointer
> through pointers without the appropriately attributed type)

I do not fully understand what you are referring to. But yes,
for full bounds safety we would need the language feature.
In C, people should start to use variably-modified types
more.  I think we can build perfect bounds safety on top of
them in a very good way with only FE changes.
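
For readers less familiar with variably-modified types, a small sketch of how the bound can travel in the type.  The function names fill and demo are made up for this example, and it assumes a compiler with VLA support (C99, or C11 and later without __STDC_NO_VLA__).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The bound n is part of the parameter's type: p points to an array
   of exactly n chars, so sizeof *p is evaluated at run time.  */
static size_t
fill (size_t n, char (*p)[n], char c)
{
  memset (*p, c, sizeof *p);
  return sizeof *p;          /* n * sizeof (char) */
}

/* n must be greater than zero, since buf is a VLA.  */
static size_t
demo (size_t n)
{
  char buf[n];
  char (*p)[n] = &buf;       /* the pointer type remembers the bound */
  return fill (n, p, 'x');
}
```

Here demo (16) returns 16: the size is recovered from the pointer's type alone, without any attribute and without seeing the declaration of buf inside fill.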

All these attributes are just a best effort.  But for a while,
this will be necessary.

Martin

> 
> > Evaluating at this point requires that the size is correctly set
> > before the access to the FAM and the user has to make sure 
> > this is the case. But to me this requirement would make sense.
> > 
> > Semantically, it could also make sense to evaluate the size at a
> > later time.  But then the reordering becomes problematic again.
> > 
> > Also I think this would make this feature generally more useful.
> > For example, it could also work for other pointers in the struct
> > and not just for FAMs.  In this case, the struct may already be
> > freed when BDOS is called, so it might also not be possible to
> > access the size member at a later time.
> > 
> > Martin
> > 
> > 
> > > 
> >
Qing Zhao Oct. 25, 2023, 6:17 p.m. UTC | #66
> On Oct 25, 2023, at 7:13 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> 
> 
>> On 25.10.2023 at 12:47, Martin Uecker <uecker@tugraz.at> wrote:
>> 
>> On Wednesday, 25.10.2023 at 06:25 -0400, Siddhesh Poyarekar wrote:
>>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>>> On Wednesday, 25.10.2023 at 08:43 +0200, Richard Biener wrote:
>>>>> 
>>>>>> On 24.10.2023 at 22:38, Martin Uecker <uecker@tugraz.at> wrote:
>>>>>> 
>>>>>> On Tuesday, 24.10.2023 at 20:30 +0000, Qing Zhao wrote:
>>>>>>> Hi, Sid,
>>>>>>> 
>>>>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>>>>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>>>>>>> 
>>>>>>> I slightly modified this example to make it to be compilable and run-able, as following:
>>>>>>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>>>>>>> 
>>>>>>> 1 #include <malloc.h>
>>>>>>> 2 struct A
>>>>>>> 3 {
>>>>>>> 4  size_t size;
>>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>>> 6 };
>>>>>>> 7
>>>>>>> 8 static size_t
>>>>>>> 9 get_size_from (void *ptr)
>>>>>>> 10 {
>>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>>> 12 }
>>>>>>> 13
>>>>>>> 14 void
>>>>>>> 15 foo (size_t sz)
>>>>>>> 16 {
>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>> 18  obj->size = sz;
>>>>>>> 19  obj->buf[0] = 2;
>>>>>>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>>>>>>> 21  return;
>>>>>>> 22 }
>>>>>>> 23
>>>>>>> 24 int main ()
>>>>>>> 25 {
>>>>>>> 26  foo (20);
>>>>>>> 27  return 0;
>>>>>>> 28 }
>>>>>>> 
>>> 
>>> <snip>
>>> 
>>>>> When it’s set I suppose.  Turn
>>>>> 
>>>>> X.l = n;
>>>>> 
>>>>> Into
>>>>> 
>>>>> X.l = __builtin_with_size (x.buf, n);
>>>> 
>>>> It would turn
>>>> 
>>>> some_variable = (&) x.buf
>>>> 
>>>> into
>>>> 
>>>> some_variable = __builtin_with_size ( (&) x.buf. x.len)
>>>> 
>>>> 
>>>> So the later access to x.buf and not the initialization
>>>> of a member of the struct (which is too early).
>>>> 
>>> 
>>> Hmm, so with Qing's example above, are you suggesting the transformation 
>>> be to foo like so:
>>> 
>>> 14 void
>>> 15 foo (size_t sz)
>>> 16 {
>>> 16.5  void * _1;
>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>> 18  obj->size = sz;
>>> 19  obj->buf[0] = 2;
>>> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
>>> 20  __builtin_printf ("%d\n", get_size_from (_1));
>>> 21  return;
>>> 22 }
>>> 
>>> If yes then this could indeed work.  I think I got thrown off by the 
>>> reference to __bdos.
>> 
>> Yes. I think it is important to evaluate the size at the
>> access to buf and not the allocation, because the point is to 
>> recover it from the size member even when the compiler can't 
>> see the original allocation.
> 
> But if the access is through a pointer without the attribute visible even the Frontend cannot recover?  We’d need to force type correctness and give up on indirecting through an int * when it can refer to two different container types.

We might need to issue warnings when this happens?

>  The best we can do I think is mark allocation sites and hope for some basic code hygiene (not clobbering size or array pointer through pointers without the appropriately attributed type)
I guess that we need to clarify the requirement in the documentation, and also issue warnings when the source code has such issues.

Qing
> 
>> Evaluating at this point requires that the size is correctly set
>> before the access to the FAM and the user has to make sure 
>> this is the case. But to me this requirement would make sense.
>> 
>> Semantically, it could also make sense to evaluate the size at a
>> later time.  But then the reordering becomes problematic again.
>> 
>> Also I think this would make this feature generally more useful.
>> For example, it could also work for other pointers in the struct
>> and not just for FAMs.  In this case, the struct may already be
>> freed when BDOS is called, so it might also not be possible to
>> access the size member at a later time.
>> 
>> Martin
>> 
>> 
>>> 
>>
Qing Zhao Oct. 25, 2023, 6:44 p.m. UTC | #67
> On Oct 25, 2023, at 10:50 AM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> 
> On 2023-10-25 09:27, Qing Zhao wrote:
>>> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>> 
>>> On 2023-10-24 18:51, Qing Zhao wrote:
>>>> Thanks for the proposal!
>>>> So what you suggested is:
>>>> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the FE, then the call to the _bdos (x.buf, 1) will
>>>> Become:
>>>>    _bdos(__builtin_with_size(x.buf, x.L), 1)?
>>>> Then the implicit use of x.L in _bdos(x.buf, 1) will become explicit?
>>> 
>>> Oops, I think Martin and I fell off-list in a subthread.  I clarified that my comment was that any such annotation at object reference is probably too late and hence not the right place for it; basically it has the same problems as the option A in your comment.  A better place to reinforce such a relationship would be the allocation+initialization site instead.
>> I think Martin’s proposal might work, it’s different than the option A:
>> A.  Add an additional argument, the size parameter,  to __bdos,
>>      A.1, during FE;
>>      A.2, during gimplification phase;
>> Option A targets the __bdos call and tries to encode the implicit use in the call; this will not work when the real object has not been instantiated at the call site.
>> However, Martin’s proposal targets the FAM itself; it will naturally enhance the FAM access with the size information. Such a FAM access with size info will be propagated to the __bdos site later through inlining, etc., and then tree-object-size can use the size information at that point. At the same time, the implicit use of the size is recorded correctly.
>> So, I think that this proposal is natural and reasonable.
> 
> Ack, we discussed this later in the thread and I agree[1].  Richard still has concerns[2] that I think may be addressed by putting __builtin_with_size at the point where the reference to x.buf escapes, but I'm not very sure about that.
> 
> Oh, and Martin suggested using __builtin_with_size more generally[3] in bugzilla to address attribute inlining issues and we have high level consensus for a __builtin_with_access instead, which associates access type in addition to size with the target object.  For the purposes of counted_by, access type could simply be -1.

Yes, I read all the discussions in the comments of PR96503 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503), and I do agree that this is a good idea. 

I prefer the name for the new builtin as:  
__builtin_with_access_and_size
Instead of 
__builtin_with_access

All the attributes, “alloc_size”, “access”, and the new “counted_by” for FAM, could be converted to this builtin consistently, and even later extensions, for example a “counted_by” attribute for general pointers, could use the same builtin.

SOMETYPE *ptr = __builtin_with_access_and_size (SOMETYPE *ptr, size_t size, int access)

In the above, 

1. SOMETYPE will be the type of the pointee of “ptr”, it could be a real type or void.

2. “size”

If SOMETYPE is a real type, the “size” will be the number of elements of the type;
If SOMETYPE is void, the “size” will be the number of bytes.   

3. “access”

-1: Unknown access semantics
0: none
1: read_only
2: write_only
3: read_write

For the “counted_by” and “alloc_size” attributes, the “access” will be -1.
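
To make the proposed interface concrete, here is a minimal sketch in plain C that models what the builtin would associate with a pointer. Everything in it (enum access_kind, struct sized_ptr, with_access_and_size, and the extra elem_size parameter that stands in for the pointee type) is a hypothetical illustration, not an existing GCC interface; the real builtin would be handled inside the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical access codes, mirroring the list above. */
enum access_kind {
    ACCESS_UNKNOWN    = -1,
    ACCESS_NONE       = 0,
    ACCESS_READ_ONLY  = 1,
    ACCESS_WRITE_ONLY = 2,
    ACCESS_READ_WRITE = 3
};

/* Hypothetical record of what the builtin would attach to a pointer. */
struct sized_ptr {
    void *ptr;
    size_t bytes;              /* size normalized to bytes */
    enum access_kind access;
};

/* elem_size == 0 models a void pointee, where "size" is already a byte
   count; otherwise "size" is a number of elements of elem_size bytes. */
static struct sized_ptr
with_access_and_size(void *ptr, size_t size, size_t elem_size,
                     enum access_kind access)
{
    struct sized_ptr r;

    assert(ptr != NULL);
    r.ptr = ptr;
    r.bytes = elem_size ? size * elem_size : size;
    r.access = access;
    return r;
}
```

For a counted_by FAM of int with a count of 4, the call would be with_access_and_size(x->buf, 4, sizeof(int), ACCESS_UNKNOWN), which records 4 * sizeof(int) bytes.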

Qing
> 
> Thanks,
> Sid
> 
> 
> [1] https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9908@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039
> 
> [2] https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9908@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3
> 
> [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6
Qing Zhao Oct. 25, 2023, 7:03 p.m. UTC | #68
> On Oct 25, 2023, at 11:38 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> 
> 
>> Am 25.10.2023 um 16:50 schrieb Siddhesh Poyarekar <siddhesh@gotplt.org>:
>> 
>> On 2023-10-25 09:27, Qing Zhao wrote:
>>>>> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>>> 
>>>> On 2023-10-24 18:51, Qing Zhao wrote:
>>>>> Thanks for the proposal!
>>>>> So what you suggested is:
>>>>> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the FE, then the call to the _bdos (x.buf, 1) will
>>>>> Become:
>>>>>   _bdos(__builtin_with_size(x.buf, x.L), 1)?
>>>>> Then the implicit use of x.L in _bdos(x.buf, 1) will become explicit?
>>>> 
>>>> Oops, I think Martin and I fell off-list in a subthread.  I clarified that my comment was that any such annotation at object reference is probably too late and hence not the right place for it; basically it has the same problems as the option A in your comment.  A better place to reinforce such a relationship would be the allocation+initialization site instead.
>>> I think Martin’s proposal might work, it’s different than the option A:
>>> A.  Add an additional argument, the size parameter,  to __bdos,
>>>     A.1, during FE;
>>>     A.2, during gimplification phase;
>>> Option A targets the __bdos call and tries to encode the implicit use in the call; this will not work when the real object has not been instantiated at the call site.
>>> However, Martin’s proposal targets the FAM itself; it will naturally enhance the FAM access with the size information. Such a FAM access with size info will be propagated to the __bdos site later through inlining, etc., and then tree-object-size can use the size information at that point. At the same time, the implicit use of the size is recorded correctly.
>>> So, I think that this proposal is natural and reasonable.
>> 
>> Ack, we discussed this later in the thread and I agree[1].  Richard still has concerns[2] that I think may be addressed by putting __builtin_with_size at the point where the reference to x.buf escapes, but I'm not very sure about that.
>> 
>> Oh, and Martin suggested using __builtin_with_size more generally[3] in bugzilla to address attribute inlining issues and we have high level consensus for a __builtin_with_access instead, which associates access type in addition to size with the target object.  For the purposes of counted_by, access type could simply be -1.
> 
> Btw, I’d like to see some hard numbers on the amount of extra false positives this will cause as well as the effect on generated code before putting this in mainline and effectively needing to support it forever.

What do you mean by the “extra false positives”? 

For the code generation impact:

turning the original x.buf
into a builtin function call
__builtin_with_access_and_size(x.buf, x.L, -1)

might inhibit some optimizations from happening before the builtin is evaluated into object size info (phase .objsz1).  I guess there might be some performance impact.

However, if we mark this builtin as PURE, NOTHROW, etc., then the negative performance impact will be reduced to a minimum?

Qing

> 
> Richard 
> 
>> Thanks,
>> Sid
>> 
>> 
>> [1] https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9908@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039
>> 
>> [2] https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9908@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3
>> 
>> [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6
Kees Cook Oct. 25, 2023, 9:59 p.m. UTC | #69
On Tue, Oct 24, 2023 at 07:51:55PM -0400, Siddhesh Poyarekar wrote:
> Yes, that's the tradeoff.  However, maybe this is the point where Kees jumps
> in and say the kernel doesn't really care as much or something like that :)

"I only care about -O2" :)
Kees Cook Oct. 25, 2023, 10:06 p.m. UTC | #70
On Wed, Oct 25, 2023 at 01:27:29PM +0000, Qing Zhao wrote:
> A.  Add an additional argument, the size parameter,  to __bdos, 
>      A.1, during FE;
>      A.2, during gimplification phase;

I just wanted to clarify that this is all just an "internal" detail,
yes? i.e. the __bdos() used in C code is unchanged?

For example, the Linux kernel can still use __bdos() without knowing
the count member ahead of time (otherwise it kind of defeats the purpose).
Qing Zhao Oct. 25, 2023, 10:27 p.m. UTC | #71
> On Oct 25, 2023, at 6:06 PM, Kees Cook <keescook@chromium.org> wrote:
> 
> On Wed, Oct 25, 2023 at 01:27:29PM +0000, Qing Zhao wrote:
>> A.  Add an additional argument, the size parameter,  to __bdos, 
>>     A.1, during FE;
>>     A.2, during gimplification phase;
> 
> I just wanted to clarify that this is all just an "internal" detail,
> yes?

YES!

> i.e. the __bdos() used in C code is unchanged?

There should be no change to the user interface.

> 
> For example, the Linux kernel can still use __bdos() without knowing
> the count member ahead of time (otherwise it kind of defeats the purpose).
I don’t quite understand this; could you clarify?

(Anyway, the bottom line is that there is no change to the user interface; we are only discussing the internal implementation inside GCC.) :-)

Qing
> 
> -- 
> Kees Cook
Kees Cook Oct. 25, 2023, 10:32 p.m. UTC | #72
On Wed, Oct 25, 2023 at 10:27:41PM +0000, Qing Zhao wrote:
> 
> 
> > On Oct 25, 2023, at 6:06 PM, Kees Cook <keescook@chromium.org> wrote:
> > 
> > On Wed, Oct 25, 2023 at 01:27:29PM +0000, Qing Zhao wrote:
> >> A.  Add an additional argument, the size parameter,  to __bdos, 
> >>     A.1, during FE;
> >>     A.2, during gimplification phase;
> > 
> > I just wanted to clarify that this is all just an "internal" detail,
> > yes?
> 
> YES!

Okay, I thought so, but I just wanted to double-check. :)

> > For example, the Linux kernel can still use __bdos() without knowing
> > the count member ahead of time (otherwise it kind of defeats the purpose).
> Don’t quite understand this, could you clarify? 

I was just trying to explain why a change would be a problem. But it
doesn't matter, so never mind. :)

> (Anyway, the bottom line is no change to the user interface, we just discuss the internal implementation inside GCC) -:)

Great! I'll go back to lurking. :)

Thanks!
Jakub Jelinek Oct. 26, 2023, 5:21 a.m. UTC | #73
On Wed, Oct 25, 2023 at 07:03:43PM +0000, Qing Zhao wrote:
> For the code generation impact:
> 
> turning the original  x.buf 
> to a builtin function call
> __builtin_with_access_and_size(x.buf, x.L, -1)
> 
> might inhibit some optimizations from happening before the builtin is
> evaluated into object size info (phase  .objsz1).  I guess there might be
> some performance impact.
> 
> However, if we mark this builtin as PURE, NOTHROW, etc, then the negative
> performance impact will be reduced to minimum?

You can't drop it during objsz1 pass though, otherwise __bdos wouldn't
be able to figure out the dynamic sizes in case of normal (non-early)
inlining - caller takes address of a counted_by array, passes it down
to callee which is only inlined late and uses __bdos, or callee takes address
and returns it and caller uses __bdos, etc. - so it would need to be objsz2.

And while the builtin (or, if it is an internal detail rather than a
user-accessible builtin, an internal function) could even be const/nothrow/leaf
if the arguments contain the loads from the two structure fields, I'm afraid
it will still have a huge code generation impact and prevent tons of pre-IPA
optimizations.  And it will need some work to handle it properly during
inlining heuristics, because in GIMPLE the COMPONENT_REF loads aren't gimple
values, so it wouldn't be just the builtin/internal-fn call to be ignored,
but also the count load from memory.

	Jakub
Martin Uecker Oct. 26, 2023, 8:15 a.m. UTC | #74
Am Mittwoch, dem 25.10.2023 um 15:32 -0700 schrieb Kees Cook:
> On Wed, Oct 25, 2023 at 10:27:41PM +0000, Qing Zhao wrote:
> > 
> > 
> > > On Oct 25, 2023, at 6:06 PM, Kees Cook <keescook@chromium.org> wrote:
> > > 
> > > On Wed, Oct 25, 2023 at 01:27:29PM +0000, Qing Zhao wrote:
> > > > A.  Add an additional argument, the size parameter,  to __bdos, 
> > > >     A.1, during FE;
> > > >     A.2, during gimplification phase;
> > > 
> > > I just wanted to clarify that this is all just an "internal" detail,
> > > yes?
> > 
> > YES!
> 
> Okay, I thought so, but I just wanted to double-check. :)
> 
> > > For example, the Linux kernel can still use __bdos() without knowing
> > > the count member ahead of time (otherwise it kind of defeats the purpose).
> > Don’t quite understand this, could you clarify? 
> 
> I was just trying to explain why a change would be a problem. But it
> doesn't matter, so nevermind. :)
> 
> > (Anyway, the bottom line is no change to the user interface, we just discuss the internal implementation inside GCC) -:)
> 
> Great! I'll go back to lurking. :)
> 
> Thanks!
> 

While it is about the internal implementation, it would
potentially affect the semantics of the attribute:

This would work:

x->count = 10;
char *p = &x->buf;

but not this:

char *p = &x->buf;
x->count = 1;
p[10] = 1; // !

(because the pointer is taken before the
store to the counter)

and also here the second store is then irrelevant
for the access:

x->count = 10;
char* p = &x->buf;
...
x->count = 1; // somewhere else
----
p[9] = 1; // ok, because count matters at the time buf was accessed.


IMHO this also makes sense from the user side and
is the desirable semantics we discussed before.

But can you take a look at this?


This should simulate it fairly well:
https://godbolt.org/z/xq89aM7Gr

(the call to the noinline function would go away,
but not necessarily its impact on optimization)

Martin
Richard Biener Oct. 26, 2023, 8:45 a.m. UTC | #75
On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker <uecker@tugraz.at> wrote:
>
> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
> >
> > > Am 25.10.2023 um 12:47 schrieb Martin Uecker <uecker@tugraz.at>:
> > >
> > > Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
> > > > > On 2023-10-25 04:16, Martin Uecker wrote:
> > > > > Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> > > > > >
> > > > > > > Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
> > > > > > >
> > > > > > > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
> > > > > > > > Hi, Sid,
> > > > > > > >
> > > > > > > > Really appreciate for your example and detailed explanation. Very helpful.
> > > > > > > > I think that this example is an excellent example to show (almost) all the issues we need to consider.
> > > > > > > >
> > > > > > > > I slightly modified this example to make it to be compilable and run-able, as following:
> > > > > > > > (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
> > > > > > > >
> > > > > > > >  1 #include <malloc.h>
> > > > > > > >  2 struct A
> > > > > > > >  3 {
> > > > > > > >  4  size_t size;
> > > > > > > >  5  char buf[] __attribute__((counted_by(size)));
> > > > > > > >  6 };
> > > > > > > >  7
> > > > > > > >  8 static size_t
> > > > > > > >  9 get_size_from (void *ptr)
> > > > > > > > 10 {
> > > > > > > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > > > > > > 12 }
> > > > > > > > 13
> > > > > > > > 14 void
> > > > > > > > 15 foo (size_t sz)
> > > > > > > > 16 {
> > > > > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > > > > > 18  obj->size = sz;
> > > > > > > > 19  obj->buf[0] = 2;
> > > > > > > > > 20  __builtin_printf ("%zu\n", get_size_from (obj->buf));
> > > > > > > > 21  return;
> > > > > > > > 22 }
> > > > > > > > 23
> > > > > > > > 24 int main ()
> > > > > > > > 25 {
> > > > > > > > 26  foo (20);
> > > > > > > > 27  return 0;
> > > > > > > > 28 }
> > > > > > > >
> > > >
> > > > <snip>
> > > >
> > > > > > When it’s set I suppose.  Turn
> > > > > >
> > > > > > X.l = n;
> > > > > >
> > > > > > Into
> > > > > >
> > > > > > X.l = __builtin_with_size (x.buf, n);
> > > > >
> > > > > It would turn
> > > > >
> > > > > some_variable = (&) x.buf
> > > > >
> > > > > into
> > > > >
> > > > > some_variable = __builtin_with_size ( (&) x.buf, x.len)
> > > > >
> > > > >
> > > > > So the later access to x.buf and not the initialization
> > > > > of a member of the struct (which is too early).
> > > > >
> > > >
> > > > Hmm, so with Qing's example above, are you suggesting the transformation
> > > > be to foo like so:
> > > >
> > > > 14 void
> > > > 15 foo (size_t sz)
> > > > 16 {
> > > > 16.5  void * _1;
> > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > 18  obj->size = sz;
> > > > 19  obj->buf[0] = 2;
> > > > 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
> > > > > 20  __builtin_printf ("%zu\n", get_size_from (_1));
> > > > 21  return;
> > > > 22 }
> > > >
> > > > If yes then this could indeed work.  I think I got thrown off by the
> > > > reference to __bdos.
> > >
> > > Yes. I think it is important to evaluate the size at the
> > > access to buf and not the allocation, because the point is to
> > > recover it from the size member even when the compiler can't
> > > see the original allocation.
> >
> > But if the access is through a pointer without the attribute visible
> > even the Frontend cannot recover?
>
> Yes, if the access is using a struct-with-FAM without the attribute
> the FE would not insert the builtin.  BDOS could potentially
> still see the original allocation but if it doesn't, then there is
> no information.
>
> > We’d need to force type correctness and give up on indirecting
> > through an int * when it can refer to two different container types.
> > The best we can do I think is mark allocation sites and hope for
> > some basic code hygiene (not clobbering size or array pointer
> > through pointers without the appropriately attributed type)
>
> I do not fully understand what you are referring to.

struct A { int n; int data[n]; };
struct B { long n; int data[n]; };

int *p = flag ? a->data : b->data;

access *p;

Since we need to allow interoperability of pointers (a->data is
convertible to a non-fat pointer of type int *) this leaves us with
ambiguity we need to conservatively handle to avoid false positives.

We _might_ want to diagnose decay of a->data to int *, but IIRC
there's no way (or proposal) to allow declaring a corresponding
fat pointer, so it's not a well-designed feature.

Having __builtin_with_size at allocation would possibly make
the BOS use-def walk discover both objects.  I think you can't
insert __builtin_with_size at the access to *p, but in practice
that would be very much needed.
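
For reference, the documented rule for __builtin_object_size when a pointer can refer to several known objects is: return the maximum of the remaining sizes if (type & 2) is 0, and the minimum otherwise, with (size_t) -1 resp. 0 meaning "unknown". A small model of that merge is sketched below; merge_object_sizes and the two macros are hypothetical names, and this models only the rule, not GCC's use-def walk.

```c
#include <assert.h>
#include <stddef.h>

/* "Unknown" results for maximum modes (type 0/1) and minimum modes
   (type 2/3) of __builtin_object_size. */
#define SIZE_UNKNOWN_MAX ((size_t) -1)
#define SIZE_UNKNOWN_MIN ((size_t) 0)

/* Combine the sizes of two objects a pointer may refer to, as for
   p = flag ? a->data : b->data. */
static size_t merge_object_sizes(size_t a, size_t b, int type)
{
    assert(type >= 0 && type <= 3);
    if (type & 2)
        return a < b ? a : b;   /* minimum mode: lower bound */
    return a > b ? a : b;       /* maximum mode: upper bound */
}
```

Because the "unknown" values are the extremes of each mode, an unknown size for either object makes the merged result unknown as well, which is the conservative outcome needed to avoid false positives.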

Richard.

> But yes,
> for full bounds safety we would need the language feature.
> In C people should start to use variably-modified types
> more.  I think we can build perfect bounds safety on top of
> them in a very good way with only FE changes.
>
> All these attributes are just a best effort.  But for a while,
> this will be necessary.
>
> Martin
>
> >
> > > Evaluating at this point requires that the size is correctly set
> > > before the access to the FAM and the user has to make sure
> > > this is the case. But to me this requirement would make sense.
> > >
> > > Semantically, it could also make sense to evaluate the size at a
> > > later time.  But then the reordering becomes problematic again.
> > >
> > > Also I think this would make this feature generally more useful.
> > > For example, it could also work for other pointers in the struct
> > > and not just for FAMs.  In this case, the struct may already be
> > > freed when BDOS is called, so it might also not be possible to
> > > access the size member at a later time.
> > >
> > > Martin
> > >
> > >
> > > >
> > >
>
Richard Biener Oct. 26, 2023, 8:56 a.m. UTC | #76
On Thu, Oct 26, 2023 at 7:22 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Oct 25, 2023 at 07:03:43PM +0000, Qing Zhao wrote:
> > For the code generation impact:
> >
> > turning the original  x.buf
> > to a builtin function call
> > __builtin_with_access_and_size(x.buf, x.L, -1)
> >
> > might inhibit some optimizations from happening before the builtin is
> > evaluated into object size info (phase  .objsz1).  I guess there might be
> > some performance impact.
> >
> > However, if we mark this builtin as PURE, NOTHROW, etc, then the negative
> > performance impact will be reduced to minimum?
>
> You can't drop it during objsz1 pass though, otherwise __bdos wouldn't
> be able to figure out the dynamic sizes in case of normal (non-early)
> inlining - caller takes address of a counted_by array, passes it down
> to callee which is only inlined late and uses __bdos, or callee takes address
> and returns it and caller uses __bdos, etc. - so it would need to be objsz2.
>
> And while the builtin (or if it is an internal detail rather than user
> accessible builtin an internal function) could be even const/nothrow/leaf if
> the arguments contain the loads from the structure 2 fields, I'm afraid it
> will still have huge code generation impact, prevent tons of pre-IPA
> optimizations.  And it will need some work to handle it properly during
> inlining heuristics, because in GIMPLE the COMPONENT_REF loads aren't gimple
> values, so it wouldn't be just the builtin/internal-fn call to be ignored,
> but also the count load from memory.

I think we want to track the value, not the "memory" in the builtin call,
so GIMPLE would be

 _1 = x.L;
 .. = __builtin_with_access_and_size (&x.buf, _1, -1);

Also, please make sure to use an internal function for
__builtin_with_access_and_size; I don't think we want to expose this to
users, as it's an implementation detail.

Richard.

>
>         Jakub
>
Martin Uecker Oct. 26, 2023, 9:20 a.m. UTC | #77
Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker <uecker@tugraz.at> wrote:
> > 
> > Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
> > > 
> > > > Am 25.10.2023 um 12:47 schrieb Martin Uecker <uecker@tugraz.at>:
> > > > 
> > > > Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
> > > > > > On 2023-10-25 04:16, Martin Uecker wrote:
> > > > > > Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> > > > > > > 
> > > > > > > > Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
> > > > > > > > 
> > > > > > > > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
> > > > > > > > > Hi, Sid,
> > > > > > > > > 
> > > > > > > > > Really appreciate for your example and detailed explanation. Very helpful.
> > > > > > > > > I think that this example is an excellent example to show (almost) all the issues we need to consider.
> > > > > > > > > 
> > > > > > > > > I slightly modified this example to make it to be compilable and run-able, as following:
> > > > > > > > > (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
> > > > > > > > > 
> > > > > > > > >  1 #include <malloc.h>
> > > > > > > > >  2 struct A
> > > > > > > > >  3 {
> > > > > > > > >  4  size_t size;
> > > > > > > > >  5  char buf[] __attribute__((counted_by(size)));
> > > > > > > > >  6 };
> > > > > > > > >  7
> > > > > > > > >  8 static size_t
> > > > > > > > >  9 get_size_from (void *ptr)
> > > > > > > > > 10 {
> > > > > > > > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > > > > > > > 12 }
> > > > > > > > > 13
> > > > > > > > > 14 void
> > > > > > > > > 15 foo (size_t sz)
> > > > > > > > > 16 {
> > > > > > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > > > > > > 18  obj->size = sz;
> > > > > > > > > 19  obj->buf[0] = 2;
> > > > > > > > > > 20  __builtin_printf ("%zu\n", get_size_from (obj->buf));
> > > > > > > > > 21  return;
> > > > > > > > > 22 }
> > > > > > > > > 23
> > > > > > > > > 24 int main ()
> > > > > > > > > 25 {
> > > > > > > > > 26  foo (20);
> > > > > > > > > 27  return 0;
> > > > > > > > > 28 }
> > > > > > > > > 
> > > > > 
> > > > > <snip>
> > > > > 
> > > > > > > When it’s set I suppose.  Turn
> > > > > > > 
> > > > > > > X.l = n;
> > > > > > > 
> > > > > > > Into
> > > > > > > 
> > > > > > > X.l = __builtin_with_size (x.buf, n);
> > > > > > 
> > > > > > It would turn
> > > > > > 
> > > > > > some_variable = (&) x.buf
> > > > > > 
> > > > > > into
> > > > > > 
> > > > > > some_variable = __builtin_with_size ( (&) x.buf, x.len)
> > > > > > 
> > > > > > 
> > > > > > So the later access to x.buf and not the initialization
> > > > > > of a member of the struct (which is too early).
> > > > > > 
> > > > > 
> > > > > Hmm, so with Qing's example above, are you suggesting the transformation
> > > > > be to foo like so:
> > > > > 
> > > > > 14 void
> > > > > 15 foo (size_t sz)
> > > > > 16 {
> > > > > 16.5  void * _1;
> > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > > 18  obj->size = sz;
> > > > > 19  obj->buf[0] = 2;
> > > > > 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
> > > > > > 20  __builtin_printf ("%zu\n", get_size_from (_1));
> > > > > 21  return;
> > > > > 22 }
> > > > > 
> > > > > If yes then this could indeed work.  I think I got thrown off by the
> > > > > reference to __bdos.
> > > > 
> > > > Yes. I think it is important to evaluate the size at the
> > > > access to buf and not the allocation, because the point is to
> > > > recover it from the size member even when the compiler can't
> > > > see the original allocation.
> > > 
> > > But if the access is through a pointer without the attribute visible
> > > even the Frontend cannot recover?
> > 
> > Yes, if the access is using a struct-with-FAM without the attribute
> > the FE would not insert the builtin.  BDOS could potentially
> > still see the original allocation but if it doesn't, then there is
> > no information.
> > 
> > > We’d need to force type correctness and give up on indirecting
> > > through an int * when it can refer to two different container types.
> > > The best we can do I think is mark allocation sites and hope for
> > > some basic code hygiene (not clobbering size or array pointer
> > > through pointers without the appropriately attributed type)
> > 
> > I do not fully understand what you are referring to.
> 
> struct A { int n; int data[n]; };
> struct B { long n; int data[n]; };
> 
> int *p = flag ? a->data : b->data;
> 
> access *p;
> 
> Since we need to allow interoperability of pointers (a->data is
> convertible to a non-fat pointer of type int *) this leaves us with
> ambiguity we need to conservatively handle to avoid false positives.

For BDOS, I would expect this to work exactly like:

char aa[n1];
char bb[n2];
char *p = flag ? aa : bb;

(or similar code with malloc). In fact it does:

https://godbolt.org/z/bK68YKqhe
(cheating a bit and also the sub-object version of
BDOS does not seem to work)

> 
> We _might_ want to diagnose decay of a->data to int *, but IIRC
> there's no way (or proposal) to allow declaring a corresponding
> fat pointer, so it's not a good designed feature.

As a language feature, I fully agree.  I see the
counted_by attribute as a makeshift solution.

But we can already do:

auto p = flag ? &aa : &bb;

and this already works perfectly:

https://godbolt.org/z/rvb6xWWPj

We can also name the variably-modified type:

char (*p)[flag ? n1 : n2] = flag ? &aa : &bb;
https://godbolt.org/z/13cTT1vGP

The problem with this version is that consistency
is not checked. (I have a patch for adding run-time
checks).
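
A minimal sketch of how a variably-modified pointer type carries its bound, assuming a compiler with VLA support (C99, or C11 and later without __STDC_NO_VLA__). The function name bound_from_vm_type is hypothetical, and explicit casts are used here so that both arms of the conditional have exactly the type being declared:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the bound carried by the pointer's own type. */
static size_t bound_from_vm_type(int flag, size_t n1, size_t n2)
{
    assert(n1 > 0 && n2 > 0);   /* VLA sizes must be positive */

    char aa[n1];
    char bb[n2];
    size_t n = flag ? n1 : n2;

    /* The casts name the selected bound explicitly; nothing checks that
       n really matches the array that was picked. */
    char (*p)[n] = flag ? (char (*)[n]) &aa : (char (*)[n]) &bb;

    (*p)[n - 1] = 1;            /* last element covered by the bound */
    return sizeof *p;           /* evaluated at run time, yields n */
}
```

The bound travels with the type of p, so sizeof *p recovers it without any attribute; the missing piece is exactly the consistency check between n and the selected array.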

And then the next step would be to allow

char (*p)[:] = flag ? &aa : &bb;

or similar.  Dennis Ritchie proposed this himself
a long time ago.

So far this seems straightforward.

If we then want to allow such wide pointers as
function arguments or in structs, we would need
to define an ABI. But the ABI could just be

struct { char (*p)[.s]; size_t s; };

Maybe we could try to make the following
ABI compatible:

int foo(int p[s], size_t s);
int foo(int p[:]);


> Having __builtin_with_size at allocation would possibly make
> the BOS use-def walk discover both objects.

Yes. But I do not think there is any fundamental
difference from discovering allocation functions.

>   I think you can't
> insert __builtin_with_size at the access to *p, but in practice
> that would be very much needed.

Usually the access to *p would directly follow the
access to x.buf, so BDOS should find it.

But yes, to get full bounds safety, the pointer type
has to change to a variably-modified type (which would work
today) or a fat pointer type. The latter can be built on
vm-types easily because all the FE semantics already
exist.

Martin

> 
> Richard.
> 
> > But yes,
> > for full bounds safety we would need the language feature.
> > In C people should start to use variably-modified types
> > more.  I think we can build perfect bounds safety on top of
> > them in a very good way with only FE changes.
> > 
> > All these attributes are just a best effort.  But for a while,
> > this will be necessary.
> > 
> > Martin
> > 
> > > 
> > > > Evaluating at this point requires that the size is correctly set
> > > > before the access to the FAM and the user has to make sure
> > > > this is the case. But to me this requirement would make sense.
> > > > 
> > > > Semantically, it could also make sense to evaluate the size at a
> > > > later time.  But then the reordering becomes problematic again.
> > > > 
> > > > Also I think this would make this feature generally more useful.
> > > > For example, it could also work for other pointers in the struct
> > > > and not just for FAMs.  In this case, the struct may already be
> > > > freed when BDOS is called, so it might also not be possible to
> > > > access the size member at a later time.
> > > > 
> > > > Martin
> > > > 
> > > > 
> > > > > 
> > > > 
> >
Martin Uecker Oct. 26, 2023, 10:14 a.m. UTC | #78
Am Donnerstag, dem 26.10.2023 um 11:20 +0200 schrieb Martin Uecker:
> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
> > On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker <uecker@tugraz.at> wrote:
> > > 
> > > Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
> > > > 
> > > > > Am 25.10.2023 um 12:47 schrieb Martin Uecker <uecker@tugraz.at>:
> > > > > 
> > > > > Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
> > > > > > > On 2023-10-25 04:16, Martin Uecker wrote:
> > > > > > > Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> > > > > > > > 
> > > > > > > > > Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
> > > > > > > > > 
> > > > > > > > > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
> > > > > > > > > > Hi, Sid,
> > > > > > > > > > 
> > > > > > > > > > Really appreciate for your example and detailed explanation. Very helpful.
> > > > > > > > > > I think that this example is an excellent example to show (almost) all the issues we need to consider.
> > > > > > > > > > 
> > > > > > > > > > I slightly modified this example to make it to be compilable and run-able, as following:
> > > > > > > > > > (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
> > > > > > > > > > 
> > > > > > > > > >  1 #include <malloc.h>
> > > > > > > > > >  2 struct A
> > > > > > > > > >  3 {
> > > > > > > > > >  4  size_t size;
> > > > > > > > > >  5  char buf[] __attribute__((counted_by(size)));
> > > > > > > > > >  6 };
> > > > > > > > > >  7
> > > > > > > > > >  8 static size_t
> > > > > > > > > >  9 get_size_from (void *ptr)
> > > > > > > > > > 10 {
> > > > > > > > > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > > > > > > > > 12 }
> > > > > > > > > > 13
> > > > > > > > > > 14 void
> > > > > > > > > > 15 foo (size_t sz)
> > > > > > > > > > 16 {
> > > > > > > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > > > > > > > 18  obj->size = sz;
> > > > > > > > > > 19  obj->buf[0] = 2;
> > > > > > > > > > > 20  __builtin_printf ("%zu\n", get_size_from (obj->buf));
> > > > > > > > > > 21  return;
> > > > > > > > > > 22 }
> > > > > > > > > > 23
> > > > > > > > > > 24 int main ()
> > > > > > > > > > 25 {
> > > > > > > > > > 26  foo (20);
> > > > > > > > > > 27  return 0;
> > > > > > > > > > 28 }
> > > > > > > > > > 
> > > > > > 
> > > > > > <snip>
> > > > > > 
> > > > > > > > When it’s set I suppose.  Turn
> > > > > > > > 
> > > > > > > > X.l = n;
> > > > > > > > 
> > > > > > > > Into
> > > > > > > > 
> > > > > > > > X.l = __builtin_with_size (x.buf, n);
> > > > > > > 
> > > > > > > It would turn
> > > > > > > 
> > > > > > > some_variable = (&) x.buf
> > > > > > > 
> > > > > > > into
> > > > > > > 
> > > > > > > some_variable = __builtin_with_size ( (&) x.buf, x.len)
> > > > > > > 
> > > > > > > 
> > > > > > > So the later access to x.buf and not the initialization
> > > > > > > of a member of the struct (which is too early).
> > > > > > > 
> > > > > > 
> > > > > > Hmm, so with Qing's example above, are you suggesting the transformation
> > > > > > be to foo like so:
> > > > > > 
> > > > > > 14 void
> > > > > > 15 foo (size_t sz)
> > > > > > 16 {
> > > > > > 16.5  void * _1;
> > > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > > > 18  obj->size = sz;
> > > > > > 19  obj->buf[0] = 2;
> > > > > > 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
> > > > > > 20  __builtin_printf ("%d\n", get_size_from (_1));
> > > > > > 21  return;
> > > > > > 22 }
> > > > > > 
> > > > > > If yes then this could indeed work.  I think I got thrown off by the
> > > > > > reference to __bdos.
> > > > > 
> > > > > Yes. I think it is important to evaluate the size at the
> > > > > access to buf and not at the allocation, because the point is to
> > > > > recover it from the size member even when the compiler can't
> > > > > see the original allocation.
> > > > 
> > > > But if the access is through a pointer without the attribute visible
> > > > even the Frontend cannot recover?
> > > 
> > > Yes, if the access is using a struct-with-FAM without the attribute
> > > the FE would not insert the builtin.  BDOS could potentially
> > > still see the original allocation but if it doesn't, then there is
> > > no information.
> > > 
> > > > We’d need to force type correctness and give up on indirecting
> > > > through an int * when it can refer to two different container types.
> > > > The best we can do I think is mark allocation sites and hope for
> > > > some basic code hygiene (not clobbering size or array pointer
> > > > through pointers without the appropriately attributed type)
> > > 
> > > I do not fully understand what you are referring to.
> > 
> > struct A { int n; int data[n]; };
> > struct B { long n; int data[n]; };
> > 
> > int *p = flag ? a->data : b->data;
> > 
> > access *p;
> > 
> > Since we need to allow interoperability of pointers (a->data is
> > convertible to a non-fat pointer of type int *) this leaves us with
> > ambiguity we need to conservatively handle to avoid false positives.
> 
> For BDOS, I would expect this to work exactly like:
> 
> char aa[n1];
> char bb[n2];
> char *p = flag ? aa : bb;
> 
> (or similar code with malloc). In fact it does:
> 
> https://godbolt.org/z/bK68YKqhe
> (cheating a bit and also the sub-object version of
> BDOS does not seem to work)
> 
> > 
> > We _might_ want to diagnose decay of a->data to int *, but IIRC
> > there's no way (or proposal) to allow declaring a corresponding
> > fat pointer, so it's not a well-designed feature.
> 
> As a language feature, I fully agree.  I see the
> counted_by attribute as a makeshift solution.
> 
> But we can already do:
> 
> auto p = flag ? &aa : &bb;
> 
> and this already works perfectly:
> 
> https://godbolt.org/z/rvb6xWWPj
> 
> We can also name the variably-modified type:
> 
> char (*p)[flag ? n1 : n2] = flag ? &aa : &bb;
> https://godbolt.org/z/13cTT1vGP
> 
> The problem with this version is that consistency
> is not checked. (I have a patch for adding run-time
> checks).
> 
> And then the next step would be to allow
> 
> char (*p)[:] = flag ? &aa : &bb;
> 
> or similar.  Dennis Ritchie proposed this himself
> a long time ago.
> 
> So far this seems straightforward.
> 
> If we then want to allow such wide pointers as
> function arguments or in structs, we would need
> to define an ABI. But the ABI could just be
> 
> struct { char (*p)[.s]; size_t s; };
> 
> Maybe we could try to make the following
> ABI compatible:
> 
> int foo(int p[s], size_t s);
> int foo(int p[:]);
> 
> 
> > Having __builtin_with_size at allocation would possibly make
> > the BOS use-def walk discover both objects.
> 
> Yes. But I do not think there is any fundamental
> difference from discovering allocation functions.
> 
> >   I think you can't
> > insert __builtin_with_size at the access to *p, but in practice
> > that would be very much needed.
> 
> Usually the access to *p would follow directly the
> access x.buf, so BDOS should find it.
> 
> But yes, to get full bounds safety, the pointer type 
> has to change to a variably-modified type (which would work
> today) or a fat pointer type. The latter can be built on
> vm-types easily because all the FE semantics already
> exist.

We could insert the __builtin_with_size everywhere
we have to convert a wide pointer, or let an array
decay to a traditional pointer, for reasons of compatibility
with legacy code.

Martin

> 
> Martin
> 
> > 
> > Richard.
> > 
> > > But yes,
> > > for full bounds safety we would need the language feature.
> > > In C people should start to use variably-modified types
> > > more.  I think we can build perfect bounds safety on top of
> > > them in a very good way with only FE changes.
> > > 
> > > All these attributes are just a best effort.  But for a while,
> > > this will be necessary.
> > > 
> > > Martin
> > > 
> > > > 
> > > > > Evaluating at this point requires that the size is correctly set
> > > > > before the access to the FAM and the user has to make sure
> > > > > this is the case. But to me this requirement would make sense.
> > > > > 
> > > > > Semantically, it could also make sense to evaluate the size at a
> > > > > later time.  But then the reordering becomes problematic again.
> > > > > 
> > > > > Also I think this would make this feature generally more useful.
> > > > > For example, it could also work for other pointers in the struct
> > > > > and not just for FAMs.  In this case, the struct may already be
> > > > > freed when BDOS is called, so it might also not be possible to
> > > > > access the size member at a later time.
> > > > > 
> > > > > Martin
> > > > > 
> > > > > 
> > > > > > 
> > > > > 
> > > 
>
Richard Biener Oct. 26, 2023, 2:05 p.m. UTC | #79
> On 26.10.2023 at 12:14, Martin Uecker <uecker@tugraz.at> wrote:
> 
> On Thursday, 26.10.2023 at 11:20 +0200, Martin Uecker wrote:
>>> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
>>> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker <uecker@tugraz.at> wrote:
>>>> 
>>>> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
>>>>> 
>>>>>> Am 25.10.2023 um 12:47 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>>> 
>>>>>> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>>>>>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>>>>>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>>>>>>> 
>>>>>>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>>>>>>> 
>>>>>>>>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>>>>>>>>>>> Hi, Sid,
>>>>>>>>>>> 
>>>>>>>>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>>>>>>>>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>>>>>>>>>>> 
>>>>>>>>>>> I slightly modified this example to make it to be compilable and run-able, as following:
>>>>>>>>>>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>>>>>>>>>>> 
>>>>>>>>>>> 1 #include <malloc.h>
>>>>>>>>>>> 2 struct A
>>>>>>>>>>> 3 {
>>>>>>>>>>> 4  size_t size;
>>>>>>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>>>>>>> 6 };
>>>>>>>>>>> 7
>>>>>>>>>>> 8 static size_t
>>>>>>>>>>> 9 get_size_from (void *ptr)
>>>>>>>>>>> 10 {
>>>>>>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>>>>>>> 12 }
>>>>>>>>>>> 13
>>>>>>>>>>> 14 void
>>>>>>>>>>> 15 foo (size_t sz)
>>>>>>>>>>> 16 {
>>>>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>>>>>> 18  obj->size = sz;
>>>>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>>>>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>>>>>>>>>>> 21  return;
>>>>>>>>>>> 22 }
>>>>>>>>>>> 23
>>>>>>>>>>> 24 int main ()
>>>>>>>>>>> 25 {
>>>>>>>>>>> 26  foo (20);
>>>>>>>>>>> 27  return 0;
>>>>>>>>>>> 28 }
>>>>>>>>>>> 
>>>>>>> 
>>>>>>> <snip>
>>>>>>> 
>>>>>>>>> When it’s set I suppose.  Turn
>>>>>>>>> 
>>>>>>>>> X.l = n;
>>>>>>>>> 
>>>>>>>>> Into
>>>>>>>>> 
>>>>>>>>> X.l = __builtin_with_size (x.buf, n);
>>>>>>>> 
>>>>>>>> It would turn
>>>>>>>> 
>>>>>>>> some_variable = (&) x.buf
>>>>>>>> 
>>>>>>>> into
>>>>>>>> 
>>>>>>>> some_variable = __builtin_with_size ( (&) x.buf, x.len)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> So the later access to x.buf and not the initialization
>>>>>>>> of a member of the struct (which is too early).
>>>>>>>> 
>>>>>>> 
>>>>>>> Hmm, so with Qing's example above, are you suggesting the transformation
>>>>>>> be to foo like so:
>>>>>>> 
>>>>>>> 14 void
>>>>>>> 15 foo (size_t sz)
>>>>>>> 16 {
>>>>>>> 16.5  void * _1;
>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>> 18  obj->size = sz;
>>>>>>> 19  obj->buf[0] = 2;
>>>>>>> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
>>>>>>> 20  __builtin_printf ("%d\n", get_size_from (_1));
>>>>>>> 21  return;
>>>>>>> 22 }
>>>>>>> 
>>>>>>> If yes then this could indeed work.  I think I got thrown off by the
>>>>>>> reference to __bdos.
>>>>>> 
>>>>>> Yes. I think it is important to evaluate the size at the
>>>>>> access to buf and not at the allocation, because the point is to
>>>>>> recover it from the size member even when the compiler can't
>>>>>> see the original allocation.
>>>>> 
>>>>> But if the access is through a pointer without the attribute visible
>>>>> even the Frontend cannot recover?
>>>> 
>>>> Yes, if the access is using a struct-with-FAM without the attribute
>>>> the FE would not insert the builtin.  BDOS could potentially
>>>> still see the original allocation but if it doesn't, then there is
>>>> no information.
>>>> 
>>>>> We’d need to force type correctness and give up on indirecting
>>>>> through an int * when it can refer to two different container types.
>>>>> The best we can do I think is mark allocation sites and hope for
>>>>> some basic code hygiene (not clobbering size or array pointer
>>>>> through pointers without the appropriately attributed type)
>>>> 
>>>> I do not fully understand what you are referring to.
>>> 
>>> struct A { int n; int data[n]; };
>>> struct B { long n; int data[n]; };
>>> 
>>> int *p = flag ? a->data : b->data;
>>> 
>>> access *p;
>>> 
>>> Since we need to allow interoperability of pointers (a->data is
>>> convertible to a non-fat pointer of type int *) this leaves us with
>>> ambiguity we need to conservatively handle to avoid false positives.
>> 
>> For BDOS, I would expect this to work exactly like:
>> 
>> char aa[n1];
>> char bb[n2];
>> char *p = flag ? aa : bb;
>> 
>> (or similar code with malloc). In fact it does:
>> 
>> https://godbolt.org/z/bK68YKqhe
>> (cheating a bit and also the sub-object version of
>> BDOS does not seem to work)
>> 
>>> 
>>> We _might_ want to diagnose decay of a->data to int *, but IIRC
>>> there's no way (or proposal) to allow declaring a corresponding
>>> fat pointer, so it's not a well-designed feature.
>> 
>> As a language feature, I fully agree.  I see the
>> counted_by attribute as a makeshift solution.
>> 
>> But we can already do:
>> 
>> auto p = flag ? &aa : &bb;
>> 
>> and this already works perfectly:
>> 
>> https://godbolt.org/z/rvb6xWWPj
>> 
>> We can also name the variably-modified type:
>> 
>> char (*p)[flag ? n1 : n2] = flag ? &aa : &bb;
>> https://godbolt.org/z/13cTT1vGP
>> 
>> The problem with this version is that consistency
>> is not checked. (I have a patch for adding run-time
>> checks).
>> 
>> And then the next step would be to allow
>> 
>> char (*p)[:] = flag ? &aa : &bb;
>> 
>> or similar.  Dennis Ritchie proposed this himself
>> a long time ago.
>> 
>> So far this seems straightforward.
>> 
>> If we then want to allow such wide pointers as
>> function arguments or in structs, we would need
>> to define an ABI. But the ABI could just be
>> 
>> struct { char (*p)[.s]; size_t s; };
>> 
>> Maybe we could try to make the following
>> ABI compatible:
>> 
>> int foo(int p[s], size_t s);
>> int foo(int p[:]);
>> 
>> 
>>> Having __builtin_with_size at allocation would possibly make
>>> the BOS use-def walk discover both objects.
>> 
>> Yes. But I do not think there is any fundamental
>> difference from discovering allocation functions.
>> 
>>>  I think you can't
>>> insert __builtin_with_size at the access to *p, but in practice
>>> that would be very much needed.
>> 
>> Usually the access to *p would follow directly the
>> access x.buf, so BDOS should find it.
>> 
>> But yes, to get full bounds safety, the pointer type 
>> has to change to a variably-modified type (which would work
>> today) or a fat pointer type. The latter can be built on
>> vm-types easily because all the FE semantics already
>> exist.
> 
> We could insert the __builtin_with_size everywhere
> we have to convert a wide pointer, or let an array
> decay to a traditional pointer, for reasons of compatibility
> with legacy code.

That sounds like a nice idea.  Note I’d like to see the consumer side implemented so we can play with different points of insertion (and I’ll try to show corner cases where it goes wrong).  It all seems a bit late for GCC 14 though.

Richard 

> Martin
> 
>> 
>> Martin
>> 
>>> 
>>> Richard.
>>> 
>>>> But yes,
>>>> for full bounds safety we would need the language feature.
>>>> In C people should start to use variably-modified types
>>>> more.  I think we can build perfect bounds safety on top of
>>>> them in a very good way with only FE changes.
>>>> 
>>>> All these attributes are just a best effort.  But for a while,
>>>> this will be necessary.
>>>> 
>>>> Martin
>>>> 
>>>>> 
>>>>>> Evaluating at this point requires that the size is correctly set
>>>>>> before the access to the FAM and the user has to make sure
>>>>>> this is the case. But to me this requirement would make sense.
>>>>>> 
>>>>>> Semantically, it could also make sense to evaluate the size at a
>>>>>> later time.  But then the reordering becomes problematic again.
>>>>>> 
>>>>>> Also I think this would make this feature generally more useful.
>>>>>> For example, it could also work for other pointers in the struct
>>>>>> and not just for FAMs.  In this case, the struct may already be
>>>>>> freed when BDOS is called, so it might also not be possible to
>>>>>> access the size member at a later time.
>>>>>> 
>>>>>> Martin
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>> 
>
Qing Zhao Oct. 26, 2023, 2:41 p.m. UTC | #80
> On Oct 26, 2023, at 1:21 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Wed, Oct 25, 2023 at 07:03:43PM +0000, Qing Zhao wrote:
>> For the code generation impact:
>> 
>> turning the original  x.buf 
>> to a builtin function call
>> __builtin_with_access_and_size(x.buf, x.L,-1)
>> 
>> might inhibit some optimizations from happening before the builtin is
>> evaluated into object size info (phase  .objsz1).  I guess there might be
>> some performance impact.
>> 
>> However, if we mark this builtin as PURE, NOTHROW, etc., then the negative
>> performance impact will be reduced to a minimum?
> 
> You can't drop it during objsz1 pass though, otherwise __bdos wouldn't
> be able to figure out the dynamic sizes in case of normal (non-early)
> inlining - caller takes address of a counted_by array, passes it down
> to callee which is only inlined late and uses __bdos, or callee takes address
> and returns it and caller uses __bdos, etc. - so it would need to be objsz2.

I guess I didn’t explain this very clearly before. Let me try again:

My understanding is that for -O[1|2|3] there is an “early_objsz” phase followed later by an “objsz1” phase.
For -Og, there is “early_objsz” followed later by “objsz2”.

So the “objsz1” I mentioned (for the -O[1|2|3] case) should be the same as the “objsz2” you mentioned above?  -:)
It is the second objsz phase.

By the second objsz phase, I believe all inlining (both early inlining and IPA inlining) has already been applied?
> 
> And while the builtin (or if it is an internal detail rather than user
> accessible builtin an internal function)

Okay, I will use an “internal function” instead of a “builtin function”.

> could be even const/nothrow/leaf if
> the arguments contain the loads from the structure 2 fields, I'm afraid it
> will still have huge code generation impact, prevent tons of pre-IPA
> optimizations.  And it will need some work to handle it properly during
> inlining heuristics, because in GIMPLE the COMPONENT_REF loads aren't gimple
> values, so it wouldn't be just the builtin/internal-fn call to be ignored,
> but also the count load from memory.

Are you worried that the additional LOADs could change the inlining decision,
since the inlining heuristic depends on the # of loads from memory?

In addition to the # of loads, the # of instructions and the # of calls in the function
might increase too. Will these also affect the inlining decision?

Besides the inlining decision, is there any impact on other IPA optimizations?

thanks.

Qing


> 
> 	Jakub
>
Qing Zhao Oct. 26, 2023, 2:58 p.m. UTC | #81
> On Oct 26, 2023, at 4:56 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Thu, Oct 26, 2023 at 7:22 AM Jakub Jelinek <jakub@redhat.com> wrote:
>> 
>> On Wed, Oct 25, 2023 at 07:03:43PM +0000, Qing Zhao wrote:
>>> For the code generation impact:
>>> 
>>> turning the original  x.buf
>>> to a builtin function call
>>> __builtin_with_access_and_size(x.buf, x.L,-1)
>>> 
>>> might inhibit some optimizations from happening before the builtin is
>>> evaluated into object size info (phase  .objsz1).  I guess there might be
>>> some performance impact.
>>> 
>>> However, if we mark this builtin as PURE, NOTHROW, etc., then the negative
>>> performance impact will be reduced to a minimum?
>> 
>> You can't drop it during objsz1 pass though, otherwise __bdos wouldn't
>> be able to figure out the dynamic sizes in case of normal (non-early)
>> inlining - caller takes address of a counted_by array, passes it down
>> to callee which is only inlined late and uses __bdos, or callee takes address
>> and returns it and caller uses __bdos, etc. - so it would need to be objsz2.
>> 
>> And while the builtin (or if it is an internal detail rather than user
>> accessible builtin an internal function) could be even const/nothrow/leaf if
>> the arguments contain the loads from the structure 2 fields, I'm afraid it
>> will still have huge code generation impact, prevent tons of pre-IPA
>> optimizations.  And it will need some work to handle it properly during
>> inlining heuristics, because in GIMPLE the COMPONENT_REF loads aren't gimple
>> values, so it wouldn't be just the builtin/internal-fn call to be ignored,
>> but also the count load from memory.
> 
> I think we want to track the value, not the "memory" in the builtin call,
> so GIMPLE would be
> 
> _1 = x.L;
> .. = __builtin_with_access_and_size (&x.buf, _1, -1);

Before adding the __builtin_with_access_and_size, the code is:

&x.buf

After inserting the built-in, it becomes:

_1 = x.L;
__builtin_with_access_and_size (&x.buf, _1, -1).


So the total # of instructions, the # of LOADs, and the # of calls will all increase.
This will definitely affect the inlining decision.

> 
> also please make sure to use an internal function for
> __builtin_with_access_and_size,
> I don't think we want to expose this to users - it's an implementation detail.

Okay, will define it as an internal function (add it to internal-fn.def). -:)

Qing
> 
> Richard.
> 
>> 
>>        Jakub
>>
Richard Biener Oct. 26, 2023, 3:48 p.m. UTC | #82
> On 26.10.2023 at 16:58, Qing Zhao <qing.zhao@oracle.com> wrote:
> 
> 
> 
>> On Oct 26, 2023, at 4:56 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>>> On Thu, Oct 26, 2023 at 7:22 AM Jakub Jelinek <jakub@redhat.com> wrote:
>>> 
>>> On Wed, Oct 25, 2023 at 07:03:43PM +0000, Qing Zhao wrote:
>>>> For the code generation impact:
>>>> 
>>>> turning the original  x.buf
>>>> to a builtin function call
>>>> __builtin_with_access_and_size(x.buf, x.L,-1)
>>>> 
>>>> might inhibit some optimizations from happening before the builtin is
>>>> evaluated into object size info (phase  .objsz1).  I guess there might be
>>>> some performance impact.
>>>> 
>>>> However, if we mark this builtin as PURE, NOTHROW, etc., then the negative
>>>> performance impact will be reduced to a minimum?
>>> 
>>> You can't drop it during objsz1 pass though, otherwise __bdos wouldn't
>>> be able to figure out the dynamic sizes in case of normal (non-early)
>>> inlining - caller takes address of a counted_by array, passes it down
>>> to callee which is only inlined late and uses __bdos, or callee takes address
>>> and returns it and caller uses __bdos, etc. - so it would need to be objsz2.
>>> 
>>> And while the builtin (or if it is an internal detail rather than user
>>> accessible builtin an internal function) could be even const/nothrow/leaf if
>>> the arguments contain the loads from the structure 2 fields, I'm afraid it
>>> will still have huge code generation impact, prevent tons of pre-IPA
>>> optimizations.  And it will need some work to handle it properly during
>>> inlining heuristics, because in GIMPLE the COMPONENT_REF loads aren't gimple
>>> values, so it wouldn't be just the builtin/internal-fn call to be ignored,
>>> but also the count load from memory.
>> 
>> I think we want to track the value, not the "memory" in the builtin call,
>> so GIMPLE would be
>> 
>> _1 = x.L;
>> .. = __builtin_with_access_and_size (&x.buf, _1, -1);
> 
> Before adding the __builtin_with_access_and_size, the code is:
> 
> &x.buf
> 
> After inserting the built-in, it becomes:
> 
> _1 = x.L;
> __builtin_with_access_and_size (&x.buf, _1, -1).
> 
> 
> So, the # of total instructions, the # of LOADs, and the # of calls will all be increased.
> There will be impact to the inlining decision definitely.

Note that if x is a pointer and we want to instrument &x->buf, we have to make sure that we
can dereference x.  Possibly by doing

_1 = x ? x->Len : -1;

I’m not sure the C standard guarantees that accessing x->Len unconditionally is free of undefined behavior when &x->buf is computed.  It is definitely a violation of the abstract machine if Len is volatile qualified (but we can reject such counted_by uses, or instantiations with volatile-qualified types).
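
For illustration only, the guarded load above can be written at source level roughly as follows. This is a minimal sketch, not the proposed GIMPLE lowering: `struct counted`, `guarded_count`, `counted_alloc` and `demo_guarded_count` are hypothetical names, and (size_t)-1 is used as the "unknown" result because that is what __builtin_dynamic_object_size reports for maximum-size queries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical struct; the member names are illustrative only. */
struct counted {
    size_t len;   /* the member that counted_by would name */
    char buf[];   /* flexible array member */
};

/* Source-level analogue of `_1 = x ? x->Len : -1;`:
   only load the count when x can be dereferenced. */
static size_t guarded_count(const struct counted *x)
{
    return x ? x->len : (size_t)-1;
}

/* Allocate room for n elements and record n in the count member. */
static struct counted *counted_alloc(size_t n)
{
    struct counted *x = malloc(sizeof *x + n);
    if (x)
        x->len = n;
    return x;
}

/* Small usage example. */
void demo_guarded_count(void)
{
    struct counted *x = counted_alloc(20);
    assert(x != NULL);
    assert(guarded_count(x) == 20);            /* valid object: real count */
    assert(guarded_count(NULL) == (size_t)-1); /* null: size unknown */
    free(x);
}
```

The null check only covers the null-pointer case; it does nothing for a non-null pointer to unmapped or protected memory.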

Richard 

> 
>> 
>> also please make sure to use an internal function for
>> __builtin_with_access_and_size,
>> I don't think we want to expose this to users - it's an implementation detail.
> 
> Okay, will define it as an internal function (add it to internal-fn.def). -:)
> 
> Qing
>> 
>> Richard.
>> 
>>> 
>>>       Jakub
>>> 
>
Kees Cook Oct. 26, 2023, 4:13 p.m. UTC | #83
On Thu, Oct 26, 2023 at 10:15:10AM +0200, Martin Uecker wrote:
> but not this:
> 
> char *p = &x->buf;
> x->count = 1;
> p[10] = 1; // !

This seems fine to me -- it's how I'd expect it to work: "10" is beyond
"1".

> (because the pointer is passed around the
> store to the counter)
> 
> and also here the second store is then irrelevant
> for the access:
> 
> x->count = 10;
> char* p = &x->buf;
> ...
> x->count = 1; // somewhere else
> ----
> p[9] = 1; // ok, because count matters when buf was accessed.

This is less great, but I can understand why it happens. "p" loses the
association with "x". It'd be nice if "p" had a way to retain the fact that
it was just an alias for x->buf, so that future accesses through p would check count.

But this appears to be an existing limitation in other areas where an
assignment will cause the loss of object association. (I've run into
this before.) It's just more surprising in the above example because in
the past the loss of association would cause __bdos() to revert back to
"SIZE_MAX" results ("I don't know the size") rather than an "outdated"
size, which may get us into unexpected places...

> IMHO this makes sense also from the user side and
> are the desirable semantics we discussed before.
> 
> But can you take a look at this?
> 
> 
> This should simulate it fairly well:
> https://godbolt.org/z/xq89aM7Gr
> 
> (the call to the noinline function would go away,
> but not necessarily its impact on optimization)

Yeah, this example should be a very rare situation: a leaf function is
changing the characteristics of the struct but returning a buffer within
it to the caller. The more likely glitch would be from:

int main()
{
	struct foo *f = foo_alloc(7);
	char *p = FAM_ACCESS(f, size, buf);

	printf("%ld\n", __builtin_dynamic_object_size(p, 0));
	test1(f); // or just "f->count = 10;" no function call needed
	printf("%ld\n", __builtin_dynamic_object_size(p, 0));

	return 0;
}

which reports:
7
7

instead of:
7
10
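
The "7, 7" behaviour can be imitated in plain C without any compiler support. The sketch below assumes the size is captured once, at the point where the array is accessed; `struct sized_ptr`, `fam_access`, `foo_alloc` and `demo_snapshot` are hypothetical stand-ins (this `foo_alloc` takes two arguments, unlike the one above) and do not show how the builtin would be implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct foo {
    size_t count;
    char buf[];
};

/* Hypothetical "wide pointer": the address plus the size seen at access time. */
struct sized_ptr {
    char *ptr;
    size_t size;
};

/* Emulates inserting the size builtin at the access to f->buf:
   the count is loaded here, once, and travels with the pointer. */
static struct sized_ptr fam_access(struct foo *f)
{
    struct sized_ptr p = { f->buf, f->count };
    return p;
}

/* Allocate space for `capacity` elements and set the initial count. */
static struct foo *foo_alloc(size_t capacity, size_t count)
{
    struct foo *f = malloc(sizeof *f + capacity);
    if (f)
        f->count = count;
    return f;
}

void demo_snapshot(void)
{
    struct foo *f = foo_alloc(16, 7);
    assert(f != NULL);

    struct sized_ptr p = fam_access(f);
    assert(p.size == 7);

    f->count = 10;                     /* later update of the counter */
    assert(p.size == 7);               /* the alias keeps the old size */
    assert(fam_access(f).size == 10);  /* a fresh access sees the new one */
    free(f);
}
```

In this model an alias never observes later changes to the counter, which is the staleness described above.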

This kind of "get an alias" situation is pretty common in the kernel
as a way to have a convenient "handle" to the array. In the case of a
"fill the array without knowing the actual final size" code pattern,
things would immediately break:

	struct foo *f;
	char *p;
	int i;

	f = alloc(maximum_possible);
	f->count = 0;
	p = f->buf;

	for (i = 0; data_is_available() && i < maximum_possible; i++) {
		f->count ++;
		p[i] = next_data_item();
	}

Now perhaps the problem here is that "count" cannot be used for a count
of "logically valid members in the array" but must always be a count of
"allocated member space in the array", which I guess is tolerable, but
isn't ideal -- I'd like to catch logic bugs in addition to allocation
bugs, but the latter is certainly much more important to catch.
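
One way to live with that restriction is to keep two counters: one for the allocated space (the one counted_by would name) and a separate one for the logically valid elements. The following is only a sketch of that pattern under those assumptions. All names are hypothetical, and the attribute itself is left out so the code builds with compilers that lack it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct item_buf {
    size_t capacity;  /* allocated elements; what counted_by would name */
    size_t used;      /* logically valid elements; not seen by the attribute */
    char buf[];
};

/* Set capacity once, before any alias to buf is taken. */
static struct item_buf *item_buf_alloc(size_t capacity)
{
    struct item_buf *f = malloc(sizeof *f + capacity);
    if (f) {
        f->capacity = capacity;
        f->used = 0;
    }
    return f;
}

/* Append up to n bytes through an alias; returns how many were copied. */
static size_t item_buf_append(struct item_buf *f, const char *src, size_t n)
{
    char *p = f->buf;  /* alias taken after capacity is final */
    size_t copied = 0;

    while (copied < n && f->used < f->capacity)
        p[f->used++] = src[copied++];
    return copied;
}

void demo_fill(void)
{
    struct item_buf *f = item_buf_alloc(8);
    assert(f != NULL);

    assert(item_buf_append(f, "abc", 3) == 3);
    assert(f->used == 3 && f->capacity == 8);

    /* Only five slots are left, so the rest of the input is dropped. */
    assert(item_buf_append(f, "0123456789", 10) == 5);
    assert(f->used == 8);
    free(f);
}
```

Because capacity never changes after allocation, a size captured when the alias is taken stays correct; the cost is that `used` has to be checked by hand.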
Martin Uecker Oct. 26, 2023, 4:16 p.m. UTC | #84
On Thursday, 26.10.2023 at 17:48 +0200, Richard Biener wrote:
> 
> > Am 26.10.2023 um 16:58 schrieb Qing Zhao <qing.zhao@oracle.com>:
> > 
> > 
> > 
> > > On Oct 26, 2023, at 4:56 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> > > 
> > > > On Thu, Oct 26, 2023 at 7:22 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > > > 
> > > > On Wed, Oct 25, 2023 at 07:03:43PM +0000, Qing Zhao wrote:
> > > > > For the code generation impact:
> > > > > 
> > > > > turning the original  x.buf
> > > > > to a builtin function call
> > > > > __builtin_with_access_and_size(x.buf, x.L,-1)
> > > > > 
> > > > > might inhibit some optimizations from happening before the builtin is
> > > > > evaluated into object size info (phase  .objsz1).  I guess there might be
> > > > > some performance impact.
> > > > > 
> > > > > However, if we mark this builtin as PURE, NOTHROW, etc., then the negative
> > > > > performance impact will be reduced to a minimum?
> > > > 
> > > > You can't drop it during objsz1 pass though, otherwise __bdos wouldn't
> > > > be able to figure out the dynamic sizes in case of normal (non-early)
> > > > inlining - caller takes address of a counted_by array, passes it down
> > > > to callee which is only inlined late and uses __bdos, or callee takes address
> > > > and returns it and caller uses __bdos, etc. - so it would need to be objsz2.
> > > > 
> > > > And while the builtin (or if it is an internal detail rather than user
> > > > accessible builtin an internal function) could be even const/nothrow/leaf if
> > > > the arguments contain the loads from the structure 2 fields, I'm afraid it
> > > > will still have huge code generation impact, prevent tons of pre-IPA
> > > > optimizations.  And it will need some work to handle it properly during
> > > > inlining heuristics, because in GIMPLE the COMPONENT_REF loads aren't gimple
> > > > values, so it wouldn't be just the builtin/internal-fn call to be ignored,
> > > > but also the count load from memory.
> > > 
> > > I think we want to track the value, not the "memory" in the builtin call,
> > > so GIMPLE would be
> > > 
> > > _1 = x.L;
> > > .. = __builtin_with_access_and_size (&x.buf, _1, -1);
> > 
> > Before adding the __builtin_with_access_and_size, the code is:
> > 
> > &x.buf
> > 
> > After inserting the built-in, it becomes:
> > 
> > _1 = x.L;
> > __builtin_with_access_and_size (&x.buf, _1, -1).
> > 
> > 
> > So, the # of total instructions, the # of LOADs, and the # of calls will all be increased.
> > There will be impact to the inlining decision definitely.
> 
> Note we have to make sure, if x is a pointer and we want to instrument &x->buf that we
> Can dereference x.  Possibly doing
> 
> _1 = x ? x->Len : -1;
> 
> I’m not sure the C standard makes accessing x->Len unconditionally not undefined behavior when &x->buf is computed.  Definitely it’s a violation of the abstract machine of Len is volatile qualified (but we can reject such counted_by or instantiations as volatile qualified types).

I believe it is implicit UB to do &x->buf if there is
no object *x because the wording assumes the existence
of an object.  In that case accessing x->L should
be fine too.  

In practice the access may trap for other reasons
(mprotect etc.).  I guess this is acceptable,
but it should probably be documented...

We might need the x ? check to avoid trouble with
those offsetof implementations written using a null
pointer.  Although in this case one could maybe
hope that the load gets optimized away anyway ...

Martin

> 
> Richard 
> 
> > 
> > > 
> > > also please make sure to use an internal function for
> > > __builtin_with_access_and_size,
> > > I don't think we want to expose this to users - it's an implementation detail.
> > 
> > Okay, will define it as an internal function (add it to internal-fn.def). -:)
> > 
> > Qing
> > > 
> > > Richard.
> > > 
> > > > 
> > > >       Jakub
> > > > 
> >
Qing Zhao Oct. 26, 2023, 4:41 p.m. UTC | #85
> On Oct 26, 2023, at 5:20 AM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
>> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker <uecker@tugraz.at> wrote:
>>> 
>>> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
>>>> 
>>>>> Am 25.10.2023 um 12:47 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>> 
>>>>> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>>>>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>>>>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>>>>>> 
>>>>>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>>>>>> 
>>>>>>>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>>>>>>>>>> Hi, Sid,
>>>>>>>>>> 
>>>>>>>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>>>>>>>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>>>>>>>>>> 
>>>>>>>>>> I slightly modified this example to make it to be compilable and run-able, as following:
>>>>>>>>>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>>>>>>>>>> 
>>>>>>>>>> 1 #include <malloc.h>
>>>>>>>>>> 2 struct A
>>>>>>>>>> 3 {
>>>>>>>>>> 4  size_t size;
>>>>>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>>>>>> 6 };
>>>>>>>>>> 7
>>>>>>>>>> 8 static size_t
>>>>>>>>>> 9 get_size_from (void *ptr)
>>>>>>>>>> 10 {
>>>>>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>>>>>> 12 }
>>>>>>>>>> 13
>>>>>>>>>> 14 void
>>>>>>>>>> 15 foo (size_t sz)
>>>>>>>>>> 16 {
>>>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>>>>> 18  obj->size = sz;
>>>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>>>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>>>>>>>>>> 21  return;
>>>>>>>>>> 22 }
>>>>>>>>>> 23
>>>>>>>>>> 24 int main ()
>>>>>>>>>> 25 {
>>>>>>>>>> 26  foo (20);
>>>>>>>>>> 27  return 0;
>>>>>>>>>> 28 }
>>>>>>>>>> 
>>>>>> 
>>>>>> <snip>
>>>>>> 
>>>>>>>> When it’s set I suppose.  Turn
>>>>>>>> 
>>>>>>>> X.l = n;
>>>>>>>> 
>>>>>>>> Into
>>>>>>>> 
>>>>>>>> X.l = __builtin_with_size (x.buf, n);
>>>>>>> 
>>>>>>> It would turn
>>>>>>> 
>>>>>>> some_variable = (&) x.buf
>>>>>>> 
>>>>>>> into
>>>>>>> 
>>>>>>> some_variable = __builtin_with_size ( (&) x.buf, x.len)
>>>>>>> 
>>>>>>> 
>>>>>>> So the later access to x.buf and not the initialization
>>>>>>> of a member of the struct (which is too early).
>>>>>>> 
>>>>>> 
>>>>>> Hmm, so with Qing's example above, are you suggesting the transformation
>>>>>> be to foo like so:
>>>>>> 
>>>>>> 14 void
>>>>>> 15 foo (size_t sz)
>>>>>> 16 {
>>>>>> 16.5  void * _1;
>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>> 18  obj->size = sz;
>>>>>> 19  obj->buf[0] = 2;
>>>>>> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
>>>>>> 20  __builtin_printf ("%d\n", get_size_from (_1));
>>>>>> 21  return;
>>>>>> 22 }
>>>>>> 
>>>>>> If yes then this could indeed work.  I think I got thrown off by the
>>>>>> reference to __bdos.
>>>>> 
>>>>> Yes. I think it is important not to evaluate the size at the
>>>>> access to buf and not the allocation, because the point is to
>>>>> recover it from the size member even when the compiler can't
>>>>> see the original allocation.
>>>> 
>>>> But if the access is through a pointer without the attribute visible
>>>> even the Frontend cannot recover?
>>> 
>>> Yes, if the access is using a struct-with-FAM without the attribute
>>> the FE would not be insert the builtin.  BDOS could potentially
>>> still see the original allocation but if it doesn't, then there is
>>> no information.
>>> 
>>>> We’d need to force type correctness and give up on indirecting
>>>> through an int * when it can refer to two diffenent container types.
>>>> The best we can do I think is mark allocation sites and hope for
>>>> some basic code hygiene (not clobbering size or array pointer
>>>> through pointers without the appropriately attributed type)
>>> 
>>> I am do not fully understand what you are referring to.
>> 
>> struct A { int n; int data[n]; };
>> struct B { long n; int data[n]; };
>> 
>> int *p = flag ? a->data : b->data;
>> 
>> access *p;
>> 
>> Since we need to allow interoperability of pointers (a->data is
>> convertible to a non-fat pointer of type int *) this leaves us with
>> ambiguity we need to conservatively handle to avoid false positives.
> 
> For BDOS, I would expect this to work exactly like:
> 
> char aa[n1];
> char bb[n2];
> char *p = flag ? aa : bb;
> 
> (or similar code with malloc). In fact it does:
> 
> https://godbolt.org/z/bK68YKqhe
> (cheating a bit and also the sub-object version of
> BDOS does not seem to work)
> 
>> 
>> We _might_ want to diagnose decay of a->data to int *, but IIRC
>> there's no way (or proposal) to allow declaring a corresponding
>> fat pointer, so it's not a good designed feature.
> 
> As a language feature, I fully agree.  I see the
> counted_by attribute has a makeshift solution.

The “counted_by” attribute is necessary at this moment because
it is much easier for existing source code, for example the Linux
kernel, to adopt.

Though I agree that embedding the bound information into the type
system should be the ultimate goal.

> 
> But we can already do:
> 
> auto p = flag ? &aa : &bb;
> 
> and this already works perfectly:
> 
> https://godbolt.org/z/rvb6xWWPj
> 
> We can also name the variably-modifed type: 
> 
> char (*p)[flag ? n1 : n2] = flag ? &aa : &bb;
> https://godbolt.org/z/13cTT1vGP
> 
> The problem with this version is that consistency
> is not checked. (I have patch for adding run-time
> checks).
> 
> And then the next step would be to allow
> 
> char (*p)[:] = flag ? &aa : &bb;
> 
> or similar.  Dennis Ritchie proposed this himself
> a long time ago.
> 
> So far this seems straightfoward.
> 
> If we then want to allow such wide pointers as
> function arguments or in structs, we would need
> to define an ABI. But the ABI could just be
> 
> struct { char (*p)[.s]; size_t s; };
> 
> Maybe we could try to make the following
> ABI compatible:
> 
> int foo(int p[s], size_t s);
> int foo(int p[:]);
> 
> 
>> Having __builtin_with_size at allocation would possibly make
>> the BOS use-def walk discover both objects.
> 
> Yes. But I do not think this there is any fundamental
> difference to discovering allocation functions.
> 
>>  I think you can't
>> insert __builtin_with_size at the access to *p, but in practice
>> that would be very much needed.
> 
> Usually the access to *p would follow directly the
> access x.buf, so BDOS should find it.
> 
> But yes, to get full bounds safety, the pointer type 
> has to change to a variably-modified type (which would work
> today) or a fat pointer type.

By variably-modified type, do you mean a VLA?

There is one major difference between a VLA and a FAM (or pointer array):

For a VLA, the compiler is responsible for allocating the memory;
the size assignment and the memory allocation are both done by the
compiler at the same time and are tied together.

But for FAMs and pointer arrays, right now, users allocate the memory
in the source code. So when we add the “counted_by” attribute, we need to
specify an additional requirement on the order of the size assignment and the
memory allocation in the source code, and state this requirement in the user documentation.

Later, if we try to put the bound information of FAMs/pointer arrays into the type
system, similar to the current VLA, would we also need to move the memory allocation
of the FAMs/pointer arrays into the compiler (as with VLAs)?
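
To make the ordering requirement concrete, here is a minimal sketch of what the user code would look like. The struct pkt and the helpers pkt_alloc_size and pkt_new are hypothetical names made up for illustration, and the COUNTED_BY macro is only a guard so that the sketch also builds with compilers that do not have the attribute. The allocation size follows the formula from the cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* counted_by only exists in compilers that implement the attribute;
   expand to nothing elsewhere so the sketch still builds.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(m)
#endif

/* Hypothetical example type, not taken from the patch.  */
struct pkt {
  size_t count;
  char data[] COUNTED_BY(count);
};

/* MAX (sizeof (struct pkt),
        offsetof (struct pkt, data) + count * sizeof (char)).  */
static size_t pkt_alloc_size(size_t count)
{
  /* Guard against overflow in the addition below.  */
  assert(count <= SIZE_MAX - offsetof(struct pkt, data));
  size_t with_array = offsetof(struct pkt, data) + count * sizeof(char);
  return with_array > sizeof(struct pkt) ? with_array : sizeof(struct pkt);
}

/* The user allocates, then sets the count, and only then touches data.  */
static struct pkt *pkt_new(size_t count)
{
  struct pkt *p = malloc(pkt_alloc_size(count));
  if (!p)
    return NULL;
  p->count = count;   /* size assignment comes before any access to data */
  if (count > 0)
    p->data[0] = 0;
  return p;
}
```

Nothing in the language ties the malloc call to the store to count here; the ordering is purely a convention the user has to follow, which is the point of the question above.
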
> The later can be built on
> vm-types easily because all the FE semantics already
> exists.

Except for the memory allocation part…

Am I missing anything here?

Qing
> 
> Martin
> 
>> 
>> Richard.
>> 
>>> But yes,
>>> for full bounds safety we would need the language feature.
>>> In C people should start to variably-modified types
>>> more.  I think we can build perfect bounds safety on top of
>>> them in a very good way with only FE changes.
>>> 
>>> All these attributes are just a best effort.  But for a while,
>>> this will be necessary.
>>> 
>>> Martin
>>> 
>>>> 
>>>>> Evaluating at this point requires that the size is correctly set
>>>>> before the access to the FAM and the user has to make sure
>>>>> this is the case. But to me this requirement would make sense.
>>>>> 
>>>>> Semantically, it could also make sense to evaluate the size at a
>>>>> later time.  But then the reordering becomes problematic again.
>>>>> 
>>>>> Also I think this would make this feature generally more useful.
>>>>> For example, it could work also for others pointers in the struct
>>>>> and not just for FAMs.  In this case, the struct may already be
>>>>> freed when  BDOS is called, so it might also not possible to
>>>>> access the size member at a later time.
>>>>> 
>>>>> Martin
>>>>> 
>>>>> 
>>>>>> 
>>>>> 
>>> 
>
Martin Uecker Oct. 26, 2023, 4:45 p.m. UTC | #86
Am Donnerstag, dem 26.10.2023 um 09:13 -0700 schrieb Kees Cook:
> On Thu, Oct 26, 2023 at 10:15:10AM +0200, Martin Uecker wrote:
> > but not this:
> > 
> > x->count = 11;
> > char *p = &x->buf;
> > x->count = 1;
> > p[10] = 1; // !
> 
> This seems fine to me -- it's how I'd expect it to work: "10" is beyond
> "1".

Note that the store would be allowed.

> 
> > (because the pointer is passed around the
> > store to the counter)
> > 
> > and also here the second store is then irrelevant
> > for the access:
> > 
> > x->count = 10;
> > char* p = &x->buf;
> > ...
> > x->count = 1; // somewhere else
> > ----
> > p[9] = 1; // ok, because count matter when buf was accesssed.
> 
> This is less great, but I can understand why it happens. "p" loses the
> association with "x". It'd be nice if "p" had to way to retain that it
> was just an alias for x->buf, so future p access would check count.

The problem is not discovering that p is an alias of x->buf,
but that it seems difficult to make sure that stores to
x->count are not reordered relative to the final access to
p[i] that you want to check, so that you get the right value.
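
As a rough illustration of the "count matters when buf was accessed" semantics, here is a sketch in plain C. It does not use __builtin_with_size, which is only a proposal in this thread; instead a hand-written struct sized_ptr (a hypothetical name, as are foo_buf and sized_ptr_in_bounds) captures the count at the point where buf is accessed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types for illustration only.  */
struct foo {
  size_t count;
  char buf[];
};

/* A hand-written "wide pointer": the bound is captured together
   with the address when buf is accessed.  */
struct sized_ptr {
  char *p;
  size_t n;
};

static struct sized_ptr foo_buf(struct foo *x)
{
  assert(x != NULL);
  struct sized_ptr sp = { x->buf, x->count };  /* snapshot of count */
  return sp;
}

/* The check uses the snapshot, not the current value of x->count,
   so a later store to x->count cannot affect it.  */
static int sized_ptr_in_bounds(struct sized_ptr sp, size_t i)
{
  return i < sp.n;
}
```

With x->count set to 10 before calling foo_buf(x) and to 1 afterwards, index 9 is still considered in bounds, because the later store is irrelevant for the pointer that was already obtained.
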

> 
> But this appears to be an existing limitation in other areas where an
> assignment will cause the loss of object association. (I've run into
> this before.) It's just more surprising in the above example because in
> the past the loss of association would cause __bdos() to revert back to
> "SIZE_MAX" results ("I don't know the size") rather than an "outdated"
> size, which may get us into unexpected places...
> 
> > IMHO this makes sense also from the user side and
> > are the desirable semantics we discussed before.
> > 
> > But can you take a look at this?
> > 
> > 
> > This should simulate it fairly well:
> > https://godbolt.org/z/xq89aM7Gr
> > 
> > (the call to the noinline function would go away,
> > but not necessarily its impact on optimization)
> 
> Yeah, this example should be a very rare situation: a leaf function is
> changing the characteristics of the struct but returning a buffer within
> it to the caller. The more likely glitch would be from:
> 
> int main()
> {
> 	struct foo *f = foo_alloc(7);
> 	char *p = FAM_ACCESS(f, size, buf);
> 
> 	printf("%ld\n", __builtin_dynamic_object_size(p, 0));
> 	test1(f); // or just "f->count = 10;" no function call needed
> 	printf("%ld\n", __builtin_dynamic_object_size(p, 0));
> 
> 	return 0;
> }
> 
> which reports:
> 7
> 7
> 
> instead of:
> 7
> 10
> 
> This kind of "get an alias" situation is pretty common in the kernel
> as a way to have a convenient "handle" to the array. In the case of a
> "fill the array without knowing the actual final size" code pattern,
> things would immediately break:
> 
> 	struct foo *f;
> 	char *p;
> 	int i;
> 
> 	f = alloc(maximum_possible);
> 	f->count = 0;
> 	p = f->buf;
> 
> 	for (i = 0; data_is_available() && i < maximum_possible; i++) {
> 		f->count ++;
> 		p[i] = next_data_item();
> 	}
> 
> Now perhaps the problem here is that "count" cannot be used for a count
> of "logically valid members in the array" but must always be a count of
> "allocated member space in the array", which I guess is tolerable, but
> isn't ideal -- I'd like to catch logic bugs in addition to allocation
> bugs, but the latter is certainly much more important to catch.

Maybe we could have a warning when f->buf is not directly
accessed.

Martin

>
Martin Uecker Oct. 26, 2023, 5:05 p.m. UTC | #87
Am Donnerstag, dem 26.10.2023 um 16:41 +0000 schrieb Qing Zhao:
> 
> > On Oct 26, 2023, at 5:20 AM, Martin Uecker <uecker@tugraz.at> wrote:
> > 
> > Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
> > > On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker <uecker@tugraz.at> wrote:
> > > > 
> > > > Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
> > > > > 
> > > > > > Am 25.10.2023 um 12:47 schrieb Martin Uecker <uecker@tugraz.at>:
> > > > > > 
> > > > > > Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
> > > > > > > > On 2023-10-25 04:16, Martin Uecker wrote:
> > > > > > > > Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> > > > > > > > > 
> > > > > > > > > > Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
> > > > > > > > > > 
> > > > > > > > > > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
> > > > > > > > > > > Hi, Sid,
> > > > > > > > > > > 
> > > > > > > > > > > Really appreciate for your example and detailed explanation. Very helpful.
> > > > > > > > > > > I think that this example is an excellent example to show (almost) all the issues we need to consider.
> > > > > > > > > > > 
> > > > > > > > > > > I slightly modified this example to make it to be compilable and run-able, as following:
> > > > > > > > > > > (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
> > > > > > > > > > > 
> > > > > > > > > > > 1 #include <malloc.h>
> > > > > > > > > > > 2 struct A
> > > > > > > > > > > 3 {
> > > > > > > > > > > 4  size_t size;
> > > > > > > > > > > 5  char buf[] __attribute__((counted_by(size)));
> > > > > > > > > > > 6 };
> > > > > > > > > > > 7
> > > > > > > > > > > 8 static size_t
> > > > > > > > > > > 9 get_size_from (void *ptr)
> > > > > > > > > > > 10 {
> > > > > > > > > > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > > > > > > > > > 12 }
> > > > > > > > > > > 13
> > > > > > > > > > > 14 void
> > > > > > > > > > > 15 foo (size_t sz)
> > > > > > > > > > > 16 {
> > > > > > > > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > > > > > > > > 18  obj->size = sz;
> > > > > > > > > > > 19  obj->buf[0] = 2;
> > > > > > > > > > > 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
> > > > > > > > > > > 21  return;
> > > > > > > > > > > 22 }
> > > > > > > > > > > 23
> > > > > > > > > > > 24 int main ()
> > > > > > > > > > > 25 {
> > > > > > > > > > > 26  foo (20);
> > > > > > > > > > > 27  return 0;
> > > > > > > > > > > 28 }
> > > > > > > > > > > 
> > > > > > > 
> > > > > > > <snip>
> > > > > > > 
> > > > > > > > > When it’s set I suppose.  Turn
> > > > > > > > > 
> > > > > > > > > X.l = n;
> > > > > > > > > 
> > > > > > > > > Into
> > > > > > > > > 
> > > > > > > > > X.l = __builtin_with_size (x.buf, n);
> > > > > > > > 
> > > > > > > > It would turn
> > > > > > > > 
> > > > > > > > some_variable = (&) x.buf
> > > > > > > > 
> > > > > > > > into
> > > > > > > > 
> > > > > > > > some_variable = __builtin_with_size ( (&) x.buf, x.len)
> > > > > > > > 
> > > > > > > > 
> > > > > > > > So the later access to x.buf and not the initialization
> > > > > > > > of a member of the struct (which is too early).
> > > > > > > > 
> > > > > > > 
> > > > > > > Hmm, so with Qing's example above, are you suggesting the transformation
> > > > > > > be to foo like so:
> > > > > > > 
> > > > > > > 14 void
> > > > > > > 15 foo (size_t sz)
> > > > > > > 16 {
> > > > > > > 16.5  void * _1;
> > > > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > > > > 18  obj->size = sz;
> > > > > > > 19  obj->buf[0] = 2;
> > > > > > > 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
> > > > > > > 20  __builtin_printf ("%d\n", get_size_from (_1));
> > > > > > > 21  return;
> > > > > > > 22 }
> > > > > > > 
> > > > > > > If yes then this could indeed work.  I think I got thrown off by the
> > > > > > > reference to __bdos.
> > > > > > 
> > > > > > Yes. I think it is important not to evaluate the size at the
> > > > > > access to buf and not the allocation, because the point is to
> > > > > > recover it from the size member even when the compiler can't
> > > > > > see the original allocation.
> > > > > 
> > > > > But if the access is through a pointer without the attribute visible
> > > > > even the Frontend cannot recover?
> > > > 
> > > > Yes, if the access is using a struct-with-FAM without the attribute
> > > > the FE would not be insert the builtin.  BDOS could potentially
> > > > still see the original allocation but if it doesn't, then there is
> > > > no information.
> > > > 
> > > > > We’d need to force type correctness and give up on indirecting
> > > > > through an int * when it can refer to two diffenent container types.
> > > > > The best we can do I think is mark allocation sites and hope for
> > > > > some basic code hygiene (not clobbering size or array pointer
> > > > > through pointers without the appropriately attributed type)
> > > > 
> > > > I am do not fully understand what you are referring to.
> > > 
> > > struct A { int n; int data[n]; };
> > > struct B { long n; int data[n]; };
> > > 
> > > int *p = flag ? a->data : b->data;
> > > 
> > > access *p;
> > > 
> > > Since we need to allow interoperability of pointers (a->data is
> > > convertible to a non-fat pointer of type int *) this leaves us with
> > > ambiguity we need to conservatively handle to avoid false positives.
> > 
> > For BDOS, I would expect this to work exactly like:
> > 
> > char aa[n1];
> > char bb[n2];
> > char *p = flag ? aa : bb;
> > 
> > (or similar code with malloc). In fact it does:
> > 
> > https://godbolt.org/z/bK68YKqhe
> > (cheating a bit and also the sub-object version of
> > BDOS does not seem to work)
> > 
> > > 
> > > We _might_ want to diagnose decay of a->data to int *, but IIRC
> > > there's no way (or proposal) to allow declaring a corresponding
> > > fat pointer, so it's not a good designed feature.
> > 
> > As a language feature, I fully agree.  I see the
> > counted_by attribute has a makeshift solution.
> 
> The “counted_by” attribute is necessary at this moment since
>  it will be much easier to be adopted by the existing source code,
>  for example, the Linux Kernel. 

Yes, this is understood.

> 
> Though I agree that embedding the bound information into TYPE 
> system  should be the ultimate goal. 
> 
> > 
> > But we can already do:
> > 
> > auto p = flag ? &aa : &bb;
> > 
> > and this already works perfectly:
> > 
> > https://godbolt.org/z/rvb6xWWPj
> > 
> > We can also name the variably-modifed type: 
> > 
> > char (*p)[flag ? n1 : n2] = flag ? &aa : &bb;
> > https://godbolt.org/z/13cTT1vGP
> > 
> > The problem with this version is that consistency
> > is not checked. (I have patch for adding run-time
> > checks).
> > 
> > And then the next step would be to allow
> > 
> > char (*p)[:] = flag ? &aa : &bb;
> > 
> > or similar.  Dennis Ritchie proposed this himself
> > a long time ago.
> > 
> > So far this seems straightfoward.
> > 
> > If we then want to allow such wide pointers as
> > function arguments or in structs, we would need
> > to define an ABI. But the ABI could just be
> > 
> > struct { char (*p)[.s]; size_t s; };
> > 
> > Maybe we could try to make the following
> > ABI compatible:
> > 
> > int foo(int p[s], size_t s);
> > int foo(int p[:]);
> > 
> > 
> > > Having __builtin_with_size at allocation would possibly make
> > > the BOS use-def walk discover both objects.
> > 
> > Yes. But I do not think this there is any fundamental
> > difference to discovering allocation functions.
> > 
> > >  I think you can't
> > > insert __builtin_with_size at the access to *p, but in practice
> > > that would be very much needed.
> > 
> > Usually the access to *p would follow directly the
> > access x.buf, so BDOS should find it.
> > 
> > But yes, to get full bounds safety, the pointer type 
> > has to change to a variably-modified type (which would work
> > today) or a fat pointer type.
> 
> By variable-modified type, you mean the VLA?

I mean a pointer to a VLA type.

> 
> There is one major difference between VLA and (FAM or Pointer array):
> 
> For VLA, the compiler is responsible for allocating the memory for it, 
> the size assignment and the memory allocation are both done by the
>  compiler at the same time and tied together. 

A VLA can also exist on the heap:

char (*buf)[n] = malloc(sizeof(*buf));
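
A small sketch of this, using only standard C99 variably-modified types. The function name heap_vla_size is made up for illustration, and the allocation is written with sizeof(char[n]), which yields the same size as sizeof(*buf).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate n bytes on the heap and view them through a pointer to a
   VLA type.  Returns the size recovered from the type, or 0 if the
   allocation fails.  */
static size_t heap_vla_size(size_t n)
{
  assert(n > 0);
  char (*buf)[n] = malloc(sizeof(char[n]));
  if (!buf)
    return 0;
  size_t size = sizeof(*buf);  /* computed at run time from the bound n */
  (*buf)[n - 1] = 0;           /* last valid element */
  free(buf);
  return size;
}
```

The memory comes from malloc, but the bound is still carried by the pointer type, so sizeof(*buf) gives n without the compiler having done the allocation.
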

> 
> But for FAM and pointer arrays, right now, users allocate the memory for them
> In the source code, so, when we add the “counted_by” attribute, we need to
> specify the additional requirement for the order of size assignment and memory
> allocation into the source code, and specify this requirement in the user documentation.
> 
> Later, if we try to make the bound information of FAM/pointer array into TYPE 
> system, similar as the current VLA, should we also need to move the memory allocation 
> of the FAM/pointer arrays into compiler (similar as VLA too)? 

I think memory allocation can be done either
as an automatic variable or by malloc.

The following works today in GNU C:

int N = ..;
struct foo { char buf[N]; } x;
struct foo *p = malloc(sizeof(struct foo));

The only limitation today is that the size cannot
refer to a member of the struct itself, as in:

struct foo { int n; char buf[.n]; };

I am not yet sure how we would set the size for
an automatic object, but I have some ideas.  Maybe
simply using an initializer:

struct foo x = { .n = 10 };


Martin

> > The later can be built on
> > vm-types easily because all the FE semantics already
> > exists.
> 
> Except the memory allocation part…
> 
> Do I miss anything here?
> 
> Qing
> > 
> > Martin
> > 
> > > 
> > > Richard.
> > > 
> > > > But yes,
> > > > for full bounds safety we would need the language feature.
> > > > In C people should start to variably-modified types
> > > > more.  I think we can build perfect bounds safety on top of
> > > > them in a very good way with only FE changes.
> > > > 
> > > > All these attributes are just a best effort.  But for a while,
> > > > this will be necessary.
> > > > 
> > > > Martin
> > > > 
> > > > > 
> > > > > > Evaluating at this point requires that the size is correctly set
> > > > > > before the access to the FAM and the user has to make sure
> > > > > > this is the case. But to me this requirement would make sense.
> > > > > > 
> > > > > > Semantically, it could also make sense to evaluate the size at a
> > > > > > later time.  But then the reordering becomes problematic again.
> > > > > > 
> > > > > > Also I think this would make this feature generally more useful.
> > > > > > For example, it could work also for others pointers in the struct
> > > > > > and not just for FAMs.  In this case, the struct may already be
> > > > > > freed when  BDOS is called, so it might also not possible to
> > > > > > access the size member at a later time.
> > > > > > 
> > > > > > Martin
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > 
> > > > 
> > 
>
Richard Biener Oct. 26, 2023, 5:35 p.m. UTC | #88
> Am 26.10.2023 um 19:05 schrieb Martin Uecker <uecker@tugraz.at>:
> 
> Am Donnerstag, dem 26.10.2023 um 16:41 +0000 schrieb Qing Zhao:
>> 
>>>> On Oct 26, 2023, at 5:20 AM, Martin Uecker <uecker@tugraz.at> wrote:
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
>>>> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker <uecker@tugraz.at> wrote:
>>>>> 
>>>>> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
>>>>>> 
>>>>>>> Am 25.10.2023 um 12:47 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>>>> 
>>>>>>> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>>>>>>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>>>>>>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>>>>>>>> 
>>>>>>>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>>>>>>>> 
>>>>>>>>>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>>>>>>>>>>>> Hi, Sid,
>>>>>>>>>>>> 
>>>>>>>>>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>>>>>>>>>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>>>>>>>>>>>> 
>>>>>>>>>>>> I slightly modified this example to make it to be compilable and run-able, as following:
>>>>>>>>>>>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>>>>>>>>>>>> 
>>>>>>>>>>>> 1 #include <malloc.h>
>>>>>>>>>>>> 2 struct A
>>>>>>>>>>>> 3 {
>>>>>>>>>>>> 4  size_t size;
>>>>>>>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>>>>>>>> 6 };
>>>>>>>>>>>> 7
>>>>>>>>>>>> 8 static size_t
>>>>>>>>>>>> 9 get_size_from (void *ptr)
>>>>>>>>>>>> 10 {
>>>>>>>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>>>>>>>> 12 }
>>>>>>>>>>>> 13
>>>>>>>>>>>> 14 void
>>>>>>>>>>>> 15 foo (size_t sz)
>>>>>>>>>>>> 16 {
>>>>>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>>>>>>> 18  obj->size = sz;
>>>>>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>>>>>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>>>>>>>>>>>> 21  return;
>>>>>>>>>>>> 22 }
>>>>>>>>>>>> 23
>>>>>>>>>>>> 24 int main ()
>>>>>>>>>>>> 25 {
>>>>>>>>>>>> 26  foo (20);
>>>>>>>>>>>> 27  return 0;
>>>>>>>>>>>> 28 }
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> <snip>
>>>>>>>> 
>>>>>>>>>> When it’s set I suppose.  Turn
>>>>>>>>>> 
>>>>>>>>>> X.l = n;
>>>>>>>>>> 
>>>>>>>>>> Into
>>>>>>>>>> 
>>>>>>>>>> X.l = __builtin_with_size (x.buf, n);
>>>>>>>>> 
>>>>>>>>> It would turn
>>>>>>>>> 
>>>>>>>>> some_variable = (&) x.buf
>>>>>>>>> 
>>>>>>>>> into
>>>>>>>>> 
>>>>>>>>> some_variable = __builtin_with_size ( (&) x.buf, x.len)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> So the later access to x.buf and not the initialization
>>>>>>>>> of a member of the struct (which is too early).
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hmm, so with Qing's example above, are you suggesting the transformation
>>>>>>>> be to foo like so:
>>>>>>>> 
>>>>>>>> 14 void
>>>>>>>> 15 foo (size_t sz)
>>>>>>>> 16 {
>>>>>>>> 16.5  void * _1;
>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>>> 18  obj->size = sz;
>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
>>>>>>>> 20  __builtin_printf ("%d\n", get_size_from (_1));
>>>>>>>> 21  return;
>>>>>>>> 22 }
>>>>>>>> 
>>>>>>>> If yes then this could indeed work.  I think I got thrown off by the
>>>>>>>> reference to __bdos.
>>>>>>> 
>>>>>>> Yes. I think it is important not to evaluate the size at the
>>>>>>> access to buf and not the allocation, because the point is to
>>>>>>> recover it from the size member even when the compiler can't
>>>>>>> see the original allocation.
>>>>>> 
>>>>>> But if the access is through a pointer without the attribute visible
>>>>>> even the Frontend cannot recover?
>>>>> 
>>>>> Yes, if the access is using a struct-with-FAM without the attribute
>>>>> the FE would not be insert the builtin.  BDOS could potentially
>>>>> still see the original allocation but if it doesn't, then there is
>>>>> no information.
>>>>> 
>>>>>> We’d need to force type correctness and give up on indirecting
>>>>>> through an int * when it can refer to two diffenent container types.
>>>>>> The best we can do I think is mark allocation sites and hope for
>>>>>> some basic code hygiene (not clobbering size or array pointer
>>>>>> through pointers without the appropriately attributed type)
>>>>> 
>>>>> I am do not fully understand what you are referring to.
>>>> 
>>>> struct A { int n; int data[n]; };
>>>> struct B { long n; int data[n]; };
>>>> 
>>>> int *p = flag ? a->data : b->data;
>>>> 
>>>> access *p;
>>>> 
>>>> Since we need to allow interoperability of pointers (a->data is
>>>> convertible to a non-fat pointer of type int *) this leaves us with
>>>> ambiguity we need to conservatively handle to avoid false positives.
>>> 
>>> For BDOS, I would expect this to work exactly like:
>>> 
>>> char aa[n1];
>>> char bb[n2];
>>> char *p = flag ? aa : bb;
>>> 
>>> (or similar code with malloc). In fact it does:
>>> 
>>> https://godbolt.org/z/bK68YKqhe
>>> (cheating a bit and also the sub-object version of
>>> BDOS does not seem to work)
>>> 
>>>> 
>>>> We _might_ want to diagnose decay of a->data to int *, but IIRC
>>>> there's no way (or proposal) to allow declaring a corresponding
>>>> fat pointer, so it's not a good designed feature.
>>> 
>>> As a language feature, I fully agree.  I see the
>>> counted_by attribute has a makeshift solution.
>> 
>> The “counted_by” attribute is necessary at this moment since
>> it will be much easier to be adopted by the existing source code,
>> for example, the Linux Kernel. 
> 
> Yes, this is understood.
> 
>> 
>> Though I agree that embedding the bound information into TYPE 
>> system  should be the ultimate goal. 
>> 
>>> 
>>> But we can already do:
>>> 
>>> auto p = flag ? &aa : &bb;
>>> 
>>> and this already works perfectly:
>>> 
>>> https://godbolt.org/z/rvb6xWWPj
>>> 
>>> We can also name the variably-modifed type: 
>>> 
>>> char (*p)[flag ? n1 : n2] = flag ? &aa : &bb;
>>> https://godbolt.org/z/13cTT1vGP
>>> 
>>> The problem with this version is that consistency
>>> is not checked. (I have patch for adding run-time
>>> checks).
>>> 
>>> And then the next step would be to allow
>>> 
>>> char (*p)[:] = flag ? &aa : &bb;
>>> 
>>> or similar.  Dennis Ritchie proposed this himself
>>> a long time ago.
>>> 
>>> So far this seems straightfoward.
>>> 
>>> If we then want to allow such wide pointers as
>>> function arguments or in structs, we would need
>>> to define an ABI. But the ABI could just be
>>> 
>>> struct { char (*p)[.s]; size_t s; };
>>> 
>>> Maybe we could try to make the following
>>> ABI compatible:
>>> 
>>> int foo(int p[s], size_t s);
>>> int foo(int p[:]);
>>> 
>>> 
>>>> Having __builtin_with_size at allocation would possibly make
>>>> the BOS use-def walk discover both objects.
>>> 
>>> Yes. But I do not think this there is any fundamental
>>> difference to discovering allocation functions.
>>> 
>>>> I think you can't
>>>> insert __builtin_with_size at the access to *p, but in practice
>>>> that would be very much needed.
>>> 
>>> Usually the access to *p would follow directly the
>>> access x.buf, so BDOS should find it.
>>> 
>>> But yes, to get full bounds safety, the pointer type 
>>> has to change to a variably-modified type (which would work
>>> today) or a fat pointer type.
>> 
>> By variable-modified type, you mean the VLA?
> 
> I mean a pointer to a VLA type.
> 
>> 
>> There is one major difference between VLA and (FAM or Pointer array):
>> 
>> For VLA, the compiler is responsible for allocating the memory for it, 
>> the size assignment and the memory allocation are both done by the
>> compiler at the same time and tied together. 
> 
> A VLA can also exist on the heap:
> 
> char (*buf)[n] = malloc(sizeof(*buf));
> 
>> 
>> But for FAM and pointer arrays, right now, users allocate the memory for them
>> In the source code, so, when we add the “counted_by” attribute, we need to
>> specify the additional requirement for the order of size assignment and memory
>> allocation into the source code, and specify this requirement in the user documentation.
>> 
>> Later, if we try to make the bound information of FAM/pointer array into TYPE 
>> system, similar as the current VLA, should we also need to move the memory allocation 
>> of the FAM/pointer arrays into compiler (similar as VLA too)? 
> 
> I think memory allocation can be done either
> as an automatic variable or by malloc.
> 
> The following works today in GNU C:
> 
> int N = ..;
> struct foo { char buf[N]; } x;
> struct foo *p = malloc(sizeof(struct foo));
> 
> The only limitation today is that the size 'n' 
> can not refer to the field member.
> 
> struct foo { int n; char buf[.n]; };

Note the middle end is perfectly capable of this, since at least Ada
extensively supports this kind of type layout.  You just need to extend
the parser to accept it (and properly declare the FIELD_DECLs).  Of course
it’s only half of the story, since int[n] still decays to int *, losing
the information again.
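
The decay problem can be seen with today's VLAs already. The following is a minimal sketch with hypothetical function names: a pointer to the whole array keeps the bound in its type, while the decayed pointer only has the size of a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Element count as seen through a pointer to the whole array:
   the bound n survives in the variably-modified pointer type.  */
static size_t count_via_array_pointer(size_t n)
{
  assert(n > 0);
  int a[n];
  int (*pa)[n] = &a;
  return sizeof(*pa) / sizeof((*pa)[0]);
}

/* Size as seen through the decayed pointer: only the size of the
   pointer itself is left, the bound is gone.  */
static size_t size_via_decayed_pointer(size_t n)
{
  assert(n > 0);
  int a[n];
  int *p = a;
  return sizeof(p);
}
```
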

Richard 

> I am not yet sure how we would set the size for
> an automatic object, but I have some ideas.  Maybe
> simply using an initializer:
> 
> struct foo x = { .n = 10 };
> 
> 
> Martin
> 
>>> The later can be built on
>>> vm-types easily because all the FE semantics already
>>> exists.
>> 
>> Except the memory allocation part…
>> 
>> Do I miss anything here?
>> 
>> Qing
>>> 
>>> Martin
>>> 
>>>> 
>>>> Richard.
>>>> 
>>>>> But yes,
>>>>> for full bounds safety we would need the language feature.
>>>>> In C people should start to variably-modified types
>>>>> more.  I think we can build perfect bounds safety on top of
>>>>> them in a very good way with only FE changes.
>>>>> 
>>>>> All these attributes are just a best effort.  But for a while,
>>>>> this will be necessary.
>>>>> 
>>>>> Martin
>>>>> 
>>>>>> 
>>>>>>> Evaluating at this point requires that the size is correctly set
>>>>>>> before the access to the FAM and the user has to make sure
>>>>>>> this is the case. But to me this requirement would make sense.
>>>>>>> 
>>>>>>> Semantically, it could also make sense to evaluate the size at a
>>>>>>> later time.  But then the reordering becomes problematic again.
>>>>>>> 
>>>>>>> Also I think this would make this feature generally more useful.
>>>>>>> For example, it could work also for others pointers in the struct
>>>>>>> and not just for FAMs.  In this case, the struct may already be
>>>>>>> freed when  BDOS is called, so it might also not possible to
>>>>>>> access the size member at a later time.
>>>>>>> 
>>>>>>> Martin
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>> 
>
Qing Zhao Oct. 26, 2023, 6:54 p.m. UTC | #89
> On Oct 26, 2023, at 10:05 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> 
> 
>> Am 26.10.2023 um 12:14 schrieb Martin Uecker <uecker@tugraz.at>:
>> 
>> Am Donnerstag, dem 26.10.2023 um 11:20 +0200 schrieb Martin Uecker:
>>>> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
>>>> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker <uecker@tugraz.at> wrote:
>>>>> 
>>>>> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
>>>>>> 
>>>>>>> Am 25.10.2023 um 12:47 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>>>> 
>>>>>>> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>>>>>>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>>>>>>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>>>>>>>> 
>>>>>>>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>>>>>>>> 
>>>>>>>>>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>>>>>>>>>>>> Hi, Sid,
>>>>>>>>>>>> 
>>>>>>>>>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>>>>>>>>>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>>>>>>>>>>>> 
>>>>>>>>>>>> I slightly modified this example to make it to be compilable and run-able, as following:
>>>>>>>>>>>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>>>>>>>>>>>> 
>>>>>>>>>>>> 1 #include <malloc.h>
>>>>>>>>>>>> 2 struct A
>>>>>>>>>>>> 3 {
>>>>>>>>>>>> 4  size_t size;
>>>>>>>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>>>>>>>> 6 };
>>>>>>>>>>>> 7
>>>>>>>>>>>> 8 static size_t
>>>>>>>>>>>> 9 get_size_from (void *ptr)
>>>>>>>>>>>> 10 {
>>>>>>>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>>>>>>>> 12 }
>>>>>>>>>>>> 13
>>>>>>>>>>>> 14 void
>>>>>>>>>>>> 15 foo (size_t sz)
>>>>>>>>>>>> 16 {
>>>>>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>>>>>>> 18  obj->size = sz;
>>>>>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>>>>>> 20  __builtin_printf (“%d\n", get_size_from (obj->buf));
>>>>>>>>>>>> 21  return;
>>>>>>>>>>>> 22 }
>>>>>>>>>>>> 23
>>>>>>>>>>>> 24 int main ()
>>>>>>>>>>>> 25 {
>>>>>>>>>>>> 26  foo (20);
>>>>>>>>>>>> 27  return 0;
>>>>>>>>>>>> 28 }
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> <snip>
>>>>>>>> 
>>>>>>>>>> When it’s set I suppose.  Turn
>>>>>>>>>> 
>>>>>>>>>> X.l = n;
>>>>>>>>>> 
>>>>>>>>>> Into
>>>>>>>>>> 
>>>>>>>>>> X.l = __builtin_with_size (x.buf, n);
>>>>>>>>> 
>>>>>>>>> It would turn
>>>>>>>>> 
>>>>>>>>> some_variable = (&) x.buf
>>>>>>>>> 
>>>>>>>>> into
>>>>>>>>> 
>>>>>>>>> some_variable = __builtin_with_size ( (&) x.buf. x.len)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> So the later access to x.buf and not the initialization
>>>>>>>>> of a member of the struct (which is too early).
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hmm, so with Qing's example above, are you suggesting the transformation
>>>>>>>> be to foo like so:
>>>>>>>> 
>>>>>>>> 14 void
>>>>>>>> 15 foo (size_t sz)
>>>>>>>> 16 {
>>>>>>>> 16.5  void * _1;
>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>>> 18  obj->size = sz;
>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
>>>>>>>> 20  __builtin_printf (“%d\n", get_size_from (_1));
>>>>>>>> 21  return;
>>>>>>>> 22 }
>>>>>>>> 
>>>>>>>> If yes then this could indeed work.  I think I got thrown off by the
>>>>>>>> reference to __bdos.
>>>>>>> 
>>>>>>> Yes. I think it is important to evaluate the size at the
>>>>>>> access to buf and not at the allocation, because the point is to
>>>>>>> recover it from the size member even when the compiler can't
>>>>>>> see the original allocation.
>>>>>> 
>>>>>> But if the access is through a pointer without the attribute visible
>>>>>> even the Frontend cannot recover?
>>>>> 
>>>>> Yes, if the access is using a struct-with-FAM without the attribute
>>>>> the FE would not be insert the builtin.  BDOS could potentially
>>>>> still see the original allocation but if it doesn't, then there is
>>>>> no information.
>>>>> 
>>>>>> We’d need to force type correctness and give up on indirecting
>>>>>> through an int * when it can refer to two diffenent container types.
>>>>>> The best we can do I think is mark allocation sites and hope for
>>>>>> some basic code hygiene (not clobbering size or array pointer
>>>>>> through pointers without the appropriately attributed type)
>>>>> 
>>>>> I am do not fully understand what you are referring to.
>>>> 
>>>> struct A { int n; int data[n]; };
>>>> struct B { long n; int data[n]; };
>>>> 
>>>> int *p = flag ? a->data : b->data;
>>>> 
>>>> access *p;
>>>> 
>>>> Since we need to allow interoperability of pointers (a->data is
>>>> convertible to a non-fat pointer of type int *) this leaves us with
>>>> ambiguity we need to conservatively handle to avoid false positives.
>>> 
>>> For BDOS, I would expect this to work exactly like:
>>> 
>>> char aa[n1];
>>> char bb[n2];
>>> char *p = flag ? aa : bb;
>>> 
>>> (or similar code with malloc). In fact it does:
>>> 
>>> https://godbolt.org/z/bK68YKqhe
>>> (cheating a bit and also the sub-object version of
>>> BDOS does not seem to work)
>>> 
>>>> 
>>>> We _might_ want to diagnose decay of a->data to int *, but IIRC
>>>> there's no way (or proposal) to allow declaring a corresponding
>>>> fat pointer, so it's not a good designed feature.
>>> 
>>> As a language feature, I fully agree.  I see the
>>> counted_by attribute has a makeshift solution.
>>> 
>>> But we can already do:
>>> 
>>> auto p = flag ? &aa : &bb;
>>> 
>>> and this already works perfectly:
>>> 
>>> https://godbolt.org/z/rvb6xWWPj
>>> 
>>> We can also name the variably-modifed type: 
>>> 
>>> char (*p)[flag ? n1 : n2] = flag ? &aa : &bb;
>>> https://godbolt.org/z/13cTT1vGP
>>> 
>>> The problem with this version is that consistency
>>> is not checked. (I have patch for adding run-time
>>> checks).
>>> 
>>> And then the next step would be to allow
>>> 
>>> char (*p)[:] = flag ? &aa : &bb;
>>> 
>>> or similar.  Dennis Ritchie proposed this himself
>>> a long time ago.
>>> 
>>> So far this seems straightfoward.
>>> 
>>> If we then want to allow such wide pointers as
>>> function arguments or in structs, we would need
>>> to define an ABI. But the ABI could just be
>>> 
>>> struct { char (*p)[.s]; size_t s; };
>>> 
>>> Maybe we could try to make the following
>>> ABI compatible:
>>> 
>>> int foo(int p[s], size_t s);
>>> int foo(int p[:]);
>>> 
>>> 
>>>> Having __builtin_with_size at allocation would possibly make
>>>> the BOS use-def walk discover both objects.
>>> 
>>> Yes. But I do not think this there is any fundamental
>>> difference to discovering allocation functions.
>>> 
>>>> I think you can't
>>>> insert __builtin_with_size at the access to *p, but in practice
>>>> that would be very much needed.
>>> 
>>> Usually the access to *p would follow directly the
>>> access x.buf, so BDOS should find it.
>>> 
>>> But yes, to get full bounds safety, the pointer type 
>>> has to change to a variably-modified type (which would work
>>> today) or a fat pointer type. The later can be built on
>>> vm-types easily because all the FE semantics already
>>> exists.
>> 
>> We could insert the __builtin_with_size everywhere
>> we have to convert a wide pointer or let an array
>> decay to traditional pointer for reason of compatibility 
>> with legacy code.
> 
> That sounds like a nice idea.  Note I’d like to see the consumer side implemented so we can play with different points of insertion (and I’ll try to show corner cases where it goes wrong).

Given the example I mentioned previously:

  1 #include <malloc.h>
  2 struct A
  3 {
  4  size_t size;
  5  char buf[] __attribute__((counted_by(size)));
  6 };
  7 
  8 static size_t
  9 get_size_from (void *ptr)
 10 {
 11  return __builtin_dynamic_object_size (ptr, 1);
 12 }
 13 
 14 void
 15 foo (size_t sz)
 16 {
 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 18  obj->size = sz;
 19  obj->buf[0] = 2;
 20  __builtin_printf ("%zu\n", get_size_from (obj->buf));
 21  return;
 22 }

So, the possible points at which the FE could insert the new __builtin_with_size include the following (per my understanding so far):

Point 1. Where “obj->buf” is referenced, at lines 19 and 20.
Point 2. Where “obj” is allocated, at line 17.

Is this correct?

Any other points we need to consider?
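
To make the difference between the two points concrete, here is a rough sketch. It assumes a hypothetical helper with_size() and a struct sized_ptr standing in for the proposed __builtin_with_size (which does not exist in GCC today), and it leaves out the counted_by attribute so that it compiles with current compilers. The helper names alloc_A, buf_at_allocation and buf_at_access are also made up.

```c
#include <stddef.h>
#include <stdlib.h>

struct A {
    size_t size;
    char buf[];   /* counted_by(size) in the proposal; omitted here */
};

/* Hypothetical stand-in for the proposed __builtin_with_size:
   pairs a pointer with the size known at the point of the call. */
struct sized_ptr {
    char *ptr;
    size_t size;
};

static struct sized_ptr with_size(char *ptr, size_t size)
{
    struct sized_ptr sp = { ptr, size };
    return sp;
}

struct A *alloc_A(size_t sz)
{
    struct A *obj = malloc(sizeof(struct A) + sz * sizeof(char));
    if (obj != NULL)
        obj->size = sz;
    return obj;
}

/* Point 2: the size is recorded where obj is allocated. */
struct sized_ptr buf_at_allocation(struct A *obj, size_t sz)
{
    return with_size(obj->buf, sz);
}

/* Point 1: the size is read from obj->size where obj->buf is referenced. */
struct sized_ptr buf_at_access(struct A *obj)
{
    return with_size(obj->buf, obj->size);
}
```

If obj->size is changed between the allocation and the reference to obj->buf, the two points report different sizes for the same pointer, so the choice of insertion point is visible to the user.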


>  It all seems a bit late for GCC 14 though.

Yes, I agree.  Given the potential impact on code generation and other potential issues, it would be better to put this in the early stage of the next release.

Qing
> 
> Richard 
> 
>> Martin
>> 
>>> 
>>> Martin
>>> 
>>>> 
>>>> Richard.
>>>> 
>>>>> But yes,
>>>>> for full bounds safety we would need the language feature.
>>>>> In C people should start to variably-modified types
>>>>> more.  I think we can build perfect bounds safety on top of
>>>>> them in a very good way with only FE changes.
>>>>> 
>>>>> All these attributes are just a best effort.  But for a while,
>>>>> this will be necessary.
>>>>> 
>>>>> Martin
>>>>> 
>>>>>> 
>>>>>>> Evaluating at this point requires that the size is correctly set
>>>>>>> before the access to the FAM and the user has to make sure
>>>>>>> this is the case. But to me this requirement would make sense.
>>>>>>> 
>>>>>>> Semantically, it could also make sense to evaluate the size at a
>>>>>>> later time.  But then the reordering becomes problematic again.
>>>>>>> 
>>>>>>> Also I think this would make this feature generally more useful.
>>>>>>> For example, it could work also for others pointers in the struct
>>>>>>> and not just for FAMs.  In this case, the struct may already be
>>>>>>> freed when  BDOS is called, so it might also not possible to
>>>>>>> access the size member at a later time.
>>>>>>> 
>>>>>>> Martin
Qing Zhao Oct. 26, 2023, 7:20 p.m. UTC | #90
> On Oct 26, 2023, at 1:05 PM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> Am Donnerstag, dem 26.10.2023 um 16:41 +0000 schrieb Qing Zhao:
>> 
>>> On Oct 26, 2023, at 5:20 AM, Martin Uecker <uecker@tugraz.at> wrote:
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
>>>> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker <uecker@tugraz.at> wrote:
>>>>> 
>>>>> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
>>>>>> 
>>>>>>> Am 25.10.2023 um 12:47 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>>>> 
>>>>>>> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>>>>>>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>>>>>>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>>>>>>>> 
>>>>>>>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>>>>>>>> 
>>>>>>>>>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>>>>>>>>>>>> Hi, Sid,
>>>>>>>>>>>> 
>>>>>>>>>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>>>>>>>>>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>>>>>>>>>>>> 
>>>>>>>>>>>> I slightly modified this example to make it to be compilable and run-able, as following:
>>>>>>>>>>>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>>>>>>>>>>>> 
>>>>>>>>>>>> 1 #include <malloc.h>
>>>>>>>>>>>> 2 struct A
>>>>>>>>>>>> 3 {
>>>>>>>>>>>> 4  size_t size;
>>>>>>>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>>>>>>>> 6 };
>>>>>>>>>>>> 7
>>>>>>>>>>>> 8 static size_t
>>>>>>>>>>>> 9 get_size_from (void *ptr)
>>>>>>>>>>>> 10 {
>>>>>>>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>>>>>>>> 12 }
>>>>>>>>>>>> 13
>>>>>>>>>>>> 14 void
>>>>>>>>>>>> 15 foo (size_t sz)
>>>>>>>>>>>> 16 {
>>>>>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>>>>>>> 18  obj->size = sz;
>>>>>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>>>>>> 20  __builtin_printf (“%d\n", get_size_from (obj->buf));
>>>>>>>>>>>> 21  return;
>>>>>>>>>>>> 22 }
>>>>>>>>>>>> 23
>>>>>>>>>>>> 24 int main ()
>>>>>>>>>>>> 25 {
>>>>>>>>>>>> 26  foo (20);
>>>>>>>>>>>> 27  return 0;
>>>>>>>>>>>> 28 }
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> <snip>
>>>>>>>> 
>>>>>>>>>> When it’s set I suppose.  Turn
>>>>>>>>>> 
>>>>>>>>>> X.l = n;
>>>>>>>>>> 
>>>>>>>>>> Into
>>>>>>>>>> 
>>>>>>>>>> X.l = __builtin_with_size (x.buf, n);
>>>>>>>>> 
>>>>>>>>> It would turn
>>>>>>>>> 
>>>>>>>>> some_variable = (&) x.buf
>>>>>>>>> 
>>>>>>>>> into
>>>>>>>>> 
>>>>>>>>> some_variable = __builtin_with_size ( (&) x.buf. x.len)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> So the later access to x.buf and not the initialization
>>>>>>>>> of a member of the struct (which is too early).
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hmm, so with Qing's example above, are you suggesting the transformation
>>>>>>>> be to foo like so:
>>>>>>>> 
>>>>>>>> 14 void
>>>>>>>> 15 foo (size_t sz)
>>>>>>>> 16 {
>>>>>>>> 16.5  void * _1;
>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>>> 18  obj->size = sz;
>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
>>>>>>>> 20  __builtin_printf (“%d\n", get_size_from (_1));
>>>>>>>> 21  return;
>>>>>>>> 22 }
>>>>>>>> 
>>>>>>>> If yes then this could indeed work.  I think I got thrown off by the
>>>>>>>> reference to __bdos.
>>>>>>> 
>>>>>>> Yes. I think it is important to evaluate the size at the
>>>>>>> access to buf and not at the allocation, because the point is to
>>>>>>> recover it from the size member even when the compiler can't
>>>>>>> see the original allocation.
>>>>>> 
>>>>>> But if the access is through a pointer without the attribute visible
>>>>>> even the Frontend cannot recover?
>>>>> 
>>>>> Yes, if the access is using a struct-with-FAM without the attribute
>>>>> the FE would not be insert the builtin.  BDOS could potentially
>>>>> still see the original allocation but if it doesn't, then there is
>>>>> no information.
>>>>> 
>>>>>> We’d need to force type correctness and give up on indirecting
>>>>>> through an int * when it can refer to two diffenent container types.
>>>>>> The best we can do I think is mark allocation sites and hope for
>>>>>> some basic code hygiene (not clobbering size or array pointer
>>>>>> through pointers without the appropriately attributed type)
>>>>> 
>>>>> I am do not fully understand what you are referring to.
>>>> 
>>>> struct A { int n; int data[n]; };
>>>> struct B { long n; int data[n]; };
>>>> 
>>>> int *p = flag ? a->data : b->data;
>>>> 
>>>> access *p;
>>>> 
>>>> Since we need to allow interoperability of pointers (a->data is
>>>> convertible to a non-fat pointer of type int *) this leaves us with
>>>> ambiguity we need to conservatively handle to avoid false positives.
>>> 
>>> For BDOS, I would expect this to work exactly like:
>>> 
>>> char aa[n1];
>>> char bb[n2];
>>> char *p = flag ? aa : bb;
>>> 
>>> (or similar code with malloc). In fact it does:
>>> 
>>> https://godbolt.org/z/bK68YKqhe
>>> (cheating a bit and also the sub-object version of
>>> BDOS does not seem to work)
>>> 
>>>> 
>>>> We _might_ want to diagnose decay of a->data to int *, but IIRC
>>>> there's no way (or proposal) to allow declaring a corresponding
>>>> fat pointer, so it's not a good designed feature.
>>> 
>>> As a language feature, I fully agree.  I see the
>>> counted_by attribute has a makeshift solution.
>> 
>> The “counted_by” attribute is necessary at this moment since
>> it will be much easier to be adopted by the existing source code,
>> for example, the Linux Kernel. 
> 
> Yes, this is understood.
> 
>> 
>> Though I agree that embedding the bound information into TYPE 
>> system  should be the ultimate goal. 
>> 
>>> 
>>> But we can already do:
>>> 
>>> auto p = flag ? &aa : &bb;
>>> 
>>> and this already works perfectly:
>>> 
>>> https://godbolt.org/z/rvb6xWWPj
>>> 
>>> We can also name the variably-modifed type: 
>>> 
>>> char (*p)[flag ? n1 : n2] = flag ? &aa : &bb;
>>> https://godbolt.org/z/13cTT1vGP
>>> 
>>> The problem with this version is that consistency
>>> is not checked. (I have patch for adding run-time
>>> checks).
>>> 
>>> And then the next step would be to allow
>>> 
>>> char (*p)[:] = flag ? &aa : &bb;
>>> 
>>> or similar.  Dennis Ritchie proposed this himself
>>> a long time ago.
>>> 
>>> So far this seems straightfoward.
>>> 
>>> If we then want to allow such wide pointers as
>>> function arguments or in structs, we would need
>>> to define an ABI. But the ABI could just be
>>> 
>>> struct { char (*p)[.s]; size_t s; };
>>> 
>>> Maybe we could try to make the following
>>> ABI compatible:
>>> 
>>> int foo(int p[s], size_t s);
>>> int foo(int p[:]);
>>> 
>>> 
>>>> Having __builtin_with_size at allocation would possibly make
>>>> the BOS use-def walk discover both objects.
>>> 
>>> Yes. But I do not think this there is any fundamental
>>> difference to discovering allocation functions.
>>> 
>>>> I think you can't
>>>> insert __builtin_with_size at the access to *p, but in practice
>>>> that would be very much needed.
>>> 
>>> Usually the access to *p would follow directly the
>>> access x.buf, so BDOS should find it.
>>> 
>>> But yes, to get full bounds safety, the pointer type 
>>> has to change to a variably-modified type (which would work
>>> today) or a fat pointer type.
>> 
>> By variable-modified type, you mean the VLA?
> 
> I mean a pointer to a VLA type.
> 
>> 
>> There is one major difference between VLA and (FAM or Pointer array):
>> 
>> For VLA, the compiler is responsible for allocating the memory for it, 
>> the size assignment and the memory allocation are both done by the
>> compiler at the same time and tied together. 
> 
> A VLA can also exist on the heap:
> 
> char (*buf)[n] = malloc(sizeof(*buf));

Okay. I see.
> 
>> 
>> But for FAM and pointer arrays, right now, users allocate the memory for them
>> In the source code, so, when we add the “counted_by” attribute, we need to
>> specify the additional requirement for the order of size assignment and memory
>> allocation into the source code, and specify this requirement in the user documentation.
>> 
>> Later, if we try to make the bound information of FAM/pointer array into TYPE 
>> system, similar as the current VLA, should we also need to move the memory allocation 
>> of the FAM/pointer arrays into compiler (similar as VLA too)? 
> 
> I think memory allocation can be done either
> as an automatic variable or by malloc.
> 
> The following works today in GNU C:
> 
> int N = ..;
> struct foo { char buf[N]; } x;
> struct foo *p = malloc(sizeof(struct foo));

Yes, I tried this and it did work. :-) Thanks.

Qing
> 
> The only limitation today is that the size 'n' 
> can not refer to the field member.
> 
> struct foo { int n; char buf[.n]; };
> 
> I am not yet sure how we would set the size for
> an automatic object, but I have some ideas.  Maybe
> simply using an initializer:
> 
> struct foo x = { .n = 10 };
> 
> 
> Martin
> 
>>> The later can be built on
>>> vm-types easily because all the FE semantics already
>>> exists.
>> 
>> Except the memory allocation part…
>> 
>> Do I miss anything here?
>> 
>> Qing
>>> 
>>> Martin
>>> 
>>>> 
>>>> Richard.
>>>> 
>>>>> But yes,
>>>>> for full bounds safety we would need the language feature.
>>>>> In C people should start to variably-modified types
>>>>> more.  I think we can build perfect bounds safety on top of
>>>>> them in a very good way with only FE changes.
>>>>> 
>>>>> All these attributes are just a best effort.  But for a while,
>>>>> this will be necessary.
>>>>> 
>>>>> Martin
>>>>> 
>>>>>> 
>>>>>>> Evaluating at this point requires that the size is correctly set
>>>>>>> before the access to the FAM and the user has to make sure
>>>>>>> this is the case. But to me this requirement would make sense.
>>>>>>> 
>>>>>>> Semantically, it could also make sense to evaluate the size at a
>>>>>>> later time.  But then the reordering becomes problematic again.
>>>>>>> 
>>>>>>> Also I think this would make this feature generally more useful.
>>>>>>> For example, it could work also for others pointers in the struct
>>>>>>> and not just for FAMs.  In this case, the struct may already be
>>>>>>> freed when  BDOS is called, so it might also not possible to
>>>>>>> access the size member at a later time.
>>>>>>> 
>>>>>>> Martin
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>> 
>
Qing Zhao Oct. 26, 2023, 7:57 p.m. UTC | #91
I guess that what Kees wanted, the "fill the array without knowing the actual final size" code pattern, as follows:

>> 	struct foo *f;
>> 	char *p;
>> 	int i;
>> 
>> 	f = alloc(maximum_possible);
>> 	f->count = 0;
>> 	p = f->buf;
>> 
>> 	for (i; data_is_available() && i < maximum_possible; i++) {
>> 		f->count ++;
>> 		p[i] = next_data_item();
>> 	}

is actually a dynamic array or, more accurately, a bounded-size dynamic array (but not a dynamically allocated array, as we have discussed so far):

https://en.wikipedia.org/wiki/Dynamic_array

Such a dynamic array, also called a growable or resizable array, can change its size
during its lifetime.

VLAs and FAMs, I believe, are both dynamically allocated arrays, i.e., even though the size is not known at compile time, it
is fixed once the array is allocated.

I am not sure whether C supports such dynamic arrays, or whether it would be easy to provide dynamic array support in C.
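
As a rough sketch of what such a bounded-size dynamic array could look like in plain C today: the struct and function names below are made up, and the two fields separate the allocated space (fixed after allocation) from the number of logically valid elements (which grows). Under the semantics discussed in this thread, a counted_by annotation would presumably have to name the allocation bound, not the logical count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct bounded_vec {
    size_t capacity;  /* allocated element slots; fixed after allocation */
    size_t count;     /* logically valid elements; grows up to capacity */
    char buf[];
};

struct bounded_vec *bounded_vec_alloc(size_t capacity)
{
    struct bounded_vec *v = malloc(sizeof(*v) + capacity * sizeof(v->buf[0]));
    if (v == NULL)
        return NULL;
    v->capacity = capacity;
    v->count = 0;
    return v;
}

/* Append one item; refuses to write past the allocated space. */
bool bounded_vec_push(struct bounded_vec *v, char item)
{
    if (v->count >= v->capacity)
        return false;
    v->buf[v->count++] = item;
    return true;
}
```

With a capacity of 2, the first two pushes succeed and the third is rejected, leaving count at 2.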

Qing


> On Oct 26, 2023, at 12:45 PM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> Am Donnerstag, dem 26.10.2023 um 09:13 -0700 schrieb Kees Cook:
>> On Thu, Oct 26, 2023 at 10:15:10AM +0200, Martin Uecker wrote:
>>> but not this:
>>> 
> 
> x->count = 11;
>>> char *p = &x->buf;
>>> x->count = 1;
>>> p[10] = 1; // !
>> 
>> This seems fine to me -- it's how I'd expect it to work: "10" is beyond
>> "1".
> 
> Note that the store would be allowed.
> 
>> 
>>> (because the pointer is passed around the
>>> store to the counter)
>>> 
>>> and also here the second store is then irrelevant
>>> for the access:
>>> 
>>> x->count = 10;
>>> char* p = &x->buf;
>>> ...
>>> x->count = 1; // somewhere else
>>> ----
>>> p[9] = 1; // ok, because count matter when buf was accesssed.
>> 
>> This is less great, but I can understand why it happens. "p" loses the
>> association with "x". It'd be nice if "p" had to way to retain that it
>> was just an alias for x->buf, so future p access would check count.
> 
> The problem is not to discover that p is an alias to x->buf, 
> but that it seems difficult to make sure that stores to 
> x->count are not reordered relative to the final access to
> p[i] you want to check, so that you then get the right value.
> 
>> 
>> But this appears to be an existing limitation in other areas where an
>> assignment will cause the loss of object association. (I've run into
>> this before.) It's just more surprising in the above example because in
>> the past the loss of association would cause __bdos() to revert back to
>> "SIZE_MAX" results ("I don't know the size") rather than an "outdated"
>> size, which may get us into unexpected places...
>> 
>>> IMHO this makes sense also from the user side and
>>> are the desirable semantics we discussed before.
>>> 
>>> But can you take a look at this?
>>> 
>>> 
>>> This should simulate it fairly well:
>>> https://godbolt.org/z/xq89aM7Gr
>>> 
>>> (the call to the noinline function would go away,
>>> but not necessarily its impact on optimization)
>> 
>> Yeah, this example should be a very rare situation: a leaf function is
>> changing the characteristics of the struct but returning a buffer within
>> it to the caller. The more likely glitch would be from:
>> 
>> int main()
>> {
>> 	struct foo *f = foo_alloc(7);
>> 	char *p = FAM_ACCESS(f, size, buf);
>> 
>> 	printf("%ld\n", __builtin_dynamic_object_size(p, 0));
>> 	test1(f); // or just "f->count = 10;" no function call needed
>> 	printf("%ld\n", __builtin_dynamic_object_size(p, 0));
>> 
>> 	return 0;
>> }
>> 
>> which reports:
>> 7
>> 7
>> 
>> instead of:
>> 7
>> 10
>> 
>> This kind of "get an alias" situation is pretty common in the kernel
>> as a way to have a convenient "handle" to the array. In the case of a
>> "fill the array without knowing the actual final size" code pattern,
>> things would immediately break:
>> 
>> 	struct foo *f;
>> 	char *p;
>> 	int i;
>> 
>> 	f = alloc(maximum_possible);
>> 	f->count = 0;
>> 	p = f->buf;
>> 
>> 	for (i; data_is_available() && i < maximum_possible; i++) {
>> 		f->count ++;
>> 		p[i] = next_data_item();
>> 	}
>> 
>> Now perhaps the problem here is that "count" cannot be used for a count
>> of "logically valid members in the array" but must always be a count of
>> "allocated member space in the array", which I guess is tolerable, but
>> isn't ideal -- I'd like to catch logic bugs in addition to allocation
>> bugs, but the latter is certainly much more important to catch.
> 
> Maybe we could have a warning when f->buf is not directly
> accessed.
> 
> Martin
> 
>> 
>
Martin Uecker Oct. 27, 2023, 7:21 a.m. UTC | #92
Am Donnerstag, dem 26.10.2023 um 19:57 +0000 schrieb Qing Zhao:
> I guess that what Kees wanted, ""fill the array without knowing the actual final size" code pattern”, as following:
> 
> > > 	struct foo *f;
> > > 	char *p;
> > > 	int i;
> > > 
> > > 	f = alloc(maximum_possible);
> > > 	f->count = 0;
> > > 	p = f->buf;
> > > 
> > > 	for (i; data_is_available() && i < maximum_possible; i++) {
> > > 		f->count ++;
> > > 		p[i] = next_data_item();
> > > 	}
> 
> actually is a dynamic array, or more accurately, Bounded-size dynamic array: ( but not a dynamic allocated array as we discussed so far)
> 
> https://en.wikipedia.org/wiki/Dynamic_array
> 
> This dynamic array, also is called growable array, or resizable array, whose size can 
> be changed during the lifetime. 
> 
> For VLA or FAM, I believe that they are both dynamic allocated array, i.e, even though the size is not know at the compilation time, but the size
> will be fixed after the array is allocated. 
> 
> I am not sure whether C has support to such Dynamic array? Or whether it’s easy to provide dynamic array support in C?

It is possible to support dynamic arrays in C even with
good checking, but not safely using the pattern above
where you derive a pointer which you later use independently.

While we could track the connection to the original struct,
the necessary synchronization between the counter and the
access to the buffer is difficult.  I do not see how this
could be supported with reasonable effort and cost.
 

But with this restriction in mind, we can do a lot in C.
For example, see my experimental (!) container library,
which has a vector type:
https://github.com/uecker/noplate/blob/main/test.c
You can get an array view for the vector (which then
also can decay to a pointer), so it interoperates nicely
with C but you can get good bounds checking.


But once you derive a pointer and pass it on, it gets
difficult.  If you want safety, you simply have to
avoid this in code.

What we could potentially do is add restrictions so 
that the access to buf always has to go via x->buf 
or you get at least a warning.
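
A minimal sketch of that restriction done by hand, assuming a made-up accessor foo_store() and allocator foo_alloc(): every store goes through the struct, so the bound is read from x->count at the time of the access and not at the time a pointer was derived.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct foo {
    size_t count;
    char buf[];
};

struct foo *foo_alloc(size_t count)
{
    struct foo *x = malloc(sizeof(*x) + count * sizeof(x->buf[0]));
    if (x != NULL)
        x->count = count;
    return x;
}

/* Store via x->buf only; the check uses the current value of x->count. */
bool foo_store(struct foo *x, size_t i, char value)
{
    if (i >= x->count)
        return false;
    x->buf[i] = value;
    return true;
}
```

With count set to 11, a store to index 10 is accepted. After count is lowered to 1, the same store is rejected, which is the outcome Kees expected for the p[10] example.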

Martin




> 
> Qing
> 
> 
> > On Oct 26, 2023, at 12:45 PM, Martin Uecker <uecker@tugraz.at> wrote:
> > 
> > Am Donnerstag, dem 26.10.2023 um 09:13 -0700 schrieb Kees Cook:
> > > On Thu, Oct 26, 2023 at 10:15:10AM +0200, Martin Uecker wrote:
> > > > but not this:
> > > > 
> > 
> > x->count = 11;
> > > > char *p = &x->buf;
> > > > x->count = 1;
> > > > p[10] = 1; // !
> > > 
> > > This seems fine to me -- it's how I'd expect it to work: "10" is beyond
> > > "1".
> > 
> > Note that the store would be allowed.
> > 
> > > 
> > > > (because the pointer is passed around the
> > > > store to the counter)
> > > > 
> > > > and also here the second store is then irrelevant
> > > > for the access:
> > > > 
> > > > x->count = 10;
> > > > char* p = &x->buf;
> > > > ...
> > > > x->count = 1; // somewhere else
> > > > ----
> > > > p[9] = 1; // ok, because count matter when buf was accesssed.
> > > 
> > > This is less great, but I can understand why it happens. "p" loses the
> > > association with "x". It'd be nice if "p" had to way to retain that it
> > > was just an alias for x->buf, so future p access would check count.
> > 
> > The problem is not to discover that p is an alias to x->buf, 
> > but that it seems difficult to make sure that stores to 
> > x->count are not reordered relative to the final access to
> > p[i] you want to check, so that you then get the right value.
> > 
> > > 
> > > But this appears to be an existing limitation in other areas where an
> > > assignment will cause the loss of object association. (I've run into
> > > this before.) It's just more surprising in the above example because in
> > > the past the loss of association would cause __bdos() to revert back to
> > > "SIZE_MAX" results ("I don't know the size") rather than an "outdated"
> > > size, which may get us into unexpected places...
> > > 
> > > > IMHO this makes sense also from the user side and
> > > > are the desirable semantics we discussed before.
> > > > 
> > > > But can you take a look at this?
> > > > 
> > > > 
> > > > This should simulate it fairly well:
> > > > https://godbolt.org/z/xq89aM7Gr
> > > > 
> > > > (the call to the noinline function would go away,
> > > > but not necessarily its impact on optimization)
> > > 
> > > Yeah, this example should be a very rare situation: a leaf function is
> > > changing the characteristics of the struct but returning a buffer within
> > > it to the caller. The more likely glitch would be from:
> > > 
> > > int main()
> > > {
> > > 	struct foo *f = foo_alloc(7);
> > > 	char *p = FAM_ACCESS(f, size, buf);
> > > 
> > > 	printf("%ld\n", __builtin_dynamic_object_size(p, 0));
> > > 	test1(f); // or just "f->count = 10;" no function call needed
> > > 	printf("%ld\n", __builtin_dynamic_object_size(p, 0));
> > > 
> > > 	return 0;
> > > }
> > > 
> > > which reports:
> > > 7
> > > 7
> > > 
> > > instead of:
> > > 7
> > > 10
> > > 
> > > This kind of "get an alias" situation is pretty common in the kernel
> > > as a way to have a convenient "handle" to the array. In the case of a
> > > "fill the array without knowing the actual final size" code pattern,
> > > things would immediately break:
> > > 
> > > 	struct foo *f;
> > > 	char *p;
> > > 	int i;
> > > 
> > > 	f = alloc(maximum_possible);
> > > 	f->count = 0;
> > > 	p = f->buf;
> > > 
> > > 	for (i; data_is_available() && i < maximum_possible; i++) {
> > > 		f->count ++;
> > > 		p[i] = next_data_item();
> > > 	}
> > > 
> > > Now perhaps the problem here is that "count" cannot be used for a count
> > > of "logically valid members in the array" but must always be a count of
> > > "allocated member space in the array", which I guess is tolerable, but
> > > isn't ideal -- I'd like to catch logic bugs in addition to allocation
> > > bugs, but the latter is certainly much more important to catch.
> > 
> > Maybe we could have a warning when f->buf is not directly
> > accessed.
> > 
> > Martin
> > 
> > > 
> > 
>
Qing Zhao Oct. 27, 2023, 2:32 p.m. UTC | #93
> On Oct 27, 2023, at 3:21 AM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> Am Donnerstag, dem 26.10.2023 um 19:57 +0000 schrieb Qing Zhao:
>> I guess that what Kees wanted, ""fill the array without knowing the actual final size" code pattern”, as following:
>> 
>>>> 	struct foo *f;
>>>> 	char *p;
>>>> 	int i;
>>>> 
>>>> 	f = alloc(maximum_possible);
>>>> 	f->count = 0;
>>>> 	p = f->buf;
>>>> 
>>>> 	for (i; data_is_available() && i < maximum_possible; i++) {
>>>> 		f->count ++;
>>>> 		p[i] = next_data_item();
>>>> 	}
>> 
>> actually is a dynamic array, or more accurately, Bounded-size dynamic array: ( but not a dynamic allocated array as we discussed so far)
>> 
>> https://en.wikipedia.org/wiki/Dynamic_array
>> 
>> This dynamic array, also is called growable array, or resizable array, whose size can 
>> be changed during the lifetime. 
>> 
>> For VLA or FAM, I believe that they are both dynamic allocated array, i.e, even though the size is not know at the compilation time, but the size
>> will be fixed after the array is allocated. 
>> 
>> I am not sure whether C has support to such Dynamic array? Or whether it’s easy to provide dynamic array support in C?
> 
> It is possible to support dynamic arrays in C even with
> good checking, but not safely using the pattern above
> where you derive a pointer which you later use independently.
> 
> While we could track the connection to the original struct,
> the necessary synchronization between the counter and the
> access to the buffer is difficult.  I do not see how this
> could be supported with reasonable effort and cost.
> 
> 
> But with this restriction in mind, we can do a lot in C.
> For example, see my experimental (!) container library
> which has vector type.
> https://github.com/uecker/noplate/blob/main/test.c
> You can get an array view for the vector (which then
> also can decay to a pointer), so it interoperates nicely
> with C but you can get good bounds checking.
> 
> 
> But once you derive a pointer and pass it on, it gets
> difficult.  But if you want safety, you just have to 
> to simply avoid this in code. 

So, for the following modified code (without the additional pointer “p”):

struct foo
{
 size_t count;
 char buf[] __attribute__((counted_by(count)));
};

struct foo *f;
int i;  

f = alloc(maximum_possible);
f->count = 0;

for (i = 0; data_is_available() && i < maximum_possible; i++) {
  f->count++;
  f->buf[i] = next_data_item();
}

Should support for this kind of dynamic array be possible?


> 
> What we could potentially do is add restrictions so 
> that the access to buf always has to go via x->buf 
> or you get at least a warning.

Would the following two restrictions on the user be enough?

1. Access to buf must always go via x->buf; its address must not
    be assigned to another, independent pointer and the buffer
    then accessed through that new pointer.
2. The user needs to keep the counter and the accesses to the
    buffer synchronized at all times.
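
As an illustration only, the two restrictions could be followed in plain C along these lines. This is a minimal sketch: the helper names foo_alloc, foo_push and foo_get are hypothetical, the capacity is passed explicitly, and the counted_by attribute is left out so that the code builds with compilers that do not know it. The checks are written by hand here; they are not what the compiler would generate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Same layout as the struct above, without the attribute. */
struct foo {
  size_t count;   /* number of valid elements in buf */
  char buf[];
};

/* Hypothetical helper: room for up to max elements, none valid yet. */
struct foo *foo_alloc(size_t max)
{
  struct foo *f = malloc(sizeof(struct foo) + max * sizeof(char));
  if (f)
    f->count = 0;
  return f;
}

/* Restriction 2: the counter update and the store live in one place,
   so they cannot drift apart.
   Restriction 1: the store goes through f->buf, never a saved pointer. */
bool foo_push(struct foo *f, size_t max, char c)
{
  assert(f != NULL);
  if (f->count >= max)
    return false;            /* would exceed the allocation */
  f->count++;
  f->buf[f->count - 1] = c;
  return true;
}

/* Checked read: the bound is the current value of f->count. */
bool foo_get(const struct foo *f, size_t i, char *out)
{
  assert(f != NULL && out != NULL);
  if (i >= f->count)
    return false;
  *out = f->buf[i];
  return true;
}
```

Because every access re-reads f->count through f, there is no stale alias whose bound could lag behind the counter.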


Qing
> 
> Martin
> 
> 
> 
> 
>> 
>> Qing
>> 
>> 
>>> On Oct 26, 2023, at 12:45 PM, Martin Uecker <uecker@tugraz.at> wrote:
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 09:13 -0700 schrieb Kees Cook:
>>>> On Thu, Oct 26, 2023 at 10:15:10AM +0200, Martin Uecker wrote:
>>>>> but not this:
>>>>> 
>>> 
>>> x->count = 11;
>>>>> char *p = &x->buf;
>>>>> x->count = 1;
>>>>> p[10] = 1; // !
>>>> 
>>>> This seems fine to me -- it's how I'd expect it to work: "10" is beyond
>>>> "1".
>>> 
>>> Note that the store would be allowed.
>>> 
>>>> 
>>>>> (because the pointer is passed around the
>>>>> store to the counter)
>>>>> 
>>>>> and also here the second store is then irrelevant
>>>>> for the access:
>>>>> 
>>>>> x->count = 10;
>>>>> char* p = &x->buf;
>>>>> ...
>>>>> x->count = 1; // somewhere else
>>>>> ----
>>>>> p[9] = 1; // ok, because count matter when buf was accesssed.
>>>> 
>>>> This is less great, but I can understand why it happens. "p" loses the
>>>> association with "x". It'd be nice if "p" had to way to retain that it
>>>> was just an alias for x->buf, so future p access would check count.
>>> 
>>> The problem is not to discover that p is an alias to x->buf, 
>>> but that it seems difficult to make sure that stores to 
>>> x->count are not reordered relative to the final access to
>>> p[i] you want to check, so that you then get the right value.
>>> 
>>>> 
>>>> But this appears to be an existing limitation in other areas where an
>>>> assignment will cause the loss of object association. (I've run into
>>>> this before.) It's just more surprising in the above example because in
>>>> the past the loss of association would cause __bdos() to revert back to
>>>> "SIZE_MAX" results ("I don't know the size") rather than an "outdated"
>>>> size, which may get us into unexpected places...
>>>> 
>>>>> IMHO this makes sense also from the user side and
>>>>> are the desirable semantics we discussed before.
>>>>> 
>>>>> But can you take a look at this?
>>>>> 
>>>>> 
>>>>> This should simulate it fairly well:
>>>>> https://godbolt.org/z/xq89aM7Gr
>>>>> 
>>>>> (the call to the noinline function would go away,
>>>>> but not necessarily its impact on optimization)
>>>> 
>>>> Yeah, this example should be a very rare situation: a leaf function is
>>>> changing the characteristics of the struct but returning a buffer within
>>>> it to the caller. The more likely glitch would be from:
>>>> 
>>>> int main()
>>>> {
>>>> 	struct foo *f = foo_alloc(7);
>>>> 	char *p = FAM_ACCESS(f, size, buf);
>>>> 
>>>> 	printf("%ld\n", __builtin_dynamic_object_size(p, 0));
>>>> 	test1(f); // or just "f->count = 10;" no function call needed
>>>> 	printf("%ld\n", __builtin_dynamic_object_size(p, 0));
>>>> 
>>>> 	return 0;
>>>> }
>>>> 
>>>> which reports:
>>>> 7
>>>> 7
>>>> 
>>>> instead of:
>>>> 7
>>>> 10
>>>> 
>>>> This kind of "get an alias" situation is pretty common in the kernel
>>>> as a way to have a convenient "handle" to the array. In the case of a
>>>> "fill the array without knowing the actual final size" code pattern,
>>>> things would immediately break:
>>>> 
>>>> 	struct foo *f;
>>>> 	char *p;
>>>> 	int i;
>>>> 
>>>> 	f = alloc(maximum_possible);
>>>> 	f->count = 0;
>>>> 	p = f->buf;
>>>> 
>>>> 	for (i; data_is_available() && i < maximum_possible; i++) {
>>>> 		f->count ++;
>>>> 		p[i] = next_data_item();
>>>> 	}
>>>> 
>>>> Now perhaps the problem here is that "count" cannot be used for a count
>>>> of "logically valid members in the array" but must always be a count of
>>>> "allocated member space in the array", which I guess is tolerable, but
>>>> isn't ideal -- I'd like to catch logic bugs in addition to allocation
>>>> bugs, but the latter is certainly much more important to catch.
>>> 
>>> Maybe we could have a warning when f->buf is not directly
>>> accessed.
>>> 
>>> Martin
>>> 
>>>> 
>>> 
>> 
>
Martin Uecker Oct. 27, 2023, 2:53 p.m. UTC | #94
Am Freitag, dem 27.10.2023 um 14:32 +0000 schrieb Qing Zhao:
> 
> > On Oct 27, 2023, at 3:21 AM, Martin Uecker <uecker@tugraz.at> wrote:
> > 
> > Am Donnerstag, dem 26.10.2023 um 19:57 +0000 schrieb Qing Zhao:
> > > I guess that what Kees wanted, ""fill the array without knowing the actual final size" code pattern”, as following:
> > > 
> > > > > 	struct foo *f;
> > > > > 	char *p;
> > > > > 	int i;
> > > > > 
> > > > > 	f = alloc(maximum_possible);
> > > > > 	f->count = 0;
> > > > > 	p = f->buf;
> > > > > 
> > > > > 	for (i; data_is_available() && i < maximum_possible; i++) {
> > > > > 		f->count ++;
> > > > > 		p[i] = next_data_item();
> > > > > 	}
> > > 
> > > actually is a dynamic array, or more accurately, Bounded-size dynamic array: ( but not a dynamic allocated array as we discussed so far)
> > > 
> > > https://en.wikipedia.org/wiki/Dynamic_array
> > > 
> > > This dynamic array, also is called growable array, or resizable array, whose size can 
> > > be changed during the lifetime. 
> > > 
> > > For VLA or FAM, I believe that they are both dynamic allocated array, i.e, even though the size is not know at the compilation time, but the size
> > > will be fixed after the array is allocated. 
> > > 
> > > I am not sure whether C has support to such Dynamic array? Or whether it’s easy to provide dynamic array support in C?
> > 
> > It is possible to support dynamic arrays in C even with
> > good checking, but not safely using the pattern above
> > where you derive a pointer which you later use independently.
> > 
> > While we could track the connection to the original struct,
> > the necessary synchronization between the counter and the
> > access to the buffer is difficult.  I do not see how this
> > could be supported with reasonable effort and cost.
> > 
> > 
> > But with this restriction in mind, we can do a lot in C.
> > For example, see my experimental (!) container library
> > which has vector type.
> > https://github.com/uecker/noplate/blob/main/test.c
> > You can get an array view for the vector (which then
> > also can decay to a pointer), so it interoperates nicely
> > with C but you can get good bounds checking.
> > 
> > 
> > But once you derive a pointer and pass it on, it gets
> > difficult.  But if you want safety, you just have to 
> > to simply avoid this in code. 
> 
> So, for the following modified code: (without the additional pointer “p”)
> 
> struct foo
> {
>  size_t count;
>  char buf[] __attribute__((counted_by(count)));
> };
> 
> struct foo *f;
> int i;  
> 
> f = alloc(maximum_possible);
> f->count = 0;
> 
> for (i; data_is_available() && i < maximum_possible; i++) {
>   f->count ++;  
>   f->buf[i] = next_data_item();
> }       
> 
> The support for dynamic array should be possible? 

With the design we discussed this should work, because with
__builtin_with_access (or whatever it ends up being called) it reads:

f = alloc(maximum_possible);
f->count = 0;

for (i = 0; data_is_available() && i < maximum_possible; i++) {
  f->count++;
  __builtin_with_access(f->buf, f->count)[i] = next_data_item();
}
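
__builtin_with_access is only a proposed name at this point and no compiler provides it. To make the intended effect concrete, here is a rough emulation in plain C, assuming the builtin attaches the current value of f->count to the reference to f->buf. The function with_access and the fill helper (which takes its input from an array instead of data_is_available()/next_data_item()) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct foo {
  size_t count;
  char buf[];
};

/* Hypothetical stand-in for __builtin_with_access(f->buf, f->count)[i]:
   yields the address of element i, or NULL when i is outside the
   count that is current at the time of this access. */
char *with_access(struct foo *f, size_t i)
{
  assert(f != NULL);
  return i < f->count ? &f->buf[i] : NULL;
}

/* The fill loop from above, reading from an array of n items.
   Returns how many items were stored. */
size_t fill(struct foo *f, const char *data, size_t n,
            size_t maximum_possible)
{
  size_t i;

  assert(f != NULL && data != NULL);
  for (i = 0; i < n && i < maximum_possible; i++) {
    f->count++;                       /* update the counter first ...   */
    char *slot = with_access(f, i);   /* ... so the check sees it       */
    if (!slot)
      break;
    *slot = data[i];
  }
  return i;
}
```

The point of the sketch is the ordering: the counter is incremented before the buffer is referenced, so the bound seen at the access already covers index i.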

> 
> 
> > 
> > What we could potentially do is add restrictions so 
> > that the access to buf always has to go via x->buf 
> > or you get at least a warning.
> 
> Are the following two restrictions to the user enough:
> 
> 1. The access to buf should always go via x->buf, 
>     no assignment to another independent pointer 
>     and access buf through this new pointer.

Yes, maybe. One could also try to be smarter.

For example, one could warn only when &f->buf is
assigned to another pointer and one of the
following conditions is fulfilled:

- the pointer escapes from the local context

- there is a store to f->count in the
local context that does not dominate &f->buf.

Then Kees' example would work too in most cases.

But I would probably wait until we have some
initial experience with this feature.

Martin

> 2.  User need to keep the synchronization between
>       the counter and the access to the buffer all the time.



> 
>
Qing Zhao Oct. 27, 2023, 3:10 p.m. UTC | #95
> On Oct 27, 2023, at 10:53 AM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> Am Freitag, dem 27.10.2023 um 14:32 +0000 schrieb Qing Zhao:
>> 
>>> On Oct 27, 2023, at 3:21 AM, Martin Uecker <uecker@tugraz.at> wrote:
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 19:57 +0000 schrieb Qing Zhao:
>>>> I guess that what Kees wanted, ""fill the array without knowing the actual final size" code pattern”, as following:
>>>> 
>>>>>> 	struct foo *f;
>>>>>> 	char *p;
>>>>>> 	int i;
>>>>>> 
>>>>>> 	f = alloc(maximum_possible);
>>>>>> 	f->count = 0;
>>>>>> 	p = f->buf;
>>>>>> 
>>>>>> 	for (i; data_is_available() && i < maximum_possible; i++) {
>>>>>> 		f->count ++;
>>>>>> 		p[i] = next_data_item();
>>>>>> 	}
>>>> 
>>>> actually is a dynamic array, or more accurately, Bounded-size dynamic array: ( but not a dynamic allocated array as we discussed so far)
>>>> 
>>>> https://en.wikipedia.org/wiki/Dynamic_array
>>>> 
>>>> This dynamic array, also is called growable array, or resizable array, whose size can 
>>>> be changed during the lifetime. 
>>>> 
>>>> For VLA or FAM, I believe that they are both dynamic allocated array, i.e, even though the size is not know at the compilation time, but the size
>>>> will be fixed after the array is allocated. 
>>>> 
>>>> I am not sure whether C has support to such Dynamic array? Or whether it’s easy to provide dynamic array support in C?
>>> 
>>> It is possible to support dynamic arrays in C even with
>>> good checking, but not safely using the pattern above
>>> where you derive a pointer which you later use independently.
>>> 
>>> While we could track the connection to the original struct,
>>> the necessary synchronization between the counter and the
>>> access to the buffer is difficult.  I do not see how this
>>> could be supported with reasonable effort and cost.
>>> 
>>> 
>>> But with this restriction in mind, we can do a lot in C.
>>> For example, see my experimental (!) container library
>>> which has vector type.
>>> https://github.com/uecker/noplate/blob/main/test.c
>>> You can get an array view for the vector (which then
>>> also can decay to a pointer), so it interoperates nicely
>>> with C but you can get good bounds checking.
>>> 
>>> 
>>> But once you derive a pointer and pass it on, it gets
>>> difficult.  But if you want safety, you just have to 
>>> to simply avoid this in code. 
>> 
>> So, for the following modified code: (without the additional pointer “p”)
>> 
>> struct foo
>> {
>> size_t count;
>> char buf[] __attribute__((counted_by(count)));
>> };
>> 
>> struct foo *f;
>> int i;  
>> 
>> f = alloc(maximum_possible);
>> f->count = 0;
>> 
>> for (i; data_is_available() && i < maximum_possible; i++) {
>>  f->count ++;  
>>  f->buf[i] = next_data_item();
>> }       
>> 
>> The support for dynamic array should be possible? 
> 
> With the design we discussed this should work because
> __builtin_with_access (or whatever) it reads:
> 
> f = alloc(maximum_possible);
> f->count = 0;
> 
> for (i; data_is_available() && i < maximum_possible; i++) {
>  f->count ++;  
>  __builtin_with_access(f->buf, f->count)[i] = next_data_item();
> }   
> 

Yes, with the data flow made explicit like this, the f->count argument of the builtin should see the latest value of f->count.
>> 
>> 
>>> 
>>> What we could potentially do is add restrictions so 
>>> that the access to buf always has to go via x->buf 
>>> or you get at least a warning.
>> 
>> Are the following two restrictions to the user enough:
>> 
>> 1. The access to buf should always go via x->buf, 
>>    no assignment to another independent pointer 
>>    and access buf through this new pointer.
> 
> Yes, maybe. One could also try to be smarter.
> 
> For example, one warn only when &f->buf is
> assigned to another pointer and one of the
> following conditions is fulfilled:
> 
> - the pointer escapes from the local context 
> 
> - there is a store to f->counter in the
> local context that does not dominate &f->buf.
> 
> Then Kees' example would work too in most cases.

I guess that we need to come up with a concrete list of restrictions on the user
and put this list in the user documentation.

Since dynamic array support is quite important to the kernel (is this true, Kees?),
we might need to include such support in our design from the beginning.

> 
> But I would probably wait until we have some
> initial experience with this feature.

You mean after we have an initial implementation of the “builtin_with_size”?
Yes, at this moment I think that the “builtin_with_size” approach is the best one.
Some details just need more thought before the real implementation.  :-)

Qing
> 
> Martin
> 
>> 2.  User need to keep the synchronization between
>>      the counter and the access to the buffer all the time.
Qing Zhao Oct. 27, 2023, 4:43 p.m. UTC | #96
About where we should insert the new __builtin_with_access_and_size:

> On Oct 26, 2023, at 2:54 PM, Qing Zhao <qing.zhao@oracle.com> wrote:
> 
> 
> 
>> On Oct 26, 2023, at 10:05 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> 
>> 
>>> Am 26.10.2023 um 12:14 schrieb Martin Uecker <uecker@tugraz.at>:
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 11:20 +0200 schrieb Martin Uecker:
>>>>> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
>>>>> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker <uecker@tugraz.at> wrote:
>>>>>> 
>>>>>> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
>>>>>>> 
>>>>>>>> Am 25.10.2023 um 12:47 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>>>>> 
>>>>>>>> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>>>>>>>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>>>>>>>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>>>>>>>>> 
>>>>>>>>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uecker@tugraz.at>:
>>>>>>>>>>>> 
>>>>>>>>>>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>>>>>>>>>>>>> Hi, Sid,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>>>>>>>>>>> I think that this example is an excellent example to show (almost) all the issues we need to consider.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I slightly modified this example to make it to be compilable and run-able, as following:
>>>>>>>>>>>>> (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1 #include <malloc.h>
>>>>>>>>>>>>> 2 struct A
>>>>>>>>>>>>> 3 {
>>>>>>>>>>>>> 4  size_t size;
>>>>>>>>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>>>>>>>>> 6 };
>>>>>>>>>>>>> 7
>>>>>>>>>>>>> 8 static size_t
>>>>>>>>>>>>> 9 get_size_from (void *ptr)
>>>>>>>>>>>>> 10 {
>>>>>>>>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>>>>>>>>> 12 }
>>>>>>>>>>>>> 13
>>>>>>>>>>>>> 14 void
>>>>>>>>>>>>> 15 foo (size_t sz)
>>>>>>>>>>>>> 16 {
>>>>>>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>>>>>>>> 18  obj->size = sz;
>>>>>>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>>>>>>> 20  __builtin_printf ("%zu\n", get_size_from (obj->buf));
>>>>>>>>>>>>> 21  return;
>>>>>>>>>>>>> 22 }
>>>>>>>>>>>>> 23
>>>>>>>>>>>>> 24 int main ()
>>>>>>>>>>>>> 25 {
>>>>>>>>>>>>> 26  foo (20);
>>>>>>>>>>>>> 27  return 0;
>>>>>>>>>>>>> 28 }
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> <snip>
>>>>>>>>> 
>>>>>>>>>>> When it’s set I suppose.  Turn
>>>>>>>>>>> 
>>>>>>>>>>> X.l = n;
>>>>>>>>>>> 
>>>>>>>>>>> Into
>>>>>>>>>>> 
>>>>>>>>>>> X.l = __builtin_with_size (x.buf, n);
>>>>>>>>>> 
>>>>>>>>>> It would turn
>>>>>>>>>> 
>>>>>>>>>> some_variable = (&) x.buf
>>>>>>>>>> 
>>>>>>>>>> into
>>>>>>>>>> 
>>>>>>>>>> some_variable = __builtin_with_size ( (&) x.buf. x.len)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> So the later access to x.buf and not the initialization
>>>>>>>>>> of a member of the struct (which is too early).
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hmm, so with Qing's example above, are you suggesting the transformation
>>>>>>>>> be to foo like so:
>>>>>>>>> 
>>>>>>>>> 14 void
>>>>>>>>> 15 foo (size_t sz)
>>>>>>>>> 16 {
>>>>>>>>> 16.5  void * _1;
>>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>>>>> 18  obj->size = sz;
>>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>>> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
>>>>>>>>> 20  __builtin_printf ("%zu\n", get_size_from (_1));
>>>>>>>>> 21  return;
>>>>>>>>> 22 }
>>>>>>>>> 
>>>>>>>>> If yes then this could indeed work.  I think I got thrown off by the
>>>>>>>>> reference to __bdos.
>>>>>>>> 
>>>>>>>> Yes. I think it is important not to evaluate the size at the
>>>>>>>> access to buf and not the allocation, because the point is to
>>>>>>>> recover it from the size member even when the compiler can't
>>>>>>>> see the original allocation.
>>>>>>> 
>>>>>>> But if the access is through a pointer without the attribute visible
>>>>>>> even the Frontend cannot recover?
>>>>>> 
>>>>>> Yes, if the access is using a struct-with-FAM without the attribute
>>>>>> the FE would not be insert the builtin.  BDOS could potentially
>>>>>> still see the original allocation but if it doesn't, then there is
>>>>>> no information.
>>>>>> 
>>>>>>> We’d need to force type correctness and give up on indirecting
>>>>>>> through an int * when it can refer to two diffenent container types.
>>>>>>> The best we can do I think is mark allocation sites and hope for
>>>>>>> some basic code hygiene (not clobbering size or array pointer
>>>>>>> through pointers without the appropriately attributed type)
>>>>>> 
>>>>>> I am do not fully understand what you are referring to.
>>>>> 
>>>>> struct A { int n; int data[n]; };
>>>>> struct B { long n; int data[n]; };
>>>>> 
>>>>> int *p = flag ? a->data : b->data;
>>>>> 
>>>>> access *p;
>>>>> 
>>>>> Since we need to allow interoperability of pointers (a->data is
>>>>> convertible to a non-fat pointer of type int *) this leaves us with
>>>>> ambiguity we need to conservatively handle to avoid false positives.
>>>> 
>>>> For BDOS, I would expect this to work exactly like:
>>>> 
>>>> char aa[n1];
>>>> char bb[n2];
>>>> char *p = flag ? aa : bb;
>>>> 
>>>> (or similar code with malloc). In fact it does:
>>>> 
>>>> https://godbolt.org/z/bK68YKqhe
>>>> (cheating a bit and also the sub-object version of
>>>> BDOS does not seem to work)
>>>> 
>>>>> 
>>>>> We _might_ want to diagnose decay of a->data to int *, but IIRC
>>>>> there's no way (or proposal) to allow declaring a corresponding
>>>>> fat pointer, so it's not a good designed feature.
>>>> 
>>>> As a language feature, I fully agree.  I see the
>>>> counted_by attribute has a makeshift solution.
>>>> 
>>>> But we can already do:
>>>> 
>>>> auto p = flag ? &aa : &bb;
>>>> 
>>>> and this already works perfectly:
>>>> 
>>>> https://godbolt.org/z/rvb6xWWPj
>>>> 
>>>> We can also name the variably-modifed type: 
>>>> 
>>>> char (*p)[flag ? n1 : n2] = flag ? &aa : &bb;
>>>> https://godbolt.org/z/13cTT1vGP
>>>> 
>>>> The problem with this version is that consistency
>>>> is not checked. (I have patch for adding run-time
>>>> checks).
>>>> 
>>>> And then the next step would be to allow
>>>> 
>>>> char (*p)[:] = flag ? &aa : &bb;
>>>> 
>>>> or similar.  Dennis Ritchie proposed this himself
>>>> a long time ago.
>>>> 
>>>> So far this seems straightfoward.
>>>> 
>>>> If we then want to allow such wide pointers as
>>>> function arguments or in structs, we would need
>>>> to define an ABI. But the ABI could just be
>>>> 
>>>> struct { char (*p)[.s]; size_t s; };
>>>> 
>>>> Maybe we could try to make the following
>>>> ABI compatible:
>>>> 
>>>> int foo(int p[s], size_t s);
>>>> int foo(int p[:]);
>>>> 
>>>> 
>>>>> Having __builtin_with_size at allocation would possibly make
>>>>> the BOS use-def walk discover both objects.
>>>> 
>>>> Yes. But I do not think this there is any fundamental
>>>> difference to discovering allocation functions.
>>>> 
>>>>> I think you can't
>>>>> insert __builtin_with_size at the access to *p, but in practice
>>>>> that would be very much needed.
>>>> 
>>>> Usually the access to *p would follow directly the
>>>> access x.buf, so BDOS should find it.
>>>> 
>>>> But yes, to get full bounds safety, the pointer type 
>>>> has to change to a variably-modified type (which would work
>>>> today) or a fat pointer type. The later can be built on
>>>> vm-types easily because all the FE semantics already
>>>> exists.
>>> 
>>> We could insert the __builtin_with_size everywhere
>>> we have to convert a wide pointer or let an array
>>> decay to traditional pointer for reason of compatibility 
>>> with legacy code.
>> 
>> That sounds like a nice idea.  Note I’d like to see the consumer side implemented so we can play with different points of insertion (and I’ll try to show corner cases where it goes wrong).
> 
> Giving the example I mentioned previously:
> 
>  1 #include <malloc.h>
>  2 struct A
>  3 {
>  4  size_t size;
>  5  char buf[] __attribute__((counted_by(size)));
>  6 };
>  7 
>  8 static size_t
>  9 get_size_from (void *ptr)
> 10 {
> 11  return __builtin_dynamic_object_size (ptr, 1);
> 12 }
> 13 
> 14 void
> 15 foo (size_t sz)
> 16 {
> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 18  obj->size = sz;
> 19  obj->buf[0] = 2;
> 20  __builtin_printf ("%zu\n", get_size_from (obj->buf));
> 21  return;
> 22 }
> 
> So, the different points of insertion the new __builtin_with_size in FE include the following points: (per my understanding so far)
> 
> Point 1. When the “obj->buf” is referenced at line 19, and line 20?
> Point 2. When the “obj” is allocated at line 17? 
> 
> Are these correct?
> 
> Any other points we need to consider?

After more thinking, I believe that for the current “counted_by” attribute the best insertion point is

Point 1: when “obj->buf” is referenced.

As we discussed before, we first need to add a semantic requirement to the user documentation:

“obj->size” must be set before the first reference to “obj->buf”.

With this requirement, “obj->size” is guaranteed to be initialized by the time “obj->buf” is referenced,
so no additional check for whether “obj->size” is valid in the IL is needed when adding the __builtin_with_access_and_size.
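
To show why the ordering requirement matters, here is a small model in plain C. It assumes that the size is captured at the moment “obj->buf” is referenced, which is what inserting the builtin at Point 1 amounts to. The type sized_ref and the functions ref_buf and alloc_A are hypothetical and exist only for this sketch; they are not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct A {
  size_t size;
  char buf[];
};

/* Hypothetical model of the value produced at a reference to obj->buf:
   the pointer together with the size read at that moment. */
struct sized_ref {
  char *ptr;
  size_t size;
};

/* Models the insertion at Point 1: obj->size is read when obj->buf
   is referenced, not when obj was allocated. */
struct sized_ref ref_buf(struct A *obj)
{
  assert(obj != NULL);
  struct sized_ref r = { obj->buf, obj->size };
  return r;
}

/* calloc zeroes the header, so size reads as 0 until the user sets it. */
struct A *alloc_A(size_t sz)
{
  return calloc(1, sizeof(struct A) + sz * sizeof(char));
}
```

A reference taken before “obj->size” is assigned carries the old value (0 in this model), while a reference taken afterwards carries the intended one. That is the reason the documentation has to require the assignment to come first.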

> 
> 
>> It all seems a bit late for GCC 14 though.
> 
> Yes, I agree.  Given the potential impact on the code generation and other potential issues, it’s better to be put in the early stage of the next release.

I will write a detailed proposal based on the discussion so far, covering:
   1. The user interface for the counted_by attribute and the user-level requirements;
   2. The __builtin_with_access_and_size() approach for the implementation:
       2.1 the definition of this new internal function;
       2.2 the implementation details in the FE and the middle end;
   3. Testing.

I will then send it out for more comments.

Let me know if you have more suggestions.

Thanks a lot for all the help.

Qing
> 
> Qing
>> 
>> Richard 
>> 
>>> Martin
>>> 
>>>> 
>>>> Martin
>>>> 
>>>>> 
>>>>> Richard.
>>>>> 
>>>>>> But yes,
>>>>>> for full bounds safety we would need the language feature.
>>>>>> In C people should start to variably-modified types
>>>>>> more.  I think we can build perfect bounds safety on top of
>>>>>> them in a very good way with only FE changes.
>>>>>> 
>>>>>> All these attributes are just a best effort.  But for a while,
>>>>>> this will be necessary.
>>>>>> 
>>>>>> Martin
>>>>>> 
>>>>>>> 
>>>>>>>> Evaluating at this point requires that the size is correctly set
>>>>>>>> before the access to the FAM and the user has to make sure
>>>>>>>> this is the case. But to me this requirement would make sense.
>>>>>>>> 
>>>>>>>> Semantically, it could aöso make sense to evaluate the size at a
>>>>>>>> later time.  But then the reordering becomes problematic again.
>>>>>>>> 
>>>>>>>> Also I think this would make this feature generally more useful.
>>>>>>>> For example, it could work also for others pointers in the struct
>>>>>>>> and not just for FAMs.  In this case, the struct may already be
>>>>>>>> freed when  BDOS is called, so it might also not possible to
>>>>>>>> access the size member at a later time.
>>>>>>>> 
>>>>>>>> Martin
>
Kees Cook Oct. 27, 2023, 5:19 p.m. UTC | #97
On Fri, Oct 27, 2023 at 03:10:22PM +0000, Qing Zhao wrote:
> Since  the dynamic array support is quite important to the kernel (is this true, Kees? ),
> We might need to include such support into our design in the beginning. 

tl;dr: We don't need "dynamic array support" in the 1st version of __counted_by

I'm not sure it's as strong as "quite important", but it is a code
pattern that exists. The vast majority of FAM usage is run-time fixed,
in the sense that the allocation matches the usage. Only sometimes do we
over-allocate and then slowly fill it up like I've shown.

So really my thoughts on this are to bring light to the usage pattern
in the hopes that we don't make it an impossible thing to do. And if
it's a limitation of the initial version of __counted_by, the kernel can
still use it: it will just need to use __counted_by strictly for
allocation sizes, not "usage" size:

struct foo {
	int allocated;
	int used;
	int array[] __counted_by(allocated); // would be nice to use "used"
};

	struct foo *p;

	p = alloc(sizeof(*p) + sizeof(*p->array) * max_items);
	p->allocated = max_items;
	p->used = 0;

	while (data_available())
		p->array[p->used++] = next_datum();

With this, we'll still catch p->array accesses beyond "allocated",
but other code in the kernel won't catch "invalid data" accesses for
p->array beyond "used". (i.e. we still have memory corruption protection,
just not logic error protection.)
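
The two kinds of bound can be written out by hand as follows. This is only a sketch: the helpers foo_new, in_allocation, is_valid_item and foo_append are hypothetical, and the __counted_by annotation is reduced to a comment so that the code builds anywhere. in_allocation corresponds roughly to what a counted_by-based check could enforce, while is_valid_item is the logic check that would stay manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct foo {
  int allocated;   /* the field __counted_by(allocated) would name */
  int used;        /* number of entries that hold valid data */
  int array[];
};

struct foo *foo_new(int max_items)
{
  assert(max_items >= 0);
  struct foo *p = malloc(sizeof(*p) + sizeof(*p->array) * (size_t)max_items);
  if (p) {
    p->allocated = max_items;
    p->used = 0;
  }
  return p;
}

/* Memory-safety bound: is index i inside the allocation? */
bool in_allocation(const struct foo *p, int i)
{
  assert(p != NULL);
  return i >= 0 && i < p->allocated;
}

/* Logic bound: does index i hold data that was actually written? */
bool is_valid_item(const struct foo *p, int i)
{
  assert(p != NULL);
  return i >= 0 && i < p->used;
}

/* Append one item; refuses to write past the allocation. */
bool foo_append(struct foo *p, int v)
{
  if (!in_allocation(p, p->used))
    return false;
  p->array[p->used++] = v;
  return true;
}
```

With three slots allocated and two filled, index 2 passes in_allocation but fails is_valid_item, which is exactly the gap described above.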

We can deal with aliasing in the future if we want to expand to catching
logic errors.

I should note that we don't get logic error protection from things like
ARM's Memory Tagging Extension either -- it only tracks allocation size
(and is very expensive to change as the "used" part of an allocation
grows), so this isn't an unreasonable condition for __counted_by to
require as well.
Qing Zhao Oct. 27, 2023, 6:13 p.m. UTC | #98
Okay, thanks for the explanation.
We will keep this in mind.

Qing

> On Oct 27, 2023, at 1:19 PM, Kees Cook <keescook@chromium.org> wrote:
> 
> On Fri, Oct 27, 2023 at 03:10:22PM +0000, Qing Zhao wrote:
>> Since  the dynamic array support is quite important to the kernel (is this true, Kees? ),
>> We might need to include such support into our design in the beginning. 
> 
> tl;dr: We don't need "dynamic array support" in the 1st version of __counted_by
> 
> I'm not sure it's as strong as "quite important", but it is a code
> pattern that exists. The vast majority of FAM usage is run-time fixed,
> in the sense that the allocation matches the usage. Only sometimes do we
> over-allocate and then slowly fill it up like I've shown.
> 
> So really my thoughts on this are to bring light to the usage pattern
> in the hopes that we don't make it an impossible thing to do. And if
> it's a limitation of the initial version of __counted_by, the kernel can
> still use it: it will just need to use __counted_by strictly for
> allocation sizes, not "usage" size:
> 
> struct foo {
> 	int allocated;
> 	int used;
> 	int array[] __counted_by(allocated); // would nice to use "used"
> };
> 
> 	struct foo *p;
> 
> 	p = alloc(sizeof(*p) + sizeof(*p->array) * max_items);
> 	p->allocated = max_items;
> 	p->used = 0;
> 
> 	while (data_available())
> 		p->array[++p->used] = next_datum();
> 
> With this, we'll still catch p->array accesses beyond "allocated",
> but other code in the kernel won't catch "invalid data" accesses for
> p->array beyond "used". (i.e. we still have memory corruption protection,
> just not logic error protection.)
> 
> We can deal with aliasing in the future if we want to expand to catching
> logic errors.
> 
> I should not that we don't get logic error protection from things like
> ARM's Memory Tagging Extension either -- it only tracks allocation size
> (and is very expensive to change as the "used" part of an allocation
> grows), so this isn't an unreasonable condition for __counted_by to
> require as well.
> 
> -- 
> Kees Cook