
[RFC,V3,0/5] Support for CTF in GCC

Message ID 1561617445-9328-1-git-send-email-indu.bhagat@oracle.com
Series: Support for CTF in GCC

Message

Indu Bhagat June 27, 2019, 6:37 a.m. UTC
Hello,

This patch series adds support for CTF generation in GCC.

[Changes from V2]
 - Patches 1, 2, and 3 have only minor edits, if any.
 - Patch 4 is a new addition.
 - Patch 5 is a new addition.

Summary of the GCC RFC V3 patch set:
Patches 1, 2, and 3 do the preparatory work of adding the CTF command-line
options and setting up the framework for CTF generation and emission.  More
details on these patches can be found in the previous posting:
https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00718.html

With Patch 4 in the current set, the compiler can generate a .ctf section for a
single compilation unit if -gt or -gt2 is specified (when LEVEL is unspecified,
it defaults to 2).  Recall that -gt2 produces type information for entities
(functions, variables, etc.) at file scope or global scope.
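A minimal sketch of the option handling described above, assuming only what is stated here: a bare -gt means LEVEL 2, and -gtN selects level N. parse_gt_option is a hypothetical helper, not the code in opts.c, and its two-digit limit on LEVEL is an arbitrary choice of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: map a "-gt" or "-gtLEVEL" option string to a
   debug level.  A bare "-gt" selects the default LEVEL of 2.  Returns -1
   if OPT is not a well-formed -gt option.  */
static int
parse_gt_option (const char *opt)
{
  static const char prefix[] = "-gt";
  const size_t n = sizeof prefix - 1;

  assert (opt != NULL);
  if (strncmp (opt, prefix, n) != 0)
    return -1;

  const char *level = opt + n;
  if (*level == '\0')
    return 2;                   /* LEVEL unspecified: default to 2.  */

  /* LEVEL must be one or two decimal digits (an arbitrary limit of this
     sketch that also keeps atoi from overflowing).  */
  size_t len = strlen (level);
  if (len > 2)
    return -1;
  for (size_t i = 0; i < len; i++)
    if (!isdigit ((unsigned char) level[i]))
      return -1;

  return atoi (level);
}
```

With this sketch, "-gt" and "-gt2" both yield 2, while "-g" or "-gtx" are rejected.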

For each translation unit, a CTF container (ctf_container_t) holds the
generated CTF.  Two hash_map structures in the container hold the generated CTF
for types and for variables.  CTF does need pre-processing before emission into
a section; code comments in ctfout.c explain this.
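A rough sketch of that bookkeeping, assuming each type and decl can be reduced to a unique nonzero integer key: one table for types and one for variables, with a lookup-or-insert operation so that CTF is generated at most once per entity. All names (ctf_container_sketch, key_table, container_add_type, ...) are hypothetical, and the fixed-size open-addressing table merely stands in for GCC's hash_map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 64           /* Fixed capacity; a real table would grow.  */

/* Open-addressing table mapping a nonzero key to a CTF type ID.  */
struct key_table
{
  uint64_t keys[TABLE_SIZE];    /* 0 marks an empty slot.  */
  uint32_t ids[TABLE_SIZE];
  unsigned count;
};

/* Hypothetical per-translation-unit container: one table for types, one
   for variables, and the next CTF type ID to hand out.  */
struct ctf_container_sketch
{
  struct key_table types;
  struct key_table vars;
  uint32_t next_type_id;
};

static void
container_init (struct ctf_container_sketch *c)
{
  memset (c, 0, sizeof *c);
  c->next_type_id = 1;          /* This sketch hands out IDs from 1.  */
}

/* Return the ID stored for KEY, inserting NEW_ID if KEY is absent.
   *INSERTED tells the caller whether CTF still has to be generated.  */
static uint32_t
table_lookup_or_add (struct key_table *t, uint64_t key, uint32_t new_id,
                     bool *inserted)
{
  /* At least one empty slot must remain so that probing terminates.  */
  assert (key != 0 && t->count < TABLE_SIZE);

  size_t i = (size_t) (key % TABLE_SIZE);
  while (t->keys[i] != 0)
    {
      if (t->keys[i] == key)
        {
          *inserted = false;
          return t->ids[i];
        }
      i = (i + 1) % TABLE_SIZE; /* Linear probing.  */
    }
  t->keys[i] = key;
  t->ids[i] = new_id;
  t->count++;
  *inserted = true;
  return new_id;
}

/* Record a type, allocating a fresh ID only the first time it is seen.  */
static uint32_t
container_add_type (struct ctf_container_sketch *c, uint64_t type_key)
{
  bool inserted;
  uint32_t id = table_lookup_or_add (&c->types, type_key, c->next_type_id,
                                     &inserted);
  if (inserted)
    c->next_type_id++;
  return id;
}

/* Record a variable with the ID of its type; returns false if the
   variable was already present.  */
static bool
container_add_var (struct ctf_container_sketch *c, uint64_t decl_key,
                   uint32_t type_id)
{
  bool inserted;
  table_lookup_or_add (&c->vars, decl_key, type_id, &inserted);
  return inserted;
}
```

Adding the same type key twice returns the same ID, which is the "already generated" check the container needs.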

There are a couple of TBDs and FIXMEs in Patch 4, which I will resolve as I
progress further; input on some of them would be very helpful:

- ctf_dtdef_hash: The compiler uses a hashing scheme to keep track of whether
  CTF has been generated for a type or decl.  For a type, the hashing scheme
  uses TYPE_UID, but for a decl it uses htab_hash_pointer (decl).  Is there a
  better way to do this?  (See hash_dtd_tree_decl in ctfout.c.)

- delete_ctf_container routine in ctfout.c: I have used GTY (()) tags in the
  CTF container structs.  Does this ensure that, if I set the global CTF
  container variable (ctfc) to NULL, the garbage collection machinery will take
  care of cleaning up the internals of the container (including the hash_maps)?
  I haven't been able to get a definitive answer from the code in hash-map.h
  and the generated code in gtype-desc.c.
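One possible direction for the ctf_dtdef_hash question, sketched under the assumption that both kinds of node carry a stable integer identifier (GCC's tree.h does provide TYPE_UID and DECL_UID): hash a (kind, UID) pair rather than the decl's address. Pointer-based hashes can differ from run to run (for example under address-space layout randomization), while UIDs follow creation order. The mock_node structure and function names below are hypothetical stand-ins, not GCC's tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock stand-ins for GCC tree nodes, holding only what the hash needs.  */
enum node_kind { NODE_TYPE = 1, NODE_DECL = 2 };

struct mock_node
{
  enum node_kind kind;
  uint32_t uid;                 /* Plays the role of TYPE_UID or DECL_UID.  */
};

/* Pack kind and UID into one 64-bit key.  In GCC, type and decl UIDs come
   from separate counters, so the kind tag keeps a type and a decl that
   share a numeric UID from colliding.  */
static uint64_t
ctf_node_key (const struct mock_node *n)
{
  assert (n != NULL);
  return ((uint64_t) n->kind << 32) | n->uid;
}

/* Spread the key's bits with the 64-bit finalizer from MurmurHash3.  The
   result depends only on the key, never on the node's address.  */
static uint64_t
ctf_node_hash (const struct mock_node *n)
{
  uint64_t h = ctf_node_key (n);
  h ^= h >> 33;
  h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33;
  h *= 0xc4ceb9fe1a85ec53ULL;
  h ^= h >> 33;
  return h;
}
```

Because the finalizer is a bijection on 64-bit values, distinct keys always get distinct hashes.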

Testing:
- Bootstrapped and regression tested on x86_64/linux and aarch64/linux.
  Also bootstrapped on SPARC64/linux, with some testing.
- Parsed the .ctf sections of libdtrace-ctf files with a CTF dumping utility on
  x86_64/linux.  This only ensures that the CTF sections are well-formed.
- Interaction with an internally available GDB looks promising.  Basic whatis
  and ptype tests work.  GDB patches to consume CTF debug info are in the works
  and will be upstreamed soon.
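A well-formedness check of the kind a dumping utility performs starts with the section preamble, which also carries the format version. The sketch below assumes the preamble layout of the ctf.h header used by libctf/libdtrace-ctf (a 16-bit magic number 0xdff2, then an 8-bit version and an 8-bit flags byte, stored in the producer's byte order); the function name, the status codes and the caller-supplied maximum version are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTF_MAGIC_SKETCH 0xdff2 /* Assumed value of the CTF magic number.  */

enum preamble_status
{
  PREAMBLE_OK,
  PREAMBLE_TOO_SHORT,
  PREAMBLE_BAD_MAGIC,
  PREAMBLE_FOREIGN_ENDIAN,
  PREAMBLE_BAD_VERSION
};

/* Validate the 4-byte preamble at the start of a .ctf section.
   LITTLE_ENDIAN_TARGET is nonzero if the section was written little-endian.
   On success, the format version is stored in *VERSION_OUT (if non-NULL).  */
static enum preamble_status
check_ctf_preamble (const unsigned char *sec, size_t len,
                    int little_endian_target, unsigned max_version,
                    unsigned *version_out)
{
  assert (sec != NULL || len == 0);
  if (len < 4)
    return PREAMBLE_TOO_SHORT;

  uint16_t magic = little_endian_target
                   ? (uint16_t) (sec[0] | (sec[1] << 8))
                   : (uint16_t) ((sec[0] << 8) | sec[1]);
  if (magic == 0xf2df)          /* Byte-swapped magic: other endianness.  */
    return PREAMBLE_FOREIGN_ENDIAN;
  if (magic != CTF_MAGIC_SKETCH)
    return PREAMBLE_BAD_MAGIC;

  unsigned version = sec[2];    /* sec[3] holds the flags; not checked.  */
  if (version == 0 || version > max_version)
    return PREAMBLE_BAD_VERSION;

  if (version_out != NULL)
    *version_out = version;
  return PREAMBLE_OK;
}
```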

In subsequent patches, I intend to tie up some loose ends in the current
patch and add LTO support.

Thanks,

Indu Bhagat (5):
  Add new function lang_GNU_GIMPLE
  Add CTF command line options : -gtLEVEL
  Setup for CTF generation and emission
  CTF generation for a single compilation unit
  Update CTF testsuite

 gcc/ChangeLog                                      |   91 +
 gcc/Makefile.in                                    |    5 +
 gcc/cgraphunit.c                                   |   12 +-
 gcc/common.opt                                     |    9 +
 gcc/ctfcreate.c                                    |  526 ++++++
 gcc/ctfout.c                                       | 1739 ++++++++++++++++++++
 gcc/ctfout.h                                       |  359 ++++
 gcc/ctfutils.c                                     |  198 +++
 gcc/doc/invoke.texi                                |   16 +
 gcc/flag-types.h                                   |   13 +
 gcc/gengtype.c                                     |    4 +-
 gcc/langhooks.c                                    |    9 +
 gcc/langhooks.h                                    |    1 +
 gcc/opts.c                                         |   26 +
 gcc/passes.c                                       |    7 +-
 gcc/testsuite/ChangeLog                            |   35 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-1.c             |    6 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-2.c             |   10 +
 .../gcc.dg/debug/ctf/ctf-anonymous-struct-1.c      |   23 +
 .../gcc.dg/debug/ctf/ctf-anonymous-union-1.c       |   26 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-array-1.c       |   31 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-1.c   |   30 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-2.c   |   39 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-1.c   |   44 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-2.c   |   30 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-3.c   |   41 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-enum-1.c        |   21 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-float-1.c       |   16 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-forward-1.c     |   36 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-forward-2.c     |   16 +
 .../gcc.dg/debug/ctf/ctf-function-pointers-1.c     |   24 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-functions-1.c   |   34 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-int-1.c         |   17 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-pointers-1.c    |   26 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-preamble-1.c    |   11 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-str-table-1.c   |   26 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-struct-1.c      |   25 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-struct-2.c      |   30 +
 .../gcc.dg/debug/ctf/ctf-struct-array-1.c          |   36 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-typedef-1.c     |   23 +
 .../gcc.dg/debug/ctf/ctf-typedef-struct-1.c        |   12 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-union-1.c       |   14 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-variables-1.c   |   25 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf.exp             |   41 +
 gcc/testsuite/gcc.dg/debug/dwarf2-ctf-1.c          |    7 +
 gcc/toplev.c                                       |   18 +
 include/ChangeLog                                  |    8 +
 include/ctf.h                                      |  487 ++++++
 48 files changed, 4277 insertions(+), 6 deletions(-)
 create mode 100644 gcc/ctfcreate.c
 create mode 100644 gcc/ctfout.c
 create mode 100644 gcc/ctfout.h
 create mode 100644 gcc/ctfutils.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-anonymous-struct-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-anonymous-union-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-array-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-enum-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-float-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-forward-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-forward-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-function-pointers-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-functions-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-int-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-pointers-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-preamble-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-str-table-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-struct-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-struct-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-struct-array-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-typedef-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-typedef-struct-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-union-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-variables-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf.exp
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2-ctf-1.c
 create mode 100644 include/ctf.h

Comments

Indu Bhagat July 2, 2019, 5:54 p.m. UTC | #1
Ping.
Can someone please review these patches ? We would like to get the
support for CTF integrated soon.
Thanks
Indu

On Wed, Jun 26, 2019 at 11:38 PM Indu Bhagat <indu.bhagat@oracle.com> wrote:
> [...]
Jeff Law July 3, 2019, 3:18 a.m. UTC | #2
On 7/2/19 11:54 AM, Indu Bhagat wrote:
> Ping.
> Can someone please review these patches ? We would like to get the
> support for CTF integrated soon.
I'm not sure there's really even consensus that we want CTF support in
GCC.  Though I think that the changes you've made in the last several
weeks do make it somewhat more palatable.  But ultimately the first step
is to get that consensus.

I'd hazard a guess that Jakub in particular isn't on board as he's been
pushing to some degree for post-processing or perhaps doing it via a
plug in.

Richi has been guiding you a bit through how to make the changes easier
to integrate, but I haven't seen him state one way or the other his
preference on whether or not CTF support is something we want.

I'm hesitant to add CTF support in GCC, but can understand how it might
be useful given the kernel's aversion to everything DWARF.  But if the
kernel is the primary consumer then I'd lean towards post-processing.

Jeff
Richard Biener July 3, 2019, 12:31 p.m. UTC | #3
On Wed, Jul 3, 2019 at 5:18 AM Jeff Law <law@redhat.com> wrote:
>
> On 7/2/19 11:54 AM, Indu Bhagat wrote:
> > Ping.
> > Can someone please review these patches ? We would like to get the
> > support for CTF integrated soon.
> I'm not sure there's really even consensus that we want CTF support in
> GCC.  Though I think that the changes you've made in the last several
> weeks do make it somewhat more palatable.  But ultimately the first step
> is to get that consensus.
>
> I'd hazard a guess that Jakub in particular isn't on board as he's been
> pushing to some degree for post-processing or perhaps doing it via a
> plug in.
>
> Richi has been guiding you a bit through how to make the changes easier
> to integrate, but I haven't seen him state one way or the other his
> preference on whether or not CTF support is something we want.

I'm mostly worried about the lack of a specification and the apparent
restriction to a subset of C (the patches have gcc_unreachable ()
in paths that can be reached by VECTOR_TYPE or COMPLEX_TYPE
not to mention FIXED_POINT_TYPE, etc...).

While CTF might be easy and fast to parse and small I fear it will
go the STABS way of being not extensible and bitrotten.

Given it appears to generate only debug info for symbols and no locations
or whatnot it should be sufficient to introspect the compilation to generate
the CTF info on the side and then merge it in at link-time.  Which makes
me wonder if this shouldn't be a plugin for now until it is more complete
and can be evaluated better (comments in the patches indicate even the
on-disk format is in flux?).  Adding plugin hook invocations to the three
places the CTF info generation hooks off should be easy.

That said, the patch series isn't ready for integration since it will
crash left and right -- did you bootstrap and run the testsuite
with -gt?

Richard.

> I'm hesitant to add CTF support in GCC, but can understand how it might
> be useful given the kernel's aversion to everything DWARF.  But if the
> kernel is the primary consumer then I'd lean towards post-processing.
>
> Jeff
>
Indu Bhagat July 4, 2019, 12:10 a.m. UTC | #4
On 07/02/2019 08:18 PM, Jeff Law wrote:
> On 7/2/19 11:54 AM, Indu Bhagat wrote:
>> Ping.
>> Can someone please review these patches ? We would like to get the
>> support for CTF integrated soon.
> I'm not sure there's really even consensus that we want CTF support in
> GCC.  Though I think that the changes you've made in the last several
> weeks do make it somewhat more palatable.  But ultimately the first step
> is to get that consensus.

Thanks for your message.  Absolutely, consensus is the first step.  We are
happy to take all constructive feedback and address all concerns to make
certain that CTF support in the toolchain will be a useful and worthwhile
contribution.

>
> I'd hazard a guess that Jakub in particular isn't on board as he's been
> pushing to some degree for post-processing or perhaps doing it via a
> plug in.
>
> Richi has been guiding you a bit through how to make the changes easier
> to integrate, but I haven't seen him state one way or the other his
> preference on whether or not CTF support is something we want.
>
> I'm hesitant to add CTF support in GCC, but can understand how it might
> be useful given the kernel's aversion to everything DWARF.  But if the
> kernel is the primary consumer then I'd lean towards post-processing.
>
The kernel is just *one* of the consumers. There are other applications,
external and internal to Oracle, that have shown interest. Moreover, a couple
of distro and package maintainers have shown interest in enabling CTF by
default.

Post-processing in the kernel and other internally available large
applications has been a deterrent to adoption because of high space and
compile-time costs. I answered some of Jakub's concerns in this post:
https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00131.html

I would even argue that the use cases will only grow if CTF is properly
supported in the toolchain.

Thanks
Indu Bhagat July 4, 2019, 12:47 a.m. UTC | #5
On 07/03/2019 05:31 AM, Richard Biener wrote:
> On Wed, Jul 3, 2019 at 5:18 AM Jeff Law <law@redhat.com> wrote:
>> On 7/2/19 11:54 AM, Indu Bhagat wrote:
>>> Ping.
>>> Can someone please review these patches ? We would like to get the
>>> support for CTF integrated soon.
>> I'm not sure there's really even consensus that we want CTF support in
>> GCC.  Though I think that the changes you've made in the last several
>> weeks do make it somewhat more palatable.  But ultimately the first step
>> is to get that consensus.
>>
>> I'd hazard a guess that Jakub in particular isn't on board as he's been
>> pushing to some degree for post-processing or perhaps doing it via a
>> plug in.
>>
>> Richi has been guiding you a bit through how to make the changes easier
>> to integrate, but I haven't seen him state one way or the other his
>> preference on whether or not CTF support is something we want.
> I'm mostly worried about the lack of a specification and the apparent
> restriction to a subset of C (the patches have gcc_unreachable ()
> in paths that can be reached by VECTOR_TYPE or COMPLEX_TYPE
> not to mention FIXED_POINT_TYPE, etc...).

RE lack of specification: I cannot agree more; this absolutely needs to exist
if we envision CTF support in the toolchain being useful to the community.
We plan to get to this task once the linker changes are scoped and closer
to done (~ a couple of weeks from now). Will this work?

RE subset of C: It is true that the CTF format currently leaves out a very
small subset of C, like FIXED_POINT as you noted (CTF does have a representation
for COMPLEX_TYPE; if my code paths culminate in gcc_unreachable () for that, I
should fix them).  The end goal is to make it support all of C, not just a
subset.

Meanwhile, I intend to make the compiler skip types when a C construct is not
supported, instead of crashing because of gcc_unreachable (). (You may also have
noted stubs with "TBD WARN instead" notes in the patch series I sent.)
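The "skip instead of crash" behaviour described above could look roughly like the following: the type dispatcher returns a sentinel meaning "no CTF for this type" (optionally printing a note) for codes the format cannot yet represent, rather than reaching gcc_unreachable (). The mock_type_code enum, gen_ctf_type_sketch and CTF_NO_TYPE_SKETCH are hypothetical stand-ins for GCC's tree codes and the real generation code; treating vector and fixed-point types as unsupported simply follows the examples raised in review.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Mock subset of GCC type tree codes.  */
enum mock_type_code
{
  MOCK_INTEGER_TYPE,
  MOCK_REAL_TYPE,
  MOCK_COMPLEX_TYPE,
  MOCK_POINTER_TYPE,
  MOCK_VECTOR_TYPE,
  MOCK_FIXED_POINT_TYPE
};

#define CTF_NO_TYPE_SKETCH 0u   /* Hypothetical "not represented" ID.  */

static unsigned skipped_types;  /* Number of types dropped so far.  */

/* Return a placeholder CTF type ID for CODE, or CTF_NO_TYPE_SKETCH if the
   type cannot be represented.  Unsupported codes are counted (and noted
   on stderr when WARN is nonzero) instead of aborting the compilation.  */
static uint32_t
gen_ctf_type_sketch (enum mock_type_code code, int warn)
{
  switch (code)
    {
    case MOCK_INTEGER_TYPE:
    case MOCK_REAL_TYPE:
    case MOCK_COMPLEX_TYPE:     /* CTF can represent complex types.  */
    case MOCK_POINTER_TYPE:
      return (uint32_t) code + 1;   /* Stand-in for real ID generation.  */

    case MOCK_VECTOR_TYPE:
    case MOCK_FIXED_POINT_TYPE:
    default:
      skipped_types++;
      if (warn)
        fprintf (stderr, "note: type code %d has no CTF representation; "
                 "skipped\n", (int) code);
      return CTF_NO_TYPE_SKETCH;
    }
}
```

Callers then treat CTF_NO_TYPE_SKETCH as "omit this entity" (for example, a variable whose type was skipped), so one unsupported type does not take down the whole compilation.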


>
> While CTF might be easy and fast to parse and small I fear it will
> go the STABS way of being not extensible and bitrotten.

FWIW, I can understand this. We will maintain it. And I hope it will also be a
community effort thereafter with active consumers, so there is a positive
feedback loop.

>
> Given it appears to generate only debug info for symbols and no locations
> or whatnot it should be sufficient to introspect the compilation to generate
> the CTF info on the side and then merge it in at link-time.  Which makes
> me wonder if this shouldn't be a plugin for now until it is more complete
> and can be evaluated better (comments in the patches indicate even the
> on-disk format is in flux?).  Adding plugin hook invocations to the three
> places the CTF info generation hooks off should be easy.

Yes, some bits of the on-disk format are being adapted to make it easier to
adopt the CTF format across the board. E.g., we recently added the CU name to
the CTF header. As another example, we added the CTF_K_SLICE type because there
was no way in CTF to represent enum bitfields. For the most part, though, the
CTF format has stayed as is.
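To illustrate what a slice adds: it refers to an existing base type (here an enum) and narrows it to a bit offset and width, which is what an enum bitfield such as `enum color c : 3;` needs. The record layout below (a 32-bit type reference plus 16-bit offset and width) is an assumption modelled on libctf's ctf_slice_t, and slice_extract is a hypothetical helper; it also assumes bit offsets are counted from the least significant bit of a 32-bit storage unit, which in reality is ABI-dependent.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of a slice record: base type plus bit offset and width.  */
struct slice_sketch
{
  uint32_t base_type;           /* CTF ID of the enum (or integer) type.  */
  uint16_t offset;              /* Bit offset within the storage unit.  */
  uint16_t bits;                /* Width in bits.  */
};

/* Extract the value a slice describes from a 32-bit storage unit,
   counting bit offsets from the least significant bit.  */
static uint32_t
slice_extract (const struct slice_sketch *s, uint32_t word)
{
  assert (s->bits >= 1 && s->bits <= 32 && s->offset + s->bits <= 32);
  uint32_t mask = s->bits == 32 ? UINT32_MAX : ((1u << s->bits) - 1u);
  return (word >> s->offset) & mask;
}
```

For example, a 3-bit slice at offset 3 applied to 0x28 (binary 101000) yields 5, the enumerator value stored in that field.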

Hmm... a GCC plugin for CTF generation at compile time may work out for a
single compilation unit.  But I am not sure how LTO would be supported in that
case.  Basically, for LTO and -gtLEVEL to work together, I think I need the
lto-wrapper to be aware of the presence of .ctf sections. I will need to
combine the .ctf sections from multiple compilation units into a CTF archive,
which the linker can then de-duplicate.
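As a rough illustration of the de-duplication step (the actual linker-side algorithm is not described in this thread), assume each CU's types have already been reduced to a canonical string such as "struct foo{int a;}". Merging then keeps one shared entry per distinct string and records, for each CU-local type, the index of its shared copy. The names and the string-based canonical form are hypothetical, and the sketch ignores recursive types, which a real merger must handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SHARED 128

/* Shared (de-duplicated) type table built while merging CUs.  */
struct shared_types
{
  const char *canon[MAX_SHARED];    /* Canonical spelling of each type.  */
  size_t count;
};

/* Return the shared index for CANON, appending it if not yet present.
   Linear search keeps the sketch short; a real merger would hash.  */
static size_t
shared_intern (struct shared_types *st, const char *canon)
{
  for (size_t i = 0; i < st->count; i++)
    if (strcmp (st->canon[i], canon) == 0)
      return i;
  assert (st->count < MAX_SHARED);
  st->canon[st->count] = canon;
  return st->count++;
}

/* Merge one CU's type list: MAP[i] receives the shared index of
   CU_TYPES[i].  */
static void
merge_cu (struct shared_types *st, const char *const *cu_types, size_t n,
          size_t *map)
{
  for (size_t i = 0; i < n; i++)
    map[i] = shared_intern (st, cu_types[i]);
}
```

Two CUs that both define the same struct end up pointing at one shared entry, which is the space saving the CTF archive is meant to provide.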

Even if I assume that the technical hurdle in the above paragraph is solvable
within the purview of a plugin, I fear worse problems of adoption, maintenance
and distribution in the long run if, for unforeseen reasons, CTF support ever
has to remain in a plugin.

Going the plugin route for the short term will still suffer from similar
problems of distribution and support.

- Is the plugin infrastructure supported on most platforms? Also, I see that
   the plugin infrastructure supports all GCC versions from 4.5 onwards.
   Can someone confirm? (We minimally want toolchain support with
   GCC 4.8.5 and GCC 8 and later, for now.)

- How will the plugin be distributed for a variety of platforms and
   architectures outside of what Oracle Linux commits to support?

   Unless you are suggesting that the GCC plugin be distributed within GCC in
   the meantime? Well, that may be acceptable in the short term, depending on
   how I resolve some of the points raised above.


>
> That said, the patch series isn't ready for integration since it will
> crash left and right -- did you bootstrap and run the testsuite
> with -gt?
>
>
Bootstrap and Testsuite : Yes, I have.  On x86_64/linux, sparc64/linux,
                           aarch64/linux.
Run testsuite with -gt : Not yet. Believe me, it's on my plate. And I already
                          regret not having done it sooner :)
Bootstrap with -gt : Not yet. I should try soon.

(I have compiled libdtrace-ctf with -gt and parsed the .ctf sections with the
patch set.)

About the patch not being ready for integration: Yes, you're right.
That's why I chose to retain 'RFC' for this patch series as well. I am working
on issues, testing the compiler, and closing the open ends in the
implementation.

I will refresh the patch series when I have made meaningful progress. Any
further suggestions on functional/performance testing would be helpful too.

Thanks again for your reviews.

Indu
Richard Biener July 4, 2019, 10:43 a.m. UTC | #6
On Thu, Jul 4, 2019 at 2:36 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>
>
> On 07/03/2019 05:31 AM, Richard Biener wrote:
> > On Wed, Jul 3, 2019 at 5:18 AM Jeff Law <law@redhat.com> wrote:
> >> On 7/2/19 11:54 AM, Indu Bhagat wrote:
> >>> Ping.
> >>> Can someone please review these patches ? We would like to get the
> >>> support for CTF integrated soon.
> >> I'm not sure there's really even consensus that we want CTF support in
> >> GCC.  Though I think that the changes you've made in the last several
> >> weeks do make it somewhat more palatable.  But ultimately the first step
> >> is to get that consensus.
> >>
> >> I'd hazard a guess that Jakub in particular isn't on board as he's been
> >> pushing to some degree for post-processing or perhaps doing it via a
> >> plug in.
> >>
> >> Richi has been guiding you a bit through how to make the changes easier
> >> to integrate, but I haven't seen him state one way or the other his
> >> preference on whether or not CTF support is something we want.
> > I'm mostly worried about the lack of a specification and the apparent
> > restriction to a subset of C (the patches have gcc_unreachable ()
> > in paths that can be reached by VECTOR_TYPE or COMPLEX_TYPE
> > not to mention FIXED_POINT_TYPE, etc...).
>
> RE lack of specification: I cannot agree more; this absolutely needs to exist
> if we envision CTF support in the toolchain being useful to the community.
> We plan to get to this task once the linker changes are scoped and closer
> to done (~ a couple of weeks from now). Will this work?

Sure, just keep in mind that it's difficult to give feedback on something
without a specification.

> RE subset of C: It is true that the CTF format currently leaves out a very
> small subset of C, like FIXED_POINT as you noted (CTF does have a representation
> for COMPLEX_TYPE; if my code paths culminate in gcc_unreachable () for that, I
> should fix them).  The end goal is to make it support all of C, not just a
> subset.

What about other languages?  GCC supports C++, Ada, Objective-C, Go, D,
Fortran, Modula-2, BRIG (this list is not necessarily complete and may change
in the future).

> Meanwhile, I intend to make the compiler skip types when a C construct is not
> supported, instead of crashing because of gcc_unreachable (). (You may also have
> noted stubs with "TBD WARN instead" notes in the patch series I sent.)
>
>
> >
> > While CTF might be easy and fast to parse and small I fear it will
> > go the STABS way of being not extensible and bitrotten.
>
> FWIW, I can understand this. We will maintain it. And I hope it will also be a
> community effort thereafter with active consumers, so there is a positive
> feedback loop.
>
> >
> > Given it appears to generate only debug info for symbols and no locations
> > or whatnot it should be sufficient to introspect the compilation to generate
> > the CTF info on the side and then merge it in at link-time.  Which makes
> > me wonder if this shouldn't be a plugin for now until it is more complete
> > and can be evaluated better (comments in the patches indicate even the
> > on-disk format is in flux?).  Adding plugin hook invocations to the three
> > places the CTF info generation hooks off should be easy.
>
> Yes, some bits of the on-disk format are being adapted to make it easier to
> adopt the CTF format across the board. E.g., we recently added the CU name to
> the CTF header. As another example, we added the CTF_K_SLICE type because there
> was no way in CTF to represent enum bitfields. For the most part, though, the
> CTF format has stayed as is.

I hope the format is versioned at least.

> Hmm... a GCC plugin for CTF generation at compile time may work out for a
> single compilation unit.  But I am not sure how LTO would be supported in that
> case.  Basically, for LTO and -gtLEVEL to work together, I think I need the
> lto-wrapper to be aware of the presence of .ctf sections. I will need to
> combine the .ctf sections from multiple compilation units into a CTF archive,
> which the linker can then de-duplicate.

True.  lto-wrapper does this kind of dancing for the much more complex set of
DWARF sections already.

> Even if I assume that the technical hurdle in the above paragraph is solvable
> within the purview of a plugin, I fear worse problems of adoption, maintenance
> and distribution in the long run if, for unforeseen reasons, CTF support ever
> has to remain in a plugin.
>
> Going the plugin route for the short term will still suffer from similar
> problems of distribution and support.
>
> - Is the plugin infrastructure supported on most platforms? Also, I see that
>    the plugin infrastructure supports all GCC versions from 4.5 onwards.
>    Can someone confirm? (We minimally want toolchain support with
>    GCC 4.8.5 and GCC 8 and later, for now.)

The infrastructure is quite old but you'd need new invocation hooks so this
won't help.

> - How will the plugin be distributed for a variety of platforms and
>    architectures outside of what Oracle Linux commits to support ?
>
>    Unless you are suggesting that the GCC plugin be distributed within GCC,
>    meanwhile ? Well, that may be acceptable in the short term, depending on how
>    I resolve some points raised above.
>
> >
> > That said, the patch series isn't ready for integration since it will
> > crash left and right -- did you bootstrap and run the testsuite
> > with -gt?
> >
> >
> Bootstrap and Testsuite : Yes, I have.  On x86_64/linux, sparc64/linux,
>                            aarch64/linux.
> Run testsuite with -gt : Not yet. Believe me, it's on my plate. And I already
>                           regret not having done it sooner :)
> Bootstrap with -gt : Not yet. I should try soon.
>
> (I have compiled libdtrace-ctf with -gt and parsed the .ctf sections with the
> patch set.)
>
> About the patch being not ready for integration : Yes, you're right.
> That's why I chose to retain 'RFC' for this patch series as well. I am working
> on issues, testing the compiler, and closing on the open ends in the
> implementation.
>
> I will refresh the patch series when I have made a meaningful stride ahead. Any
> further suggestions on functional/performance testing will be helpful too.

What's the functional use of CTF?  Print nice backtraces (without showing
function argument values)?

Richard.

> Thanks again for your reviews.
>
> Indu
>
Indu Bhagat July 4, 2019, 10:30 p.m. UTC | #7
On 07/04/2019 03:43 AM, Richard Biener wrote:
> On Thu, Jul 4, 2019 at 2:36 AM Indu Bhagat<indu.bhagat@oracle.com>  wrote:
>> [...]
>> RE subset of C : It is true that CTF format currently does leave out a very
>> small subset of C like FIXED_POINT as you noted ( CTF does have representation
>> for COMPLEX_TYPE; if my code paths end in gcc_unreachable () for that, I
>> should fix them ).  The end goal is to make it support all of C, and not just a
>> subset.
> What about other languages?  GCC supports C++, Ada, Objective-C, Go, D,
> Fortran, Modula-2, BRIG (this list is not necessarily complete and may change
> in the future).

The format supports C only at this time. Other languages are not on the radar
yet. However, we have no intrinsic objection to them. That said, languages
that already have fully-fledged type introspection, as well as interpreted or
managed languages, are probably out of scope, since they already have
what CTF provides.

>
>>
>>> Given it appears to generate only debug info for symbols and no locations
>>> or whatnot it should be sufficient to introspect the compilation to generate
>>> the CTF info on the side and then merge it in at link-time.  Which makes
>>> me wonder if this shouldn't be a plugin for now until it is more complete
>>> and can be evaluated better (comments in the patches indicate even the
>>> on-disk format is in flux?).  Adding plugin hook invocations to the three
>>> places the CTF info generation hooks off should be easy.
>> Yes, some bits of the on-disk format are being adapted to make it easier to
>> adopt the CTF format across the board. E.g., we recently added CU name in the
>> CTF header. As another example, we added CTF_K_SLICE type because there existed
>> no way in CTF to represent enum bitfields. For the most part though, CTF format
>> has stayed as is.
> I hope the format is versioned at least.

Yes, the format is versioned. The current version is CTF_VERSION_3.  All these
format changes I talked about above are a part of CTF_VERSION_3.

libctf handles backward compatibility for users of CTF in the toolchain,
transparently to the user. This means that, in future, when the CTF version
needs to be bumped, libctf will support the older version directly and/or
transparently upgrade it to the new version for further consumers.

It also means that the compiler does not always need to change merely because
the format has changed: (depending on the change) the linker can transparently
adjust, as will all consumers if they try to read unlinked object files.
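The compatibility rule above implies a simple dispatch in any consumer: read the section preamble, accept the current version as-is, route older versions through an upgrade path, and reject anything unknown. The following is a minimal sketch under stated assumptions. The magic number and version constants are meant to mirror binutils' `include/ctf.h` but should be treated as assumptions, and `ctf_classify` with its return codes is hypothetical, not libctf API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed constants; verify against include/ctf.h in binutils-gdb.  */
#define CTF_MAGIC      0xdff2
#define CTF_VERSION_1  1
#define CTF_VERSION_3  4

typedef struct
{
  uint16_t ctp_magic;     /* Identifies CTF, in the producer's byte order.  */
  uint8_t  ctp_version;   /* On-disk format version.  */
  uint8_t  ctp_flags;     /* Format flags.  */
} ctf_preamble_t;

typedef enum { CTF_READ_NATIVE, CTF_READ_UPGRADE, CTF_READ_REJECT } ctf_read_t;

/* Hypothetical: decide how a consumer should treat a .ctf section.  A real
   reader would also recognize the byte-swapped magic of a foreign-endian
   producer; that case is omitted here.  */
static ctf_read_t
ctf_classify (const unsigned char *sect, size_t len)
{
  ctf_preamble_t pre;

  assert (sect != NULL);
  if (len < sizeof pre)
    return CTF_READ_REJECT;            /* Too short to hold a preamble.  */
  memcpy (&pre, sect, sizeof pre);     /* memcpy avoids unaligned loads.  */
  if (pre.ctp_magic != CTF_MAGIC)
    return CTF_READ_REJECT;
  if (pre.ctp_version == CTF_VERSION_3)
    return CTF_READ_NATIVE;            /* Current format: use directly.  */
  if (pre.ctp_version >= CTF_VERSION_1 && pre.ctp_version < CTF_VERSION_3)
    return CTF_READ_UPGRADE;           /* Older format: upgrade in memory.  */
  return CTF_READ_REJECT;              /* Newer or unknown version.  */
}
```

Keeping the upgrade in the reader is what lets an unchanged compiler keep emitting an older version while linkers and other consumers move ahead.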

>
>>> That said, the patch series isn't ready for integration since it will
>>> crash left and right -- did you bootstrap and run the testsuite
>>> with -gt?
>>>
>>>
>> Bootstrap and Testsuite : Yes, I have.  On x86_64/linux, sparc64/linux,
>>                             aarch64/linux.
>> Run testsuite with -gt : Not yet. Believe me, it's on my plate. And I already
>>                            regret not having done it sooner :)
>> Bootstrap with -gt : Not yet. I should try soon.
>>
>> (I have compiled libdtrace-ctf with -gt and parsed the .ctf sections with the
>> patch set.)
>>
>> About the patch being not ready for integration : Yes, you're right.
>> That's why I chose to retain 'RFC' for this patch series as well. I am working
>> on issues, testing the compiler, and closing on the open ends in the
>> implementation.
>>
>> I will refresh the patch series when I have made a meaningful stride ahead. Any
>> further suggestions on functional/performance testing will be helpful too.
> What's the functional use of CTF?  Print nice backtraces (without showing
> function argument values)?
>
CTF, at this time, is type information for entities at global or file scope.
This can be used by online debuggers and program tracers (dynamic tracing). More
generally, it provides type introspection for C programs, with an optional
library API that lets them get at their own types much more easily than via
DWARF. So, the umbrella use cases are: all C programs that want to introspect
their own types quickly, and applications that want to introspect other
programs' types quickly.

(Even with the exception of its embedded string table, it is already small
enough to be kept around in stripped binaries so that it can be relied upon
to be present.)

We are also extending the format so it is useful for other on-line debugging
tools, such as backtracers.

Indu
Richard Biener July 5, 2019, 11:16 a.m. UTC | #8
On Fri, Jul 5, 2019 at 12:21 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>
> On 07/04/2019 03:43 AM, Richard Biener wrote:
>
> On Thu, Jul 4, 2019 at 2:36 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>
> [...]
>
> RE subset of C : It is true that CTF format currently does leave out a very
> small subset of C like FIXED_POINT as you noted ( CTF does have representation
> for COMPLEX_TYPE; if my code paths end in gcc_unreachable () for that, I
> should fix them ).  The end goal is to make it support all of C, and not just a
> subset.
>
> What about other languages?  GCC supports C++, Ada, Objective-C, Go, D,
> Fortran, Modula-2, BRIG (this list is not necessarily complete and may change
> in the future).
>
> The format supports C only at this time. Other languages are not on the radar
> yet. However, we have no intrinsic objection to them. That said, languages
> that already have fully-fledged type introspection, as well as interpreted or
> managed languages, are probably out of scope, since they already have
> what CTF provides.
>
>
>
> Given it appears to generate only debug info for symbols and no locations
> or whatnot it should be sufficient to introspect the compilation to generate
> the CTF info on the side and then merge it in at link-time.  Which makes
> me wonder if this shouldn't be a plugin for now until it is more complete
> and can be evaluated better (comments in the patches indicate even the
> on-disk format is in flux?).  Adding plugin hook invocations to the three
> places the CTF info generation hooks off should be easy.
>
> Yes, some bits of the on-disk format are being adapted to make it easier to
> adopt the CTF format across the board. E.g., we recently added CU name in the
> CTF header. As another example, we added CTF_K_SLICE type because there existed
> no way in CTF to represent enum bitfields. For the most part though, CTF format
> has stayed as is.
>
> I hope the format is versioned at least.
>
> Yes, the format is versioned. The current version is CTF_VERSION_3.  All these
> format changes I talked about above are a part of CTF_VERSION_3.
>
> libctf handles backward compatibility for users of CTF in the toolchain,
> transparently to the user. This means that, in future, when the CTF version
> needs to be bumped, libctf will support the older version directly and/or
> transparently upgrade it to the new version for further consumers.
>
> It also means that the compiler does not always need to change merely because
> the format has changed: (depending on the change) the linker can transparently
> adjust, as will all consumers if they try to read unlinked object files.
>
>
> That said, the patch series isn't ready for integration since it will
> crash left and right -- did you bootstrap and run the testsuite
> with -gt?
>
>
> Bootstrap and Testsuite : Yes, I have.  On x86_64/linux, sparc64/linux,
>                            aarch64/linux.
> Run testsuite with -gt : Not yet. Believe me, it's on my plate. And I already
>                           regret not having done it sooner :)
> Bootstrap with -gt : Not yet. I should try soon.
>
> (I have compiled libdtrace-ctf with -gt and parsed the .ctf sections with the
> patch set.)
>
> About the patch being not ready for integration : Yes, you're right.
> That's why I chose to retain 'RFC' for this patch series as well. I am working
> on issues, testing the compiler, and closing on the open ends in the
> implementation.
>
> I will refresh the patch series when I have made a meaningful stride ahead. Any
> further suggestions on functional/performance testing will be helpful too.
>
> What's the functional use of CTF?  Print nice backtraces (without showing
> function argument values)?
>
> CTF, at this time, is type information for entities at global or file scope.
> This can be used by online debuggers and program tracers (dynamic tracing). More
> generally, it provides type introspection for C programs, with an optional
> library API that lets them get at their own types much more easily than via
> DWARF. So, the umbrella use cases are: all C programs that want to introspect
> their own types quickly, and applications that want to introspect other
> programs' types quickly.

What makes it superior to DWARF stripped down to the above feature set?

> (Even with the exception of its embedded string table, it is already small
>  enough to  be kept around in stripped binaries so that it can be relied upon
>  to be present.)

So for distributing a program/library for SUSE we usually split the
distribution into two pieces - the binaries and separated debug information.
With CTF we'd then create both, continue stripping out the DWARF information
but keep the CTF in the binaries?

When a program contains CTF only, can gdb do anything to help debugging
of a running program or a core file?  Do you have gdb support in the works?

> We are also extending the format so it is useful for other on-line debugging
> tools, such as backtracers.

So you become more complex similar to DWARF?

Richard.

>
> Indu
Nix July 5, 2019, 6:28 p.m. UTC | #9
On 5 Jul 2019, Richard Biener said:

> On Fri, Jul 5, 2019 at 12:21 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>> CTF, at this time, is type information for entities at global or file scope.
>> This can be used by online debuggers and program tracers (dynamic tracing). More
>> generally, it provides type introspection for C programs, with an optional
>> library API that lets them get at their own types much more easily than via
>> DWARF. So, the umbrella use cases are: all C programs that want to introspect
>> their own types quickly, and applications that want to introspect other
>> programs' types quickly.
>
> What makes it superior to DWARF stripped down to the above feature set?

Increased compactness. DWARF fundamentally trades off compactness in
favour of its regular structure, which makes it easier to parse (but not
easier to interpret) but very hard to make it much smaller than it is
now. Where DWARF uses word-sized and larger entities for everything, CTF
packs everything much more tightly -- and this is quite visible in the
resulting file sizes, once deduplicated. (CTF for the entire Linux
kernel is about 6MiB after gzipping, and that includes not only complete
descriptions of its tens of thousands of types but also type and string
table entries for every structure and union member name, every
enumeration member, and every global variable. More conventional
programs will be able to eschew spending space on some of these because
the ELF string table already contains their names, and we reuse those
where possible. Insofar as it is possible to tell, the DWARF type info
for the entire kernel, even after deduplication, would be many times
larger: it is certainly much larger as it comes out of the compiler. You
could define a "restricted DWARF" with smaller tags etc that is smaller,
but frankly that would no longer be DWARF at all.)

(I'm using the kernel as an example a lot not because CTF is
kernel-specific but because our *existing deduplicator* happens to be
targeted at the kernel. This is already an annoying limitation: we want
to be able to use CTF in userspace more easily and more widely, without
kludges and without incurring huge costs to generate gigabytes of DWARF
we otherwise aren't using: hence this project.)

When programs try to consume DWARF from large programs the size of the
kernel, even with indexes I observe a multi-second lag and significant
memory usage: no program I have tried has increased its RSS by less than
100MiB. CTF consumers can suck in the CTF for the core kernel in well
under a third of a second, and can traverse the CTF for the kernel and
all modules (multiple CTF sections in an archive, sharing common types
with a parent section) in about a second and a half (from a cold cache):
RSS goes up by about 15MiB. If DWARF usage can impose a burden that low
on consumers, it's the first I've ever heard of it.

>> (Even with the exception of its embedded string table, it is already small
>>  enough to  be kept around in stripped binaries so that it can be relied upon
>>  to be present.)
>
> So for distributing a program/library for SUSE we usually split the
> distribution into two pieces - the binaries and separated debug information.
> With CTF we'd then create both, continue stripping out the DWARF information
> but keep the CTF in the binaries?
>
> When a program contains CTF only, can gdb do anything to help debugging
> of a running program or a core file?  Do you have gdb support in the works?

Yes, and it works well enough already to extract types from programs
(going all the way from symbols to types requires some code on the GCC
and linker side that is being written right now, and we can't test the
GDB code that relies on that until then: equally, I'm still working on
the linker so this is a demo on a randomly-chosen object file. This also
means you don't see any benefits from strtab reuse with the containing
ELF object, CTF section compression or deduplication in the following
example's .ctf section size):

[nix@ca-tools3 libiberty]$ /home/ibhagat/GCC/install/gcc-ctf/bin/gcc -c -DHAVE_CONFIG_H -gt -O2  -I. -I../../libiberty/../include  -W -Wall -Wwrite-strings -Wc++-compat -Wstrict-prototypes -Wshadow=local -pedantic  -D_GNU_SOURCE ../../libiberty/hashtab.c -o hashtab.o

[nix@ca-tools3 libiberty]$ size -A hashtab.o
hashtab.o  :
section            size   addr
.text              4112      0
.data                16      0
.bss                  0      0
.ctf              11907      0
.rodata.str1.8       40      0
.rodata.cst8          8      0
.rodata             480      0
.comment             43      0
.note.GNU-stack       0      0
Total             16606

[nix@ca-tools3 libiberty]$ ../gdb/gdb hashtab.o 
GNU gdb (GDB) 8.2.50.20190214-git
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "sparc64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from hashtab.o...
(gdb) info types
All defined types:

File /home/nix/binutils-gdb/foo/libiberty/hashtab.o:
	struct {
    unsigned char __arr[8];
};
	typedef struct <unknown> FILE;
	typedef struct {
    long __pos;
    struct {...} __state;
} _G_fpos64_t;
	typedef struct {
    long __pos;
    struct {...} __state;
} _G_fpos_t;
	typedef short _G_int16_t;
	typedef int _G_int32_t;
	typedef unsigned short _G_uint16_t;
	typedef unsigned int _G_uint32_t;
	typedef struct <unknown> _IO_FILE;
	typedef struct {
    long (*read)();
    long (*write)();
    int (*seek)();
    int (*close)();
} _IO_cookie_io_functions_t;
	typedef void _IO_lock_t;
	typedef struct <unknown> __FILE;
	typedef struct {
    unsigned char __arr[2];
} __STRING2_COPY_ARR2;
	typedef struct {
    unsigned char __arr[3];
} __STRING2_COPY_ARR3;
	typedef struct {
    unsigned char __arr[4];
} __STRING2_COPY_ARR4;
	typedef struct {
    unsigned char __arr[5];
} __STRING2_COPY_ARR5;
	typedef struct {
    unsigned char __arr[6];
} __STRING2_COPY_ARR6;
	typedef struct {
    unsigned char __arr[7];
} __STRING2_COPY_ARR7;
	typedef struct {
    unsigned char __arr[8];
} __STRING2_COPY_ARR8;
	typedef union {
    union wait *__uptr;
    int *__iptr;
} __WAIT_STATUS;
	typedef long __blkcnt64_t;
	typedef long __blkcnt_t;
	typedef long __blksize_t;
	typedef char * __caddr_t;
	typedef long __clock_t;
	typedef int __clockid_t;
	typedef int (*)() __compar_d_fn_t;
	typedef int (*)() __compar_fn_t;
	typedef int __daddr_t;
	typedef unsigned long __dev_t;
	typedef long __fd_mask;
	typedef unsigned long __fsblkcnt64_t;
	typedef unsigned long __fsblkcnt_t;
	typedef unsigned long __fsfilcnt64_t;
	typedef unsigned long __fsfilcnt_t;
	typedef struct {
    int __val[2];
} __fsid_t;
	typedef unsigned int __gid_t;
	typedef void * __gnuc_va_list;
	typedef int __gwchar_t;
	typedef unsigned int __id_t;
	typedef unsigned long __ino64_t;
[... and on and on...]

gdb support, like everything other than GCC, uses the libctf library in
the binutils-gdb repo, which will soon enough be made a public,
versioned shared library so that other consumers can pitch in (I just
don't want to do that before the linker changes are upstreamed).

>> We are also extending the format so it is useful for other on-line debugging
>> tools, such as backtracers.
>
> So you become more complex similar to DWARF?

Simplicity of types is not the goal. The goals are compactness and ease
of parsing by end users once the format itself has been decoded (so
there is nothing like DWARF's exprloc interpreter). We have simple data
structures, sure, but they are not regular: rather, they are tuned for
the type system they describe, and in some cases tuned further to
maximize compactness for the types that occur most often in the majority
of non-huge programs (types referenced by many other types, etc.).

As an example (a lengthy one, sorry!), types themselves have two
overlapping core representations shared by all types, with a sentinel
indicating which is in use for any given type:

typedef struct ctf_stype
{
  uint32_t ctt_name;		/* Reference to name in string table.  */
  uint32_t ctt_info;		/* Encoded kind, variant length (see below).  */
  union
  {
    uint32_t ctt_size;		/* Size of entire type in bytes.  */
    uint32_t ctt_type;		/* Reference to another type.  */
  };
} ctf_stype_t;

typedef struct ctf_type
{
  uint32_t ctt_name;		/* Reference to name in string table.  */
  uint32_t ctt_info;		/* Encoded kind, variant length.  */
  union
  {
    uint32_t ctt_size;		/* Always CTF_LSIZE_SENT.  */
    uint32_t ctt_type;		/* Do not use.  */
  };
  uint32_t ctt_lsizehi;		/* High 32 bits of type size in bytes.  */
  uint32_t ctt_lsizelo;		/* Low 32 bits of type size in bytes.  */
} ctf_type_t;

So the (very rare!) huge types pay the space in the type vector for a
64-bit type word (without requiring all users to have a uint64_t type):
smaller types do not pay.
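The sentinel check a reader performs on these two forms can be sketched as follows. This is a minimal illustration, not libctf code: the helper `ctf_type_size` is hypothetical, and the sentinel value 0xffffffff is an assumption (believed to match `CTF_LSIZE_SENT` in `include/ctf.h`, but check the header before relying on it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed long-form sentinel; check include/ctf.h for the real value.  */
#define CTF_LSIZE_SENT 0xffffffffU

typedef struct ctf_stype
{
  uint32_t ctt_name;
  uint32_t ctt_info;
  union
  {
    uint32_t ctt_size;
    uint32_t ctt_type;
  };
} ctf_stype_t;

typedef struct ctf_type
{
  uint32_t ctt_name;
  uint32_t ctt_info;
  union
  {
    uint32_t ctt_size;
    uint32_t ctt_type;
  };
  uint32_t ctt_lsizehi;
  uint32_t ctt_lsizelo;
} ctf_type_t;

/* Hypothetical helper: return the size recorded in the type header at P and
   store the length of that header (12 or 20 bytes) in *HDRLEN, so a caller
   can step past it.  Only meaningful for kinds that store a size rather than
   a type reference in the union.  */
static uint64_t
ctf_type_size (const unsigned char *p, size_t *hdrlen)
{
  ctf_stype_t st;
  ctf_type_t lt;

  assert (p != NULL && hdrlen != NULL);
  memcpy (&st, p, sizeof st);          /* memcpy avoids unaligned loads.  */
  if (st.ctt_size != CTF_LSIZE_SENT)
    {
      *hdrlen = sizeof (ctf_stype_t);  /* Short form: size fits in 32 bits.  */
      return st.ctt_size;
    }
  memcpy (&lt, p, sizeof lt);          /* Long form: read the extra words.  */
  *hdrlen = sizeof (ctf_type_t);
  return ((uint64_t) lt.ctt_lsizehi << 32) | lt.ctt_lsizelo;
}
```

Because the long form keeps the short form as a prefix, a reader can always load the first 12 bytes, test the sentinel, and only then read the extra 8 bytes; that is what lets small types avoid paying for a 64-bit size.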

You might say huge types are so rare that this adds nothing -- but a
future format extension, planned for well before the end of this year, will
add *another* layer to this, giving us three core representations, and
the third one is notably smaller:

typedef struct ctf_ttype
{
  uint32_t ctt_name;            /* Reference to name in string table.  */
  uint16_t ctt_info;            /* Encoded kind, variant length.  */
  union
  {
    uint16_t ctt_size;          /* Size of entire type in bytes.  */
    uint16_t ctt_type;          /* Reference to another type.  */
  };
} ctf_ttype_t;

(there is another sentinel hiding inside ctt_info to indicate when a
type is represented using one of these). The compiler will not need to
adapt to any of this because libctf will transparently upgrade the older
format into the newer one at link time. The compiler only needs to
change if the format becomes more expressive -- e.g. when support for
the GNU C types you mentioned is added.

This change will allow "smaller" programs (the majority of C programs)
to encode types in only eight bytes per type plus similarly compact
per-type variable-length data for things like structure members, down
from twelve bytes now, and I can probably shrink it further, down to six
bytes per type. Obviously not all types can be this compact: things like
complex types fall back to the larger form, as do huge types and types
that reference types with high IDs. But DWARF needs really quite a lot
more, even for simple types, and there can be many thousands of them.
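The representation choice described above can be sketched from the producer's side. Everything here is hypothetical: the 16-bit limits, the assumption that a few bits of the 16-bit `ctt_info` remain for the variable length (`vlen`), and the helper names are illustrative only, because the layout of the planned third form is not specified in this thread. The byte counts (8, 12, 20) follow from the three structures shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limits for illustration only.  */
#define CTF_LSIZE_SENT 0xffffffffU  /* Long-form sentinel in ctt_size.  */
#define TINY_FIELD_MAX 0xffffU      /* 16-bit ctt_size / ctt_type.  */
#define TINY_VLEN_MAX 0xffU         /* Room left for vlen in a 16-bit ctt_info.  */

/* Each enumerator's value is the header size in bytes of that form.  */
typedef enum
{
  CTF_REP_TINY = 8,    /* ctf_ttype_t  */
  CTF_REP_SMALL = 12,  /* ctf_stype_t  */
  CTF_REP_LARGE = 20   /* ctf_type_t  */
} ctf_rep_t;

/* Hypothetical compiler-side summary of one type.  */
typedef struct
{
  int uses_type;      /* Nonzero if the union holds a type ID, not a size.  */
  uint64_t size;      /* Size in bytes, for sized kinds.  */
  uint32_t ref_id;    /* Referenced type ID, for pointers, typedefs, etc.  */
  uint32_t vlen;      /* Variable length: member count, etc.  */
} type_desc_t;

/* Pick the smallest header form that can represent T.  */
static ctf_rep_t
ctf_choose_rep (const type_desc_t *t)
{
  assert (t != NULL);
  int small_vlen = t->vlen <= TINY_VLEN_MAX;
  if (t->uses_type)
    return (t->ref_id <= TINY_FIELD_MAX && small_vlen) ? CTF_REP_TINY
                                                       : CTF_REP_SMALL;
  if (t->size <= TINY_FIELD_MAX && small_vlen)
    return CTF_REP_TINY;
  if (t->size < CTF_LSIZE_SENT)
    return CTF_REP_SMALL;
  return CTF_REP_LARGE;               /* Huge type: needs a 64-bit size.  */
}

/* Total fixed-header bytes for N types (excluding variable-length data).  */
static uint64_t
ctf_header_bytes (const type_desc_t *types, size_t n)
{
  uint64_t total = 0;
  for (size_t i = 0; i < n; i++)
    total += ctf_choose_rep (&types[i]);
  return total;
}
```

Under these assumptions a 4-byte `int` with no members costs 8 header bytes, a 70000-byte structure 12, and only a type of roughly 4 GiB or more needs the 20-byte form; pointers to high type IDs fall back to 12 bytes, matching the fallback cases listed above.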

Structure and union members use similar trickery: as a result of all
this, even now, our biggest space consumer is the strtab giving the
names of the structure members! The backtrace section, when it is
designed, will follow a similar philosophy.

Surprisingly, this sort of bit-shaving actually saves significant space
even when the section is compressed: it seems Huffman dictionaries can't
always elide small runs of high-byte zeroes...
Jakub Jelinek July 5, 2019, 7:11 p.m. UTC | #10
On Fri, Jul 05, 2019 at 07:28:12PM +0100, Nix wrote:
> > What makes it superior to DWARF stripped down to the above feature set?
> 
> Increased compactness. DWARF fundamentally trades off compactness in
> favour of its regular structure, which makes it easier to parse (but not
> easier to interpret) but very hard to make it much smaller than it is
> now. Where DWARF uses word-sized and larger entities for everything, CTF
> packs everything much more tightly -- and this is quite visible in the

That is just not true, most of the data in DWARF are uleb128/sleb128
encoded, or often even present just in the abbreviation table
(DW_FORM_flag_present, DW_FORM_implicit_const), word-sized is typically only
stuff that needs relocations (at assembly time and more importantly at link
time).
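For reference, here is a generic sketch of unsigned LEB128 as defined in the DWARF specification: each byte carries seven value bits, and the high bit says whether another byte follows, so values below 128 take a single byte. This is an illustrative implementation, not code from GCC or any DWARF library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode VALUE as unsigned LEB128 into OUT; return the number of bytes
   written (at most 10 for a 64-bit value).  */
static size_t
uleb128_encode (uint64_t value, unsigned char *out)
{
  size_t n = 0;
  do
    {
      unsigned char byte = value & 0x7f;   /* Low seven bits.  */
      value >>= 7;
      if (value != 0)
        byte |= 0x80;                      /* More bytes follow.  */
      out[n++] = byte;
    }
  while (value != 0);
  return n;
}

/* Decode an unsigned LEB128 value from IN, storing the number of bytes
   consumed in *LEN.  */
static uint64_t
uleb128_decode (const unsigned char *in, size_t *len)
{
  uint64_t result = 0;
  unsigned shift = 0;
  size_t n = 0;
  unsigned char byte;
  do
    {
      assert (shift < 64);                 /* Reject overlong input.  */
      byte = in[n++];
      result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  *len = n;
  return result;
}
```

For example, 624485 encodes as the three bytes 0xE5 0x8E 0x26, where a fixed-width 32-bit field would take four.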

> could define a "restricted DWARF" with smaller tags etc that is smaller,
> but frankly that would no longer be DWARF at all.)

You can certainly decide what kind of information you emit and what you
don't; it is done already (look at -g1 vs. -g2 vs. -g3), and can be done for
other stuff.  Say, if you aren't interested in any locations,
DW_AT_{decl,call}_{file,line,column} can be omitted; if you aren't
interested in debug info within functions, a lot of debug info can be
omitted, etc.  And it will still be DWARF.
For DWARF you also have various techniques at reducing size and optimizing
redundancies away, look e.g. at the dwz utility.

	Jakub
Nix July 8, 2019, 2:08 p.m. UTC | #11
On 5 Jul 2019, Jakub Jelinek outgrape:

> On Fri, Jul 05, 2019 at 07:28:12PM +0100, Nix wrote:
>> > What makes it superior to DWARF stripped down to the above feature set?
>> 
>> Increased compactness. DWARF fundamentally trades off compactness in
>> favour of its regular structure, which makes it easier to parse (but not
>> easier to interpret) but very hard to make it much smaller than it is
>> now. Where DWARF uses word-sized and larger entities for everything, CTF
>> packs everything much more tightly -- and this is quite visible in the
>
> That is just not true, most of the data in DWARF are uleb128/sleb128
> encoded, or often even present just in the abbreviation table
> (DW_FORM_flag_present, DW_FORM_implicit_const), word-sized is typically only
> stuff that needs relocations (at assembly time and more importantly at link
> time).

Hm. I may have misread the spec.

The fact remains that DWARF is (in typical usage) both large and slow to
use: it is not entirely untrue to say that you can spot a DWARF consumer
because it takes ten seconds to start up. This may be something that can
be avoided with sufficiently clever implementations, but I've never seen
any such implementation, and we don't appear to be approaching one
terribly fast :( Meanwhile, in CTF we already have a working system that
can reduce multigigabyte DWARF input down to 6MiB of compressed CTF that
loads in a fraction of a second. It is true that not all of that input
was global-scope type info, so a large portion of the multigigabyte
input would simply have been dropped and should not be counted in the
comparison. I'm not sure how to determine how much of the input
is type DIEs at global scope... (The 6MiB figure is slightly misleading,
too, since only 1439845 bytes of that is type data: the rest is mostly
compressed string table.)

Possibly sufficiently clever deduplication can do a similar scrunching
job for DWARF, but I note that what DWARF deduplication GCC did in
earlier releases has subsequently been removed because it never really
worked very well. (Having written code that deduplicates DWARF, I can
see why: it's a complex job when you just have C to think about. Doing
it for C++ as well must have made people's brains dribble out of their
ears).

Type signatures in DWARF 4 were supposed to provide this sort of thing,
too, but yet again the promise does not seem to have been borne out:
DWARF debuginfo remains immense, and nobody discusses leaving
unstripped binaries on production systems for the sake of continuous
tracing tools or introspection, because the debuginfo in those binaries
would still be many times the size of the binaries themselves; leaving
it unstripped in that case is obviously ridiculous. Meanwhile,
FreeBSD has a leg up in continuous debugging because it generates (an
older form of) CTF for everything and deduplicates it, and the result is
small enough to leave linked into the binaries rather than stripping it
out, and tracers can and do use it. I'm trying to give us all that
advantage without leaving us tied to a format with as many limitations
as FreeBSD's CTF.


As a side note, I tried switching to ULEB128 for the representations of
unsigned integers in CTF a while back, but never even pushed it anywhere
because while it shrank the output a little, the compressed sizes
worsened noticeably, by about 10%, and we don't want to hurt the
compressed sizes any more than we do the uncompressed ones. I found this
quite annoying. So I'm not convinced that ULEB actually buys you
much of anything once compressors get into the mix.

Something similar happened when I tried to do clever things with string
tables last month, sharing common string suffixes, slicing strtabs up on
underscores and changes of case and replacing strings where beneficial
with offset tables pointing into the sliced-up pieces: the uncompressed
size shrank by about 50% and the compressed size grew by 20%... I found
this *very* annoying. :)
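The first of those tricks, suffix sharing (often called tail merging), can be sketched as follows: a string that is a suffix of one already in the table is stored as an offset into that string instead of as a new copy. The helper names are hypothetical, and this sketch uses a simple quadratic scan; strings are added longest first so that shorter strings can reuse the tails of longer ones. It shows only the uncompressed saving and says nothing about how the result compresses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string table.  Offset 0 holds the empty string, as in ELF
   string tables.  */
typedef struct
{
  char *buf;     /* Concatenated NUL-terminated strings.  */
  size_t len;    /* Bytes used.  */
  size_t cap;    /* Bytes allocated.  */
} strtab_t;

/* Append S unconditionally and return its offset.  */
static size_t
strtab_append (strtab_t *tab, const char *s)
{
  size_t n = strlen (s) + 1;           /* Include the terminating NUL.  */
  size_t off = tab->len;
  if (tab->len + n > tab->cap)
    {
      tab->cap = (tab->len + n) * 2;
      tab->buf = realloc (tab->buf, tab->cap);
      assert (tab->buf != NULL);       /* Minimal error handling for a sketch.  */
    }
  memcpy (tab->buf + tab->len, s, n);
  tab->len += n;
  return off;
}

/* Return an offset for S, reusing the tail of an existing string when S is
   a suffix of it; otherwise append S.  */
static size_t
strtab_add (strtab_t *tab, const char *s)
{
  size_t slen = strlen (s);
  size_t off = 0;
  while (off < tab->len)
    {
      const char *t = tab->buf + off;
      size_t tlen = strlen (t);
      if (tlen >= slen && strcmp (t + tlen - slen, s) == 0)
        return off + tlen - slen;      /* Point into the existing string.  */
      off += tlen + 1;
    }
  return strtab_append (tab, s);
}

/* qsort comparator: longer strings first.  */
static int
by_length_desc (const void *a, const void *b)
{
  size_t la = strlen (*(const char *const *) a);
  size_t lb = strlen (*(const char *const *) b);
  return (la < lb) - (la > lb);
}
```

With the names "bar", "foo_bar", "baz" and "_bar", sorted longest first, "_bar" and "bar" become offsets into "foo_bar" and the table shrinks from 22 bytes (one copy of each name plus the leading NUL) to 13.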

> For DWARF you also have various techniques at reducing size and optimizing
> redundancies away, look e.g. at the dwz utility.

... interesting! I'll be looking through this and seeing if any of it is
applicable to CTF as well, that's for sure.
Indu Bhagat July 9, 2019, 5:33 p.m. UTC | #12
On 07/04/2019 03:43 AM, Richard Biener wrote:
>> Hmm...a GCC plugin for CTF generation at compile-time may work out for a single
>> compilation unit.  But I am not sure how LTO will be supported in that case.
>> Basically, for LTO and -gtLEVEL to work together, I need the lto-wrapper to be
>> aware of the presence of .ctf sections (I think). I will need to combine the
>> .ctf sections from multiple compilation units into a CTF archive, which the
>> linker can then de-duplicate.
> True.  lto-wrapper does this kind of dancing for the much more complex set of
> DWARF sections already.
>
>> Even if the technical hurdle in the above paragraph is solvable within the
>> purview of a plugin, I fear worse problems of adoption, maintenance and
>> distribution in the long run if, for unforeseen reasons, CTF support ends up
>> remaining a plugin indefinitely.
>>
>> Even as a short-term measure, the plugin route would suffer similar
>> problems of distribution and support.
>>
>> - Is the plugin infrastructure supported on most platforms ? Also, I see that
>>     the plugin infrastructure supports all gcc versions from 4.5 onwards.
>>     Can someone confirm ? ( We minimally want the toolchain support with
>>     GCC 4.8.5 and GCC 8 and later, for now. )
> The infrastructure is quite old but you'd need new invocation hooks so this
> won't help.
>

OK then.  I will continue to focus on my current implementation without
exploring the plugin option at this time.  Thanks for confirming.

Indu
Mike Stump July 9, 2019, 10:42 p.m. UTC | #13
On Jul 5, 2019, at 11:28 AM, Nix <nix@esperi.org.uk> wrote:
> CTF for the entire Linux kernel is about 6MiB

Any reason not to add CTF to the next DWARF standard?  Then, we just support the next DWARF standard.  If not, have you started talks with them to add it?

Long term, this is a better solution, as we then get more interoperability, more support, more tools and more goodness.

To me this is the obvious solution to the problem.
Segher Boessenkool July 9, 2019, 11:25 p.m. UTC | #14
On Fri, Jul 05, 2019 at 07:28:12PM +0100, Nix wrote:
> On 5 Jul 2019, Richard Biener said:
> 
> > On Fri, Jul 5, 2019 at 12:21 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
> >> CTF, at this time, is type information for entities at global or file scope.
> >> This can be used by online debuggers and program tracers (dynamic tracing). More
> >> generally, it provides type introspection for C programs, with an optional
> >> library API that lets them get at their own types much more easily than via
> >> DWARF. So, the umbrella use cases are: all C programs that want to introspect
> >> their own types quickly, and applications that want to introspect other
> >> programs' types quickly.
> >
> > What makes it superior to DWARF stripped down to the above feature set?
> 
> Increased compactness.

Does CTF support something like -fasynchronous-unwind-tables?  You need
that to have any sane debugging on many platforms.  Without it, you don't
even get complete backtraces, on most architectures/ABIs anyway.


Segher
Jeff Law July 10, 2019, 12:27 a.m. UTC | #15
On 7/9/19 5:25 PM, Segher Boessenkool wrote:
> On Fri, Jul 05, 2019 at 07:28:12PM +0100, Nix wrote:
>> On 5 Jul 2019, Richard Biener said:
>>
>>> On Fri, Jul 5, 2019 at 12:21 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>>>> CTF, at this time, is type information for entities at global or file scope.
>>>> This can be used by online debuggers and program tracers (dynamic tracing). More
>>>> generally, it provides type introspection for C programs, with an optional
>>>> library API that lets them get at their own types much more easily than via
>>>> DWARF. So, the umbrella use cases are: all C programs that want to introspect
>>>> their own types quickly, and applications that want to introspect other
>>>> programs' types quickly.
>>>
>>> What makes it superior to DWARF stripped down to the above feature set?
>>
>> Increased compactness.
> 
> Does CTF support something like -fasynchronous-unwind-tables?  You need
> that to have any sane debugging on many platforms.  Without it, you
> even have only partial backtraces, on most architectures/ABIs anyway.
I'd be surprised if it did, since you need location information.  FWIW,
low level libraries like glibc depend on this stuff to support cancellation.

jeff
Nix July 11, 2019, 12:22 p.m. UTC | #16
[Sorry for the delay: head down in the linker, plus having nice food
 poisoning bouts]

On 9 Jul 2019, Mike Stump verbalised:

> On Jul 5, 2019, at 11:28 AM, Nix <nix@esperi.org.uk> wrote:
>> ICTF for the entire Linux kernel is about 6MiB
>
> Any reason why not add CTF to the next dwarf standard? Then, we just
> support the next dwarf standard. If not, have you started talks with
> them to add it?

A mixture of impostor syndrome, the fact that CTF is really very
non-DWARFish in all sorts of ways, and the fact that CTF-the-format is
changing quite fast right now mean that... well, if it is to be added,
now is not the time. I haven't even documented it in texi yet :)

(The suggestions for improvement I've had on the binutils list alone will
lead to a good few changes :) ).

Right now, the rule for compatibility is that libctf will always be able
to read all earlier versions written by any released binutils or
libdtrace-ctf, and rewrite them as the latest version -- and one
improvement I have planned is that it will eventually be able to *write*
older versions as well, as long as doing so doesn't lose information or
run into limitations of the older format (like trying to write >2^16
types to a format v1 container, or add an enum bitfield to a v2
container).
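A rough sketch of the "can we write an older version?" check described above. Everything here is hypothetical: `struct ctf_summary`, `can_write_as_version` and `CTF_CURRENT_VERSION` are invented for illustration and are not libctf API. The only limits encoded are the two examples from the message (no more than 2^16 types in a v1 container, no enum bitfields in a v2 container), plus the assumption that anything v2 cannot represent, v1 cannot either.

```c
#include <stdbool.h>

/* Hypothetical summary of a CTF container's contents (not libctf API). */
struct ctf_summary {
    unsigned long ntypes;     /* number of types in the container */
    bool has_enum_bitfield;   /* is any enum used as a bitfield member? */
};

/* Assumed newest format version, for illustration only. */
#define CTF_CURRENT_VERSION 3

/* Return true if the container could be written as format VERSION
   without losing information or exceeding that format's limits.  */
static bool can_write_as_version(const struct ctf_summary *s, int version)
{
    if (version < 1 || version > CTF_CURRENT_VERSION)
        return false;
    /* v1 containers cannot hold more than 2^16 types.  */
    if (version == 1 && s->ntypes > (1UL << 16))
        return false;
    /* v2 cannot represent enum bitfields; assume v1 cannot either.  */
    if (version <= 2 && s->has_enum_bitfield)
        return false;
    return true;
}
```

A writer built this way would refuse a downgrade outright rather than silently dropping types or bitfield information.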

I'm doing this in the obvious fashion: every time the format written by
binutils libctf changes, libctf keeps the ability to upgrade every older
CTF format that any binutils release ever accepted to the latest format.
Every binutils release after such a change constitutes a boundary: the
next format change after that will bump the CTF format version, and the
just-released format will be upgraded to be compatible with any new
stuff that gets added. If CTF generation support lands in GCC, I'll
treat compiler releases the same way, nailing the format any released
GCC emits into binutils libctf at release time and ensuring binutils
libctf can always accept it (and thus binutils ld can always link it and
gdb can always use it).
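The upgrade-on-read policy above amounts to a chain of one-step upgraders, one per released format. The following is a minimal sketch of that idea, assuming a latest version of 3. The names (`struct ctf_container`, `upgrade_step`, `ctf_upgrade`) are hypothetical and are not libctf's real internals, and the step bodies only bump the version number where real code would rewrite type records.

```c
#include <stddef.h>

/* Hypothetical in-memory container tagged with its on-disk format version. */
struct ctf_container {
    int version;
};

#define CTF_LATEST_VERSION 3   /* assumed latest format, for illustration */

typedef int (*upgrade_fn)(struct ctf_container *);

/* Each step turns a version-N container into a version-(N+1) one.
   Placeholder bodies: real steps would translate the type records.  */
static int upgrade_v1_to_v2(struct ctf_container *c) { c->version = 2; return 0; }
static int upgrade_v2_to_v3(struct ctf_container *c) { c->version = 3; return 0; }

/* Indexed by source version; index 0 is unused (there is no v0).  */
static const upgrade_fn upgrade_step[CTF_LATEST_VERSION] = {
    NULL,
    upgrade_v1_to_v2,
    upgrade_v2_to_v3,
};

/* Bring any older container up to the latest version by chaining steps.
   Returns 0 on success, -1 for an unknown (e.g. newer) version.  */
static int ctf_upgrade(struct ctf_container *c)
{
    if (c->version < 1 || c->version > CTF_LATEST_VERSION)
        return -1;
    while (c->version < CTF_LATEST_VERSION) {
        if (upgrade_step[c->version](c) != 0)
            return -1;
    }
    return 0;
}
```

Bumping the format version then means appending one step, and every format ever released stays readable because the old steps are never removed.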

(I do not plan to ever drop support for any older CTF formats: indeed I
plan to extend it so that the FreeBSD/Solaris CTF can be read as well,
and hopefully eventually written too.)

This should suffice to ensure that the CTF emitted by any released
compiler and any released binutils can always be accepted by newer
releases, and is probably the right approach until format evolution
slows and we can start to actually standardize this.

> Long term, this is a better solution, as we then get more
> interoperability, more support, more tools and more goodness.

Agreed! I do hope libctf remains flexible and useful enough that
everyone can use it as a "format oracle", but I would welcome other
implementations: the more the merrier! (It's just that now might be too
early and too annoying for the other implementors, since the format is
evolving faster than it ever has, thanks to all the lovely suggestions
on the binutils list).

If libctf *does* gain the ability to downgrade as well as upgrade
formats, we can keep evolving the format even after standardization,
with libctf translating the standardized version to newer versions and
back down again as needed, restandardizing at intervals so the other
tools can catch up: this seems like a fairly strong reason to gain the
ability to write out old versions as well as new ones. (But I'm getting
way ahead of myself here: the internal intermediate representation
inside libctf that will make this sort of format ubiquity possible only
exists inside my head right now, after all.)
Nix July 11, 2019, 12:25 p.m. UTC | #17
On 10 Jul 2019, Segher Boessenkool spake thusly:

> On Fri, Jul 05, 2019 at 07:28:12PM +0100, Nix wrote:
>> On 5 Jul 2019, Richard Biener said:
>> 
>> > On Fri, Jul 5, 2019 at 12:21 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>> >> CTF, at this time, is type information for entities at global or file scope.
>> >> This can be used by online debuggers, program tracers (dynamic tracing); More
>> >> generally, it provides type introspection for C programs, with an optional
>> >> library API to allow them to get at their own types quite more easily than
>> >> DWARF. So, the umbrella usecases are - all C programs that want to introspect
>> >> their own types quickly; and applications that want to introspect other
>> >> programs's types quickly.
>> >
>> > What makes it superior to DWARF stripped down to the above feature set?
>> 
>> Increased compactness.
>
> Does CTF support something like -fasynchronous-unwind-tables?  You need
> that to have any sane debugging on many platforms.  Without it, you
> even have only partial backtraces, on most architectures/ABIs anyway.

The backtrace section is still being designed, so it could! There is
certainly nothing intrinsically preventing it. Am I right that this
stuff works by ensuring that the arg->location picture is consistent at
all times, between every instruction, rather than just at function
calls, i.e. tracking all register moves done by the compiler, even
transiently? Because that sounds doable, given that the compiler is
doing the hard work of identifying such locations anyway (it has to for
DWARF -fasynchronous-unwind-tables, right?).

It seems essential to do this in any case if you want to get correct
args for the function the user is actually stopped at: there's no
requirement that the user is stopped at a function call!
Segher Boessenkool July 11, 2019, 4:48 p.m. UTC | #18
On Thu, Jul 11, 2019 at 01:25:18PM +0100, Nix wrote:
> On 10 Jul 2019, Segher Boessenkool spake thusly:
> 
> > On Fri, Jul 05, 2019 at 07:28:12PM +0100, Nix wrote:
> >> On 5 Jul 2019, Richard Biener said:
> >> 
> >> > On Fri, Jul 5, 2019 at 12:21 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
> >> >> CTF, at this time, is type information for entities at global or file scope.
> >> >> This can be used by online debuggers, program tracers (dynamic tracing); More
> >> >> generally, it provides type introspection for C programs, with an optional
> >> >> library API to allow them to get at their own types quite more easily than
> >> >> DWARF. So, the umbrella usecases are - all C programs that want to introspect
> >> >> their own types quickly; and applications that want to introspect other
> >> >> programs's types quickly.
> >> >
> >> > What makes it superior to DWARF stripped down to the above feature set?
> >> 
> >> Increased compactness.
> >
> > Does CTF support something like -fasynchronous-unwind-tables?  You need
> > that to have any sane debugging on many platforms.  Without it, you
> > even have only partial backtraces, on most architectures/ABIs anyway.
> 
> The backtrace section is still being designed, so it could! There is
> certainly nothing intrinsically preventing it. Am I right that this
> stuff works by ensuring that the arg->location picture is consistent at
> all times, between every instruction, rather than just at function
> calls, i.e. tracking all register moves done by the compiler, even
> transiently?

Yes, something like that.  You get unwind tables that are valid at each
instruction boundary.  This is especially important for the return address;
without it, backtraces are broken.
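To illustrate what "valid at each instruction boundary" buys, here is a simplified sketch that models unwind info as a flat, PC-sorted table of rows. This is an assumption for illustration only: real DWARF call frame information is encoded as DW_CFA_* opcodes in CIEs/FDEs that an unwinder interprets, and `struct unwind_row` and `unwind_lookup` are hypothetical names. With asynchronous tables, a new row starts at every instruction that changes the stack pointer or moves the return address, so a lookup at any PC, not just a call site, gives the right rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified unwind row: from PC_START up to the next row's
   PC_START, the canonical frame address (CFA) is SP + CFA_OFFSET and the
   return address is saved at CFA + RA_OFFSET.  */
struct unwind_row {
    uintptr_t pc_start;
    long cfa_offset;
    long ra_offset;
};

/* Return the row in effect at PC.  ROWS must be sorted by pc_start.
   Returns NULL if PC precedes the first row.  */
static const struct unwind_row *
unwind_lookup(const struct unwind_row *rows, size_t n, uintptr_t pc)
{
    const struct unwind_row *found = NULL;
    for (size_t i = 0; i < n && rows[i].pc_start <= pc; i++)
        found = &rows[i];
    return found;
}
```

For an x86-64-style prologue (entry, then a push, then a stack adjustment), rows like `{0x1000, 8, -8}`, `{0x1001, 16, -8}` and `{0x1008, 48, -8}` give a correct CFA at every instruction. A table with rows only at call sites would give the wrong CFA, and therefore the wrong return address, if the program were stopped inside the prologue.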

> Because that sounds doable, given that the compiler is
> doing the hard work of identifying such locations anyway (it has to for
> DWARF -fasynchronous-unwind-tables, right?).

Yes, every backend outputs DWARF info semi-manually for this.  You have
some work to do if you want to use this for CTF.

> It seems essential to do this in any case if you want to get correct
> args for the function the user is actually stopped at: there's no
> requirement that the user is stopped at a function call!

Yes.  You need the asynchronous option only if you need this info at
any possible point in a program -- but quite a few things do need it
everywhere ;-)


Segher