diff mbox series

[line-map] Better packing of maps

Message ID d136aab8-d01f-4801-1c83-1c8e9c50dfe1@acm.org
State New
Headers show
Series [line-map] Better packing of maps | expand

Commit Message

Nathan Sidwell June 26, 2018, 6:28 p.m. UTC
I've been wandering around the line-map machinery on the modules branch. 
  One thing I noticed was the padding of the line_map type hierarchy was 
unfortunate.  That made me sad.  This patch fixes that.

Rather than keep the the reason field in line_map, I move it to 
line_map_ordinary.  That allows line_map to be a 4 byte struct, rather 
than 5 bytes data and 3 padding.  We can use the source_location it 
contains to distinguish between ordinary and macro maps, by moving the 
LINE_MAP_MAX_LOCATION constant into the header file.  While one 
interpretation of the comments is that the macro and ordinary line maps 
can grow arbitrarily, until they meet, the actual code presumes the 
boundary is fixed, so this is not a new restriction. (see the assert in 
linemap_enter_macro for instance)

Reordering the fields in the ordinary and macro map objects continues 
the reduction in padding.  On a 64-bit target the ordinary map goes from 
32 to 24 bytes.

booted & tested on x86_64-linux.  I'll commit in a few days, unless 
there are objections (it is part of libcpp, but has an explicit 
maintainer listed).

nathan

Comments

David Malcolm June 26, 2018, 7:31 p.m. UTC | #1
On Tue, 2018-06-26 at 14:28 -0400, Nathan Sidwell wrote:
> I've been wandering around the line-map machinery on the modules
> branch. 
>   One thing I noticed was the padding of the line_map type hierarchy
> was 
> unfortunate.  That made me sad.  This patch fixes that.
> 
> Rather than keep the the reason field in line_map, I move it to 
> line_map_ordinary.  That allows line_map to be a 4 byte struct,
> rather 
> than 5 bytes data and 3 padding.  We can use the source_location it 
> contains to distinguish between ordinary and macro maps, by moving
> the 
> LINE_MAP_MAX_LOCATION constant into the header file.  While one 
> interpretation of the comments is that the macro and ordinary line
> maps 
> can grow arbitrarily, until they meet, the actual code presumes the 
> boundary is fixed, so this is not a new restriction. (see the assert
> in 
> linemap_enter_macro for instance)

Which assert are you referring to?

I'm not convinced that there already is a fixed boundary between the
ordinary maps and macro maps (though that's not necessarily an argument
against the patch); my reading of the existing linemap_enter_macro is
that it handles the two maps meeting, but doesn't set any restrictions
on where that meeting point is.

Previously there was an upper bound on the expansion of ordinary
locations:
  const source_location LINE_MAP_MAX_SOURCE_LOCATION = 0x70000000;
but I believe that macro maps could in theory expand below this limit,
and this check in linemap_line_start is meant to prevent collisions
when this happens:

  /* Locations of ordinary tokens are always lower than locations of
     macro tokens.  */
  if (r >= LINEMAPS_MACRO_LOWEST_LOCATION (set))
    return 0;

So if I'm reading the patch correctly, it introduces a new limit on how
big the macro maps can be: 0x10000000, hence about 268 million macro
argument expansions, I believe, before it will stop tracking macro
expansions.

Has this been smoketested on some very large C++ TUs?

> Reordering the fields in the ordinary and macro map objects
> continues 
> the reduction in padding.  On a 64-bit target the ordinary map goes
> from 
> 32 to 24 bytes.

> booted & tested on x86_64-linux.  I'll commit in a few days, unless 
> there are objections (it is part of libcpp, but has an explicit 
> maintainer listed).
> 
> nathan

It would be nice to extend:
  gcc/testsuite/gcc.dg/plugin/location_overflow_plugin.c
  gcc/testsuite/gcc.dg/plugin/location-overflow-test-*.c
to get some test coverage of what happens when the two kinds of map
collide.

Hope this is constructive
Dave
Nathan Sidwell June 26, 2018, 9:13 p.m. UTC | #2
On 06/26/2018 03:31 PM, David Malcolm wrote:

> Which assert are you referring to?

Hm, I think I may have been confused by the unsigned arithmetic check in:
   if (start_location <= set->highest_line
       || start_location > LINEMAPS_MACRO_LOWEST_LOCATION (set))
     /* We ran out of macro map space.   */
     return NULL;

It looks to me that the second condition can only be true with the first 
being false when the macro expansion is >= 2^32 tokens (give or take).

However, AFAICT, linemap_add doesn't check whether ordinary maps have 
run into the macro range:
   if (set->highest_location < LINE_MAP_MAX_LOCATION_WITH_COLS)
    {...}
   else
     start_location = set->highest_location + 1;

> So if I'm reading the patch correctly, it introduces a new limit on how
> big the macro maps can be: 0x10000000, hence about 268 million macro
> argument expansions, I believe, before it will stop tracking macro
> expansions.

You may be correct.  The comment in line-map.h is unclear as to whether 
the  LINE_MAP_MAX_SOURCE_LOCATION is a barrier for macros or not, it can 
be read either way.

> Has this been smoketested on some very large C++ TUs?

No.  The code bases available to me aren't gcc-trunk ready (I don't know 
if they contain single TUs so large).

nathan
diff mbox series

Patch

2018-06-26  Nathan Sidwell  <nathan@acm.org>

	Reorg line_map data structures for better packing.
	* include/line-map.h (enum lc_reason): Add LC_HWM.
	(LINE_MAP_MAX_LOCATION): Define here.
	(struct line_map): Move reason field to line_map_ordinary.  Adjust
	GTY tagging.
	(struct line_map_ordinary): Reorder fields for less padding.
	(struct line_map_macro): Likewise.
	(MAP_ORDINARY_P): New.
	(linemap_check_ordinary, linemap_check_macro): Adjust.
	* line-map.c (LINE_MAP_MAX_SOURCE_LOCATION): Delete.
	(new_linemap): Take start_location, not reason.  Adjust.
	(linemap_add, linemap_enter_macro): Adjust.
	(linemap_line_start): Likewise.
	(linemap_macro_expansion_map_p): Use MAP_ORDINARY_P.
	(linemap_macro_loc_to_spelling_point): Likewise.
	(linemap_macro_loc_to_def_point): Likewise.
	(linemap_dump): Likewise.

Index: include/line-map.h
===================================================================
--- include/line-map.h	(revision 262147)
+++ include/line-map.h	(working copy)
@@ -74,8 +74,9 @@  enum lc_reason
   LC_LEAVE,
   LC_RENAME,
   LC_RENAME_VERBATIM,
-  LC_ENTER_MACRO
+  LC_ENTER_MACRO,
   /* FIXME: add support for stringize and paste.  */
+  LC_HWM /* High Water Mark.  */
 };
 
 /* The typedef "source_location" is a key within the location database,
@@ -168,7 +169,7 @@  enum lc_reason
              |   Beyond this point, ordinary linemaps have 0 bits per column:
              |   each increment of the value corresponds to a new source line.
              |
-  0x70000000 | LINE_MAP_MAX_SOURCE_LOCATION
+  0x70000000 | LINE_MAP_MAX_LOCATION
              |   Beyond the point, we give up on ordinary maps; attempts to
              |   create locations in them lead to UNKNOWN_LOCATION (0).
              |
@@ -307,6 +308,9 @@  const source_location LINE_MAP_MAX_LOCAT
      gcc.dg/plugin/location-overflow-test-*.c.  */
 const source_location LINE_MAP_MAX_LOCATION_WITH_COLS = 0x60000000;
 
+/* Highest possible source location encoded within an ordinary map.  */
+const source_location LINE_MAP_MAX_LOCATION = 0x70000000;
+
 /* A range of source locations.
 
    Ranges are closed:
@@ -377,11 +381,13 @@  typedef size_t (*line_map_round_alloc_si
    location of the expansion point of PLUS. That location is mapped in
    the map that is active right before the location of the invocation
    of PLUS.  */
-struct GTY((tag ("0"), desc ("%h.reason == LC_ENTER_MACRO ? 2 : 1"))) line_map {
+
+/* This contains GTY mark-up to support precompiled headers.
+   line_map is an abstract class, only derived objects exist.  */
+struct GTY((tag ("0"), desc ("MAP_ORDINARY_P (&%h) ? 1 : 2"))) line_map {
   source_location start_location;
 
-  /* The reason for creation of this line map.  */
-  ENUM_BITFIELD (lc_reason) reason : CHAR_BIT;
+  /* Size and alignment is (usually) 4 bytes.  */
 };
 
 /* An ordinary line map encodes physical source locations. Those
@@ -397,13 +403,12 @@  struct GTY((tag ("0"), desc ("%h.reason
 
    The highest possible source location is MAX_SOURCE_LOCATION.  */
 struct GTY((tag ("1"))) line_map_ordinary : public line_map {
-  const char *to_file;
-  linenum_type to_line;
+  /* Base class is 4 bytes.  */
 
-  /* An index into the set that gives the line mapping at whose end
-     the current one was included.  File(s) at the bottom of the
-     include stack have this set to -1.  */
-  int included_from;
+  /* 4 bytes of integers, each 1 byte for easy extraction/insertion.  */
+
+  /* The reason for creation of this line map.  */
+  ENUM_BITFIELD (lc_reason) reason : 8;
 
   /* SYSP is one for a system header, two for a C system header file
      that therefore needs to be extern "C" protected in C++, and zero
@@ -429,6 +434,18 @@  struct GTY((tag ("1"))) line_map_ordinar
      |                         |    (e.g. 7)           |   (e.g. 5)        |
      +-------------------------+-----------------------+-------------------+ */
   unsigned int m_range_bits : 8;
+
+  /* Pointer alignment boundary on both 32 and 64-bit systems.  */
+
+  const char *to_file;
+  linenum_type to_line;
+
+  /* An index into the set that gives the line mapping at whose end
+     the current one was included.  File(s) at the bottom of the
+     include stack have this set to -1.  */
+  int included_from;
+
+  /* Size is 20 or 24 bytes, no padding  */
 };
 
 /* This is the highest possible source location encoded within an
@@ -443,15 +460,20 @@  struct cpp_hashnode;
    The offset from START_LOCATION is used to index into
    MACRO_LOCATIONS; this holds the original location of the token.  */
 struct GTY((tag ("2"))) line_map_macro : public line_map {
-  /* The cpp macro which expansion gave birth to this macro map.  */
-  struct cpp_hashnode * GTY ((nested_ptr (union tree_node,
-				   "%h ? CPP_HASHNODE (GCC_IDENT_TO_HT_IDENT (%h)) : NULL",
-				   "%h ? HT_IDENT_TO_GCC_IDENT (HT_NODE (%h)) : NULL")))
-    macro;
+  /* Base is 4 bytes.  */
 
   /* The number of tokens inside the replacement-list of MACRO.  */
   unsigned int n_tokens;
 
+  /* Pointer alignment boundary.  */
+
+  /* The cpp macro whose expansion gave birth to this macro map.  */
+  struct cpp_hashnode *
+    GTY ((nested_ptr (union tree_node,
+		      "%h ? CPP_HASHNODE (GCC_IDENT_TO_HT_IDENT (%h)) : NULL",
+		      "%h ? HT_IDENT_TO_GCC_IDENT (HT_NODE (%h)) : NULL")))
+    macro;
+
   /* This array of location is actually an array of pairs of
      locations. The elements inside it thus look like:
 
@@ -513,6 +535,8 @@  struct GTY((tag ("2"))) line_map_macro :
      could have been either a macro or an ordinary map, depending on
      if we are in a nested expansion context not.  */
   source_location expansion;
+
+  /* Size is 20 or 32 (4 bytes padding on 64-bit).  */
 };
 
 #if CHECKING_P && (GCC_VERSION >= 2007)
@@ -540,6 +564,14 @@  struct GTY((tag ("2"))) line_map_macro :
 #define linemap_assert_fails(EXPR) (! (EXPR))
 #endif
 
+/* Categorize line map kinds.  */
+
+inline bool
+MAP_ORDINARY_P (const line_map *map)
+{
+  return map->start_location < LINE_MAP_MAX_LOCATION;
+}
+
 /* Return TRUE if MAP encodes locations coming from a macro
    replacement-list at macro expansion point.  */
 bool
@@ -552,7 +584,7 @@  linemap_macro_expansion_map_p (const str
 inline line_map_ordinary *
 linemap_check_ordinary (struct line_map *map)
 {
-  linemap_assert (!linemap_macro_expansion_map_p (map));
+  linemap_assert (MAP_ORDINARY_P (map));
   return (line_map_ordinary *)map;
 }
 
@@ -563,7 +595,7 @@  linemap_check_ordinary (struct line_map
 inline const line_map_ordinary *
 linemap_check_ordinary (const struct line_map *map)
 {
-  linemap_assert (!linemap_macro_expansion_map_p (map));
+  linemap_assert (MAP_ORDINARY_P (map));
   return (const line_map_ordinary *)map;
 }
 
@@ -572,7 +604,7 @@  linemap_check_ordinary (const struct lin
 
 inline line_map_macro *linemap_check_macro (line_map *map)
 {
-  linemap_assert (linemap_macro_expansion_map_p (map));
+  linemap_assert (!MAP_ORDINARY_P (map));
   return (line_map_macro *)map;
 }
 
@@ -582,7 +614,7 @@  inline line_map_macro *linemap_check_mac
 inline const line_map_macro *
 linemap_check_macro (const line_map *map)
 {
-  linemap_assert (linemap_macro_expansion_map_p (map));
+  linemap_assert (!MAP_ORDINARY_P (map));
   return (const line_map_macro *)map;
 }
 
Index: line-map.c
===================================================================
--- line-map.c	(revision 262147)
+++ line-map.c	(working copy)
@@ -26,10 +26,6 @@  along with this program; see the file CO
 #include "internal.h"
 #include "hashtab.h"
 
-/* Highest possible source location encoded within an ordinary or
-   macro map.  */
-const source_location LINE_MAP_MAX_SOURCE_LOCATION = 0x70000000;
-
 static void trace_include (const struct line_maps *, const line_map_ordinary *);
 static const line_map_ordinary * linemap_ordinary_map_lookup (struct line_maps *,
 							      source_location);
@@ -378,13 +374,10 @@  linemap_check_files_exited (struct line_
    macro maps are allocated in different memory location.  */
 
 static struct line_map *
-new_linemap (struct line_maps *set,
-	     enum lc_reason reason)
+new_linemap (struct line_maps *set,  source_location start_location)
 {
-  /* Depending on this variable, a macro map would be allocated in a
-     different memory location than an ordinary map.  */
-  bool macro_map_p = (reason == LC_ENTER_MACRO);
   struct line_map *result;
+  bool macro_map_p = start_location >= LINE_MAP_MAX_LOCATION;
 
   if (LINEMAPS_USED (set, macro_map_p) == LINEMAPS_ALLOCATED (set, macro_map_p))
     {
@@ -455,9 +448,10 @@  new_linemap (struct line_maps *set,
 	result = &set->info_ordinary.maps[LINEMAPS_USED (set, macro_map_p)];
     }
 
+  result->start_location = start_location;
+
   LINEMAPS_USED (set, macro_map_p)++;
 
-  result->reason = reason;
   return result;
 }
 
@@ -492,9 +486,9 @@  linemap_add (struct line_maps *set, enum
   else
     start_location = set->highest_location + 1;
 
-  linemap_assert (!(LINEMAPS_ORDINARY_USED (set)
-		    && (start_location
-			< MAP_START_LOCATION (LINEMAPS_LAST_ORDINARY_MAP (set)))));
+  linemap_assert (!LINEMAPS_ORDINARY_USED (set)
+		  || (start_location
+		      >= MAP_START_LOCATION (LINEMAPS_LAST_ORDINARY_MAP (set))));
 
   /* When we enter the file for the first time reason cannot be
      LC_RENAME.  */
@@ -510,7 +504,9 @@  linemap_add (struct line_maps *set, enum
     }
 
   linemap_assert (reason != LC_ENTER_MACRO);
-  line_map_ordinary *map = linemap_check_ordinary (new_linemap (set, reason));
+  line_map_ordinary *map
+    = linemap_check_ordinary (new_linemap (set, start_location));
+  map->reason = reason;
 
   if (to_file && *to_file == '\0' && reason != LC_RENAME_VERBATIM)
     to_file = "<stdin>";
@@ -545,7 +541,6 @@  linemap_add (struct line_maps *set, enum
     }
 
   map->sysp = sysp;
-  map->start_location = start_location;
   map->to_file = to_file;
   map->to_line = to_line;
   LINEMAPS_ORDINARY_CACHE (set) = LINEMAPS_ORDINARY_USED (set) - 1;
@@ -626,14 +621,12 @@  linemap_enter_macro (struct line_maps *s
 
   start_location = LINEMAPS_MACRO_LOWEST_LOCATION (set) - num_tokens;
 
-  if (start_location <= set->highest_line
-      || start_location > LINEMAPS_MACRO_LOWEST_LOCATION (set))
+  if (start_location < LINE_MAP_MAX_LOCATION)
     /* We ran out of macro map space.   */
     return NULL;
 
-  map = linemap_check_macro (new_linemap (set, LC_ENTER_MACRO));
+  map = linemap_check_macro (new_linemap (set, start_location));
 
-  map->start_location = start_location;
   map->macro = macro_node;
   map->n_tokens = num_tokens;
   map->macro_locations
@@ -718,7 +711,7 @@  linemap_line_start (struct line_maps *se
       || (highest > LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES
 	  && map->m_range_bits > 0)
       || (highest > LINE_MAP_MAX_LOCATION_WITH_COLS
-	  && (set->max_column_hint || highest >= LINE_MAP_MAX_SOURCE_LOCATION)))
+	  && (set->max_column_hint || highest >= LINE_MAP_MAX_LOCATION)))
     add_map = true;
   else
     max_column_hint = set->max_column_hint;
@@ -735,7 +728,7 @@  linemap_line_start (struct line_maps *se
 	  max_column_hint = 0;
 	  column_bits = 0;
 	  range_bits = 0;
-	  if (highest > LINE_MAP_MAX_SOURCE_LOCATION)
+	  if (highest > LINE_MAP_MAX_LOCATION)
 	    return 0;
 	}
       else
@@ -1049,9 +1042,7 @@  linemap_macro_map_lookup (struct line_ma
 bool
 linemap_macro_expansion_map_p (const struct line_map *map)
 {
-  if (!map)
-    return false;
-  return (map->reason == LC_ENTER_MACRO);
+  return map && !MAP_ORDINARY_P (map);
 }
 
 /* If LOCATION is the locus of a token in a replacement-list of a
@@ -1403,23 +1394,22 @@  linemap_macro_loc_to_spelling_point (str
 				     source_location location,
 				     const line_map_ordinary **original_map)
 {
-  struct line_map *map;
   linemap_assert (set && location >= RESERVED_LOCATION_COUNT);
 
   while (true)
     {
-      map = const_cast <line_map *> (linemap_lookup (set, location));
-      if (!linemap_macro_expansion_map_p (map))
-	break;
+      const struct line_map *map = linemap_lookup (set, location);
+      if (!map || MAP_ORDINARY_P (map))
+	{
+	  if (original_map)
+	    *original_map = (const line_map_ordinary *)map;
+	  break;
+	}
 
-      location
-	= linemap_macro_map_loc_unwind_toward_spelling
-	    (set, linemap_check_macro (map),
-	     location);
+      location = linemap_macro_map_loc_unwind_toward_spelling
+	(set, linemap_check_macro (map), location);
     }
 
-  if (original_map)
-    *original_map = linemap_check_ordinary (map);
   return location;
 }
 
@@ -1438,29 +1428,26 @@  linemap_macro_loc_to_def_point (struct l
 				source_location location,
 				const line_map_ordinary **original_map)
 {
-  struct line_map *map;
-
   linemap_assert (set && location >= RESERVED_LOCATION_COUNT);
 
-  while (true)
+  for (;;)
     {
-      source_location caret_loc;
-      if (IS_ADHOC_LOC (location))
-	caret_loc = get_location_from_adhoc_loc (set, location);
-      else
-	caret_loc = location;
+      source_location caret_loc = location;
+      if (IS_ADHOC_LOC (caret_loc))
+	caret_loc = get_location_from_adhoc_loc (set, caret_loc);
 
-      map = const_cast <line_map *> (linemap_lookup (set, caret_loc));
-      if (!linemap_macro_expansion_map_p (map))
-	break;
+      const line_map *map = linemap_lookup (set, caret_loc);
+      if (!map || MAP_ORDINARY_P (map))
+	{
+	  if (original_map)
+	    *original_map = (const line_map_ordinary *)map;
+	  break;
+	}
 
-      location =
-	linemap_macro_map_loc_to_def_point (linemap_check_macro (map),
-					    caret_loc);
+      location = linemap_macro_map_loc_to_def_point
+	(linemap_check_macro (map), caret_loc);
     }
 
-  if (original_map)
-    *original_map = linemap_check_ordinary (map);
   return location;
 }
 
@@ -1770,24 +1757,29 @@  linemap_expand_location (struct line_map
 void
 linemap_dump (FILE *stream, struct line_maps *set, unsigned ix, bool is_macro)
 {
-  const char *lc_reasons_v[LC_ENTER_MACRO + 1]
+  const char *const lc_reasons_v[LC_HWM]
       = { "LC_ENTER", "LC_LEAVE", "LC_RENAME", "LC_RENAME_VERBATIM",
 	  "LC_ENTER_MACRO" };
-  const char *reason;
   const line_map *map;
+  unsigned reason;
 
   if (stream == NULL)
     stream = stderr;
 
   if (!is_macro)
-    map = LINEMAPS_ORDINARY_MAP_AT (set, ix);
+    {
+      map = LINEMAPS_ORDINARY_MAP_AT (set, ix);
+      reason = linemap_check_ordinary (map)->reason;
+    }
   else
-    map = LINEMAPS_MACRO_MAP_AT (set, ix);
-
-  reason = (map->reason <= LC_ENTER_MACRO) ? lc_reasons_v[map->reason] : "???";
+    {
+      map = LINEMAPS_MACRO_MAP_AT (set, ix);
+      reason = LC_ENTER_MACRO;
+    }
 
   fprintf (stream, "Map #%u [%p] - LOC: %u - REASON: %s - SYSP: %s\n",
-	   ix, (void *) map, map->start_location, reason,
+	   ix, (void *) map, map->start_location,
+	   reason < LC_HWM ? lc_reasons_v[reason] : "???",
 	   ((!is_macro
 	     && ORDINARY_MAP_IN_SYSTEM_HEADER_P (linemap_check_ordinary (map)))
 	    ? "yes" : "no"));