diff mbox

Allow embedded timestamps by C/C++ macros to be set externally (2)

Message ID 564903A9.1020306@openmailbox.org
State New
Headers show

Commit Message

Dhole Nov. 15, 2015, 10:14 p.m. UTC
Hi!

A while back I submitted a patch to the gcc-patches mailing list
to allow the embedded timestamps by C/C++ macros to be set externally
[0]. The attached patch addresses a comment from Joseph Myers <joseph at
codesourcery dot com>: move the getenv calls into the gcc/ directory;
and it also provides better organization and style following the advice
and comments of the current Debian gcc package maintainer Matthias Klose
<doko at debian dot org> (I'm sending the patch with him as co-author).

As a reminder for the motivation behind this patch, we are working on
the reproducible builds project which aims to provide users with a way
to reproduce bit-for-bit identical binary packages from the source and
build environment. The project involves Debian as well as several other
free software projects. See <https://reproducible-builds.org/> for more
information.

Quoting from the original email:

> In order to do this, we need to make the build processes deterministic.
> As you can imagine, gcc is quite involved in producing Debian packages.
> One issue we encounter in many packages that fail to build reproducibly
> is the use of the __DATE__, __TIME__ C macros [1], right now we have 456
> affected packages that would need patching (either removing the macros,
> or passing a known date externally).

> A solution for toolchain packages that embed timestamps during the build
> process has been proposed for anyone interested and it consists of the
> following:
> The build environment can export an environment variable called
> SOURCE_DATE_EPOCH with a known timestamp in Unix epoch format (In our
> case, we use the last date of the package's debian changelog). The
> toolchain package running during the build can check if the exported
> variable is set and if so, instead of embedding the local date/time,
> embed the date/time from SOURCE_DATE_EPOCH.

The proposal to use SOURCE_DATE_EPOCH has now been gathered in a more
formal specification [2], so that any project can adhere to it to
achieve reproducible builds when dealing with timestamps.

> It would be very beneficial to our project (and other free software
> projects working on reproducible builds) if gcc supported this feature.

I'm attaching a patch for gcc-5.2.0 that enables this feature: it
modifies the behavior of the macros __DATE__ and __TIME__ when
SOURCE_DATE_EPOCH is exported.

Any comments, suggestions or other ideas to improve this patch are
mostly welcomed.

I'm willing to extend the documentation if the patch feels appropriate.

Thanks for your attention!


PD: Implementation details:

The error output in gcc/c-family/c-common.c (get_source_date_epoch) is
handled by an fprintf() to stderr followed by an exit (EXIT_FAILURE). I
am not sure this is the right approach for error handling, as I have
found many usages of the error() function in the code. If implemented
with error(), the output looks like this:

"""
$ SOURCE_DATE_EPOCH=123a123 g++ test_cpp.cpp -o test_cpp
In file included from <command-line>:0:0:
/usr/include/stdc-predef.h:1:0: error: Environment variable
$SOURCE_DATE_EPOCH: Trailing garbage: a123

 /* Copyright (C) 1991-2014 Free Software Foundation, Inc.
 ^
"""

For this reason I used fprintf(stderr, "error message") to report the
errors.


[0] https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02210.html
[1] https://wiki.debian.org/ReproducibleBuilds/TimestampsFromCPPMacros
[2] https://reproducible-builds.org/specs/source-date-epoch/

Best regards,
Dhole
gcc/c-family/ChangeLog:

2015-10-10  Eduard Sanou  <dhole@openmailbox.org>
	    Matthias Klose  <doko@debian.org>
	* gcc/c-family/c-common.c (get_source_date_epoch): New function, gets 
	the environment variable SOURCE_DATE_EPOCH and parses it as long long
	with error handling.
	* gcc/c-family/c-common.h (get_source_date_epoch): Prototype.
	* gcc/c-family/c-lex.c (c_lex_with_flags): set 
	parse_in->source_date_epoch.

libcpp/ChangeLog:

2015-10-10  Eduard Sanou  <dhole@openmailbox.org>
	    Matthias Klose  <doko@debian.org>
	* libcpp/include/cpplib.h (cpp_init_source_date_epoch): Prototype.
	* libcpp/init.c (cpp_init_source_date_epoch): New function.
	* libcpp/internal.h: Added source_date_epoch variable to struct
	cpp_reader to store a reproducible date.
	* libcpp/macro.c (_cpp_builtin_macro_text): Set pfile->date timestamp
	from pfile->source_date_epoch instead of localtime if source_date_epoch
	is set, to be used for __DATE__ and __TIME__ macros to help
	reproducible builds.

Comments

Bernd Schmidt Nov. 16, 2015, 1:05 p.m. UTC | #1
On 11/15/2015 11:14 PM, Dhole wrote:
> gcc/c-family/ChangeLog:
>
> 2015-10-10  Eduard Sanou<dhole@openmailbox.org>

I can't find a previous change from you in the sources, so the first 
question would be whether you've gone through the copyright assignment 
process.


Bernd
Dhole Nov. 16, 2015, 2:45 p.m. UTC | #2
On 11/16/2015 02:05 PM, Bernd Schmidt wrote:
> On 11/15/2015 11:14 PM, Dhole wrote:
>> gcc/c-family/ChangeLog:
>>
>> 2015-10-10  Eduard Sanou<dhole@openmailbox.org>
> 
> I can't find a previous change from you in the sources, so the first
> question would be whether you've gone through the copyright assignment
> process.

I haven't gone through the copyright assignment process. Nevertheless
Matthias Klose (co-author of this patch) has.

In any case, if required, let me know how to proceed to obtain the
relevant forms so that I can file the copyright assignment.

Best regards,
Dhole
Joseph Myers Nov. 16, 2015, 11:26 p.m. UTC | #3
On Sun, 15 Nov 2015, Dhole wrote:

> The error output in gcc/c-family/c-common.c (get_source_date_epoch) is
> handled by an fprintf() to stderr followed by an exit (EXIT_FAILURE). I
> am not sure this is the right approach for error handling, as I have
> found many usages of the error() function in the code. If implemented
> with error(), the output looks like this:

fprintf to stderr is never appropriate.  All diagnostics should go through 
a diagnostic function that properly causes the message to be translated.

If you want a fatal error (exit immediately after giving the message 
rather than continuing compilation), use the fatal_error function, which 
takes an explicit location.  You can use UNKNOWN_LOCATION to avoid an 
input file being named, like e.g. the code in toplev.c:

        /* Use UNKOWN_LOCATION to prevent gcc from printing the first
           line in the current file. */
        fatal_error (UNKNOWN_LOCATION,
                     "input file %qs is the same as output file",
                     asm_file_name);
Joseph Myers Nov. 16, 2015, 11:29 p.m. UTC | #4
Also, this environment variable needs documenting in cppenv.texi.
diff mbox

Patch

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 7fe7fa6..f795871 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -12318,4 +12318,49 @@  pointer_to_zero_sized_aggr_p (tree t)
   return (TYPE_SIZE (t) && integer_zerop (TYPE_SIZE (t)));
 }
 
+/* Read SOURCE_DATE_EPOCH from environment to have a deterministic
+   timestamp to replace embedded current dates to get reproducible
+   results. Returns -1 if SOURCE_DATE_EPOCH is not defined.  */
+long long
+get_source_date_epoch()
+{
+  char *source_date_epoch;
+  unsigned long long epoch;
+  char *endptr;
+
+  source_date_epoch = getenv ("SOURCE_DATE_EPOCH");
+  if (!source_date_epoch)
+    return -1
+
+  errno = 0;
+  epoch = strtoull (source_date_epoch, &endptr, 10);
+  if ((errno == ERANGE && (epoch == ULLONG_MAX || epoch == 0))
+      || (errno != 0 && epoch == 0))
+    {
+      fprintf (stderr, "environment variable $SOURCE_DATE_EPOCH: strtoull: \
+%s\n", xstrerror(errno));
+      exit (EXIT_FAILURE);
+    }
+  if (endptr == source_date_epoch)
+    {
+      fprintf (stderr, "environment variable $SOURCE_DATE_EPOCH: \
+No digits were found: %s\n", endptr);
+      exit (EXIT_FAILURE);
+    }
+  if (*endptr != '\0')
+    {
+      fprintf (stderr, "environment variable $SOURCE_DATE_EPOCH: \
+Trailing garbage: %s\n", endptr);
+      exit (EXIT_FAILURE);
+    }
+  if (epoch > ULONG_MAX)
+    {
+      fprintf (stderr, "environment variable $SOURCE_DATE_EPOCH: \
+value must be smaller than or equal to: %lu but was found to be: \
+%llu \n", ULONG_MAX, epoch);
+      exit (EXIT_FAILURE);
+    }
+  return (long long) epoch;
+}
+
 #include "gt-c-family-c-common.h"
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index cabf452..4b55277 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1437,4 +1437,10 @@  extern bool contains_cilk_spawn_stmt (tree);
 extern tree cilk_for_number_of_iterations (tree);
 extern bool check_no_cilk (tree, const char *, const char *,
 		           location_t loc = UNKNOWN_LOCATION);
+
+/* Read SOURCE_DATE_EPOCH from environment to have a deterministic
+   timestamp to replace embedded current dates to get reproducible
+   results. Returns -1 if SOURCE_DATE_EPOCH is not defined.  */
+extern long long get_source_date_epoch();
+
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index bb55be8..9344f66 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -402,6 +402,10 @@  c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
   enum cpp_ttype type;
   unsigned char add_flags = 0;
   enum overflow_type overflow = OT_NONE;
+  long long source_date_epoch = -1;
+
+  source_date_epoch = get_source_date_epoch();
+  cpp_init_source_date_epoch(parse_in, source_date_epoch);
 
   timevar_push (TV_CPP);
  retry:
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 5e08014..9eee80f 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -775,6 +775,9 @@  extern void cpp_init_special_builtins (cpp_reader *);
 /* Set up built-ins like __FILE__.  */
 extern void cpp_init_builtins (cpp_reader *, int);
 
+/* Initialize the source_date_epoch value.  */
+extern void cpp_init_source_date_epoch (cpp_reader *, long long);
+
 /* This is called after options have been parsed, and partially
    processed.  */
 extern void cpp_post_options (cpp_reader *);
diff --git a/libcpp/init.c b/libcpp/init.c
index 45a4d13..dfde0a1 100644
--- a/libcpp/init.c
+++ b/libcpp/init.c
@@ -530,6 +530,13 @@  cpp_init_builtins (cpp_reader *pfile, int hosted)
     _cpp_define_builtin (pfile, "__OBJC__ 1");
 }
 
+/* Initialize the source_date_epoch value.  */
+void
+cpp_init_source_date_epoch (cpp_reader *pfile, long long source_date_epoch)
+{
+  pfile->source_date_epoch = source_date_epoch; 
+}
+
 /* Sanity-checks are dependent on command-line options, so it is
    called as a subroutine of cpp_read_main_file ().  */
 #if ENABLE_CHECKING
diff --git a/libcpp/internal.h b/libcpp/internal.h
index c2d0816..4363d96 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -502,6 +502,10 @@  struct cpp_reader
   const unsigned char *date;
   const unsigned char *time;
 
+  /* Externally set timestamp to replace current date and time useful for
+     reproducibility.  */
+  long long source_date_epoch;
+
   /* EOF token, and a token forcing paste avoidance.  */
   cpp_token avoid_paste;
   cpp_token eof;
diff --git a/libcpp/macro.c b/libcpp/macro.c
index 1e0a0b5..42beb9c 100644
--- a/libcpp/macro.c
+++ b/libcpp/macro.c
@@ -350,13 +350,20 @@  _cpp_builtin_macro_text (cpp_reader *pfile, cpp_hashnode *node)
 	  time_t tt;
 	  struct tm *tb = NULL;
 
-	  /* (time_t) -1 is a legitimate value for "number of seconds
-	     since the Epoch", so we have to do a little dance to
-	     distinguish that from a genuine error.  */
-	  errno = 0;
-	  tt = time(NULL);
-	  if (tt != (time_t)-1 || errno == 0)
-	    tb = localtime (&tt);
+	  /* Set a reproducible timestamp for __DATE__ and __TIME__ macro
+	     usage if SOURCE_DATE_EPOCH is defined.  */
+	  if (pfile->source_date_epoch != -1)
+	     tb = gmtime ((time_t*) &pfile->source_date_epoch);
+	  else
+	    {
+	      /* (time_t) -1 is a legitimate value for "number of seconds
+		 since the Epoch", so we have to do a little dance to
+		 distinguish that from a genuine error.  */
+	      errno = 0;
+	      tt = time (NULL);
+	      if (tt != (time_t)-1 || errno == 0)
+		tb = localtime (&tt);
+	    }
 
 	  if (tb)
 	    {