[v5,08/19] reproducible: try to detect most common errors

Message ID 1482241596-31688-9-git-send-email-jezz@sysmic.org
State Rejected

Commit Message

Jérôme Pouiller Dec. 20, 2016, 1:46 p.m. UTC
Some packages include information from the build environment in their output.
This practice is incompatible with reproducible builds.

This patch scans the final target directory for the most common patterns.

Since we only search for fixed strings (grep is called with -F), this search is
fast (on my workstation, 60 ms for a 40 MB target).

Note that it could be a good idea to also match the current user name. However,
the build path often contains the user name and, so far, we do not try to keep
the build path out of the results.

Signed-off-by: Jérôme Pouiller <jezz@sysmic.org>
---
 Makefile | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Samuel Martin Feb. 7, 2017, 2:52 p.m. UTC | #1
Hi Jérôme, all,

On Tue, Dec 20, 2016 at 2:46 PM, Jérôme Pouiller <jezz@sysmic.org> wrote:
> Some packages include information from the build environment in their output.
> This practice is incompatible with reproducible builds.
>
> This patch scans the final target directory for the most common patterns.
>
> Since we only search for fixed strings (grep is called with -F), this search is
> fast (on my workstation, 60 ms for a 40 MB target).
>
> Note that it could be a good idea to also match the current user name. However,
> the build path often contains the user name and, so far, we do not try to keep
> the build path out of the results.
>
> Signed-off-by: Jérôme Pouiller <jezz@sysmic.org>

Reviewed-by: Samuel Martin <s.martin49@gmail.com>

> ---
>  Makefile | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index ad7fde5..5b504c1 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -708,6 +708,11 @@ endif
>                 $(call MESSAGE,"Executing post-build script $(s)"); \
>                 $(EXTRA_ENV) $(s) $(TARGET_DIR) $(call qstrip,$(BR2_ROOTFS_POST_SCRIPT_ARGS))$(sep))
>
> +ifeq ($(BR2_REPRODUCIBLE),y)
> +       grep -raoF -e "`uname -r`" -e "`uname -v`" -e "`uname -n`" $(TARGET_DIR) | \
> +               sed 's/\(.*\):\(.*\)/Warning: \1 may contain unreproducible information: \2/' | uniq
> +endif
> +
>  target-post-image: $(TARGETS_ROOTFS) target-finalize
>         @$(foreach s, $(call qstrip,$(BR2_ROOTFS_POST_IMAGE_SCRIPT)), \
>                 $(call MESSAGE,"Executing post-image script $(s)"); \
> --
> 1.9.1

Regards,

Thomas Petazzoni April 1, 2017, 2:50 p.m. UTC | #2
Hello,

On Tue, 20 Dec 2016 14:46:25 +0100, Jérôme Pouiller wrote:
> Some packages include information from the build environment in their output.
> This practice is incompatible with reproducible builds.
> 
> This patch scans the final target directory for the most common patterns.
> 
> Since we only search for fixed strings (grep is called with -F), this search is
> fast (on my workstation, 60 ms for a 40 MB target).
> 
> Note that it could be a good idea to also match the current user name. However,
> the build path often contains the user name and, so far, we do not try to keep
> the build path out of the results.
> 
> Signed-off-by: Jérôme Pouiller <jezz@sysmic.org>

I am not entirely convinced it makes sense to grep more or less
randomly in all files specifically for those strings.

Or perhaps we should have a separate shell script that checks for
several classes of obviously non-reproducible behavior?

Arnout, Peter, Yann, what do you think?

Thomas

Yann E. MORIN April 1, 2017, 9:13 p.m. UTC | #3
Thomas, Jérôme, All,

On 2017-04-01 16:50 +0200, Thomas Petazzoni spake thusly:
> On Tue, 20 Dec 2016 14:46:25 +0100, Jérôme Pouiller wrote:
> > Some packages include information from the build environment in their output.
> > This practice is incompatible with reproducible builds.
> > 
> > This patch scans the final target directory for the most common patterns.
> > 
> > Since we only search for fixed strings (grep is called with -F), this search is
> > fast (on my workstation, 60 ms for a 40 MB target).
> > 
> > Note that it could be a good idea to also match the current user name. However,
> > the build path often contains the user name and, so far, we do not try to keep
> > the build path out of the results.
> > 
> > Signed-off-by: Jérôme Pouiller <jezz@sysmic.org>
> 
> I am not entirely convinced it makes sense to grep more or less
> randomly in all files specifically for those strings.

Indeed. My hostnames usually carry some computer-related keywords, which
can very well occur in binary files (e.g. 'segfault').

> Or perhaps we should have a separate shell script that checks for
> several classes of obviously non-reproducible behavior?

Indeed, but such a script can only be a helper that a user would
voluntarily run on their own.

We can't have such a script automatically run, because those strings can
be too common.

Regards,
Yann E. MORIN.

Arnout Vandecappelle April 1, 2017, 9:48 p.m. UTC | #4
On 01-04-17 23:13, Yann E. MORIN wrote:
> Thomas, Jérôme, All,
> 
> On 2017-04-01 16:50 +0200, Thomas Petazzoni spake thusly:
>> On Tue, 20 Dec 2016 14:46:25 +0100, Jérôme Pouiller wrote:
>>> Some packages include information from the build environment in their output.
>>> This practice is incompatible with reproducible builds.
>>>
>>> This patch scans the final target directory for the most common patterns.
>>>
>>> Since we only search for fixed strings (grep is called with -F), this search is
>>> fast (on my workstation, 60 ms for a 40 MB target).
>>>
>>> Note that it could be a good idea to also match the current user name. However,
>>> the build path often contains the user name and, so far, we do not try to keep
>>> the build path out of the results.
>>>
>>> Signed-off-by: Jérôme Pouiller <jezz@sysmic.org>
>>
>> I am not entirely convinced it makes sense to grep more or less
>> randomly in all files specifically for those strings.
> 
> Indeed. My hostnames usually carry some computer-related keywords, which
> can very well occur in binary files (e.g. 'segfault').

 Even more so for uname -r, which could be something like 4.10...

> 
>> Or perhaps we should have a separate shell script that checks for
>> several classes of obviously non-reproducible behavior?
> 
> Indeed, but such a script can only be a helper that a user would
> voluntarily run on their own.
> 
> We can't have such a script automatically run, because those strings can
> be too common.

 Well, as proposed by Jerome, it would just print a warning at the end of the
build, which is probably acceptable.

 Regards,
 Arnout
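
The false-positive concern raised in the replies can be reproduced with a
small experiment. The sketch below is not part of the patch: the temporary
directory, the file names and their contents are assumptions chosen for
illustration, and "4.10" is simply the example release mentioned in the
thread. It suggests that a fixed-string search flags a file that merely
contains a library version, alongside one that really embeds host
information.

```shell
#!/bin/sh
# Hypothetical experiment (not from the patch): a short fixed string such as
# a kernel release can match content unrelated to the build host.
set -eu

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# A file that really embeds build-host information...
printf 'built on Linux 4.10 by buildhost\n' > "$demo_dir/banner.txt"
# ...and one that only contains the same characters in a library version.
printf 'libexample.so.4.10\n' > "$demo_dir/libs.txt"

# -r: recurse, -l: print only the names of matching files, -F: fixed string.
matches=$(grep -rlF -e '4.10' "$demo_dir" | sort)
match_count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')

printf '%s\n' "$matches"
echo "files matched: $match_count"
```

Both files are reported, although only the first one is actually
unreproducible, which is why the replies hesitate to run such a check
unconditionally.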

Patch

diff --git a/Makefile b/Makefile
index ad7fde5..5b504c1 100644
--- a/Makefile
+++ b/Makefile
@@ -708,6 +708,11 @@  endif
 		$(call MESSAGE,"Executing post-build script $(s)"); \
 		$(EXTRA_ENV) $(s) $(TARGET_DIR) $(call qstrip,$(BR2_ROOTFS_POST_SCRIPT_ARGS))$(sep))
 
+ifeq ($(BR2_REPRODUCIBLE),y)
+	grep -raoF -e "`uname -r`" -e "`uname -v`" -e "`uname -n`" $(TARGET_DIR) | \
+		sed 's/\(.*\):\(.*\)/Warning: \1 may contain unreproducible information: \2/' | uniq
+endif
+
 target-post-image: $(TARGETS_ROOTFS) target-finalize
 	@$(foreach s, $(call qstrip,$(BR2_ROOTFS_POST_IMAGE_SCRIPT)), \
 		$(call MESSAGE,"Executing post-image script $(s)"); \
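
The review comments suggest moving this check out of the Makefile into a
separate helper that a user runs voluntarily. A minimal sketch of what such a
helper might look like follows; it is an illustration only, not something
that was proposed or merged. The function name scan_target and the overall
script layout are hypothetical. It searches for the same three uname values
as the patch, and uses sort -u instead of uniq so that non-adjacent
duplicate warnings are also collapsed.

```shell
#!/bin/sh
# Sketch of a standalone, opt-in checker. All names here are hypothetical.

# scan_target DIR STRING...
# Print one warning per (file, string) pair found under DIR.
scan_target() {
    dir=$1
    shift
    for needle in "$@"; do
        # Skip empty strings: grep -F with an empty pattern matches every line.
        [ -n "$needle" ] || continue
        # -r: recurse, -l: list matching file names, -F: fixed string.
        # -e keeps a needle starting with "-" from being read as an option.
        grep -rlF -e "$needle" -- "$dir" 2>/dev/null |
            while IFS= read -r file; do
                printf 'Warning: %s may contain unreproducible information: %s\n' \
                    "$file" "$needle"
            done
    done | sort -u
}

# Same three host properties as the patch; only run when a directory is given.
if [ "$#" -ge 1 ]; then
    scan_target "$1" "$(uname -r)" "$(uname -v)" "$(uname -n)"
fi
```

Saved under a name of your choice, it would be invoked by hand with the
target directory as its only argument. As noted in the thread, short or
generic strings will still produce false positives, so the output is best
read as a list of hints rather than errors.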