build xz (instead of bz2) compressed tarballs and diffs

Message ID 421aad71-31d2-ec16-9c8b-4b1eaefda201@ubuntu.com
State New

Commit Message

Matthias Klose May 15, 2017, 1:11 a.m. UTC
As discussed on IRC with Jakub and Richard, here is a small patch which
builds xz compressed tarballs and diff files.

Tested with

  maintainer-scripts/gcc_release \
	-s snap:trunk -p <old bz2 tarball> diffs sources tarfiles
  maintainer-scripts/gcc_release \
	-s snap:trunk -p <old xz tarball> diffs sources tarfiles

and checked that the new tarball and diff files are compressed using xz.
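
The final check can also be made on file contents rather than file names. A minimal sketch, assuming a POSIX shell with od and tr available; the helper name compression_of is hypothetical and not part of gcc_release. It reads the leading magic bytes of a file and reports the format.

```shell
#!/bin/sh
# Hypothetical helper (not part of gcc_release): report the compression
# format of a file from its leading magic bytes instead of its name.
compression_of() {
  # First six bytes as lowercase hex with all whitespace removed.
  magic=$(od -An -tx1 -N6 "$1" | tr -d ' \t\n')
  case "$magic" in
    fd377a585a00) echo xz ;;     # xz stream header: FD 37 7A 58 5A 00
    425a68*)      echo bzip2 ;;  # ASCII "BZh"
    1f8b*)        echo gzip ;;
    *)            echo unknown ;;
  esac
}

# Example use, run in the directory holding the generated files:
#   for f in gcc-*.tar.* gcc-*.diff.*; do
#     echo "$f: $(compression_of "$f")"
#   done
```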

Ok for the trunk and the gcc-7-branch?

Matthias
maintainer-scripts/

2017-05-14  Matthias Klose  <doko@ubuntu.com>

	* gcc_release (build_tarfile): Build xz tarball instead of bz2 tarball.
	(build_gzip): Create .gz files from xz instead of bz2 tarballs.
	(build_diffs): Handle building diffs from either bz2 or xz tarballs,
	compress diffs using xz instead of bz2.
	(build_diff): Likewise.
	(upload_files): Check for *.xz files instead of *.bz2 files.
	(announce_snapshot): Announce xz tarball instead of bz2 tarball.
	(XZ): New definition.
	(<toplevel>): Look for both bz2 and xz compressed old tarballs.

Comments

Joseph Myers May 15, 2017, 2:02 p.m. UTC | #1
The xz manpage warns against blindly using -9 (for which --best is a 
deprecated alias) because of the implications for memory requirements for 
decompressing.  If there's a reason it's considered appropriate here, I 
think it needs an explanatory comment.
Markus Trippelsdorf May 15, 2017, 2:13 p.m. UTC | #2
On 2017.05.15 at 14:02 +0000, Joseph Myers wrote:
> The xz manpage warns against blindly using -9 (for which --best is a 
> deprecated alias) because of the implications for memory requirements for 
> decompressing.  If there's a reason it's considered appropriate here, I 
> think it needs an explanatory comment.

I think it is unacceptable, because it would increase memory usage when
decompressing over 20x compared to bz2 (and over 100x while compressing).

The default -6 should be good enough (3x more memory when decompressing).
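
For context, the xz(1) man page lists approximate memory needs per preset. The sketch below only encodes the decompression column of that table as a lookup; the function name is illustrative, and the figures should be checked against the man page of the installed XZ Utils version.

```shell
#!/bin/sh
# Approximate decompressor memory in MiB for each xz preset, taken from
# the preset table in xz(1). The function name is illustrative only.
xz_decmem_mib() {
  case "$1" in
    0)   echo 1 ;;
    1)   echo 2 ;;
    2)   echo 3 ;;
    3|4) echo 5 ;;
    5|6) echo 9 ;;
    7)   echo 17 ;;
    8)   echo 33 ;;
    9)   echo 65 ;;
    *)   echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

# Example use:
#   echo "-6 needs about $(xz_decmem_mib 6) MiB to decompress"
#   echo "-9 needs about $(xz_decmem_mib 9) MiB to decompress"
```

For an existing file, `xz --list -vv FILE` should report the memory needed to decompress it, which avoids relying on the table.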
Jakub Jelinek May 15, 2017, 2:24 p.m. UTC | #3
On Mon, May 15, 2017 at 04:13:44PM +0200, Markus Trippelsdorf wrote:
> On 2017.05.15 at 14:02 +0000, Joseph Myers wrote:
> > The xz manpage warns against blindly using -9 (for which --best is a 
> > deprecated alias) because of the implications for memory requirements for 
> > decompressing.  If there's a reason it's considered appropriate here, I 
> > think it needs an explanatory comment.
> 
> I think it is unacceptable, because it would increase memory usage when
> decompressing over 20x compared to bz2 (and over 100x while compressing).

The memory use during compression isn't that interesting as long as it
isn't prohibitive for sourceware or the machines RMs use.
For decompression, I guess what matters is the memory actually needed
to decompress the -9 gcc tarball, compared to the minimal memory
requirements to compile (not bootstrap) the compiler using typical system
compilers.  If compilation of gcc takes more memory than the decompression,
then it should be fine; why would anyone decompress gcc and not build
it afterwards?

	Jakub
Markus Trippelsdorf May 15, 2017, 7:04 p.m. UTC | #4
On 2017.05.15 at 16:24 +0200, Jakub Jelinek wrote:
> On Mon, May 15, 2017 at 04:13:44PM +0200, Markus Trippelsdorf wrote:
> > On 2017.05.15 at 14:02 +0000, Joseph Myers wrote:
> > > The xz manpage warns against blindly using -9 (for which --best is a 
> > > deprecated alias) because of the implications for memory requirements for 
> > > decompressing.  If there's a reason it's considered appropriate here, I 
> > > think it needs an explanatory comment.
> > 
> > I think it is unacceptable, because it would increase memory usage when
> > decompressing over 20x compared to bz2 (and over 100x while compressing).
> 
> The memory use during compression isn't that interesting as long as it
> isn't prohibitive for sourceware or the machines RMs use.
> For decompression, I guess what matters is the memory actually needed
> to decompress the -9 gcc tarball, compared to the minimal memory
> requirements to compile (not bootstrap) the compiler using typical system
> compilers.  If compilation of gcc takes more memory than the decompression,
> then it should be fine; why would anyone decompress gcc and not build
> it afterwards?

Ok, it doesn't really matter. With gcc-7.1 tarball:

size: 533084160 (uncompressed)

-9:
 xz -d gcc.tar.xz
4.71user 0.26system 0:04.97elapsed 100%CPU (0avgtext+0avgdata 67804maxresident)k
size: 60806928

-6 (default):
 xz -d gcc.tar.xz
4.88user 0.28system 0:05.17elapsed 99%CPU (0avgtext+0avgdata 10324maxresident)k
size: 65059664

So -9 is actually just fine.
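
The figures above can be put side by side with a little arithmetic. A sketch assuming awk is available; the variable and function names are illustrative, and the inputs are copied from the measurements in this message (sizes in bytes, peak resident set size in kilobytes as printed by GNU time).

```shell
#!/bin/sh
# Inputs copied from the measurements quoted in the message.
uncompressed=533084160
size_9=60806928;  rss_9=67804
size_6=65059664;  rss_6=10324

# Print compression ratios, the size saved by -9 relative to -6, and
# the ratio of peak resident memory during decompression.
summarize() {
  awk -v u="$uncompressed" -v s9="$size_9" -v s6="$size_6" \
      -v m9="$rss_9" -v m6="$rss_6" 'BEGIN {
    printf "ratio -9: %.2f\n", u / s9
    printf "ratio -6: %.2f\n", u / s6
    printf "size saved by -9: %.1f%%\n", (s6 - s9) * 100 / s6
    printf "peak RSS -9 vs -6: %.1fx\n", m9 / m6
  }'
}

summarize
```

On these inputs it reports ratios of about 8.77 and 8.19, a tarball about 6.5% smaller with -9, and roughly 6.6 times the peak memory, which is still under 70 MB.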
Richard Biener May 18, 2017, 10:34 a.m. UTC | #5
On Mon, May 15, 2017 at 3:11 AM, Matthias Klose <doko@ubuntu.com> wrote:
> As discussed on IRC with Jakub and Richard, here is a small patch which
> builds xz compressed tarballs and diff files.
>
> Tested with
>
>   maintainer-scripts/gcc_release \
>         -s snap:trunk -p <old bz2 tarball> diffs sources tarfiles
>   maintainer-scripts/gcc_release \
>         -s snap:trunk -p <old xz tarball> diffs sources tarfiles
>
> and checked that the new tarball and diff files are compressed using xz.
>
> Ok for the trunk and the gcc-7-branch?

Ok.  The version on trunk can have the bz2 old-tar support removed after the next
snapshot generation, I think.  Likewise the branch version after 7.2
is released.

Richard.

> Matthias
>
Matthias Klose May 23, 2017, 11:15 p.m. UTC | #6
On 15.05.2017 12:04, Markus Trippelsdorf wrote:
> On 2017.05.15 at 16:24 +0200, Jakub Jelinek wrote:
>> On Mon, May 15, 2017 at 04:13:44PM +0200, Markus Trippelsdorf wrote:
>>> On 2017.05.15 at 14:02 +0000, Joseph Myers wrote:
>>>> The xz manpage warns against blindly using -9 (for which --best is a 
>>>> deprecated alias) because of the implications for memory requirements for 
>>>> decompressing.  If there's a reason it's considered appropriate here, I 
>>>> think it needs an explanatory comment.
>>>
>>> I think it is unacceptable, because it would increase memory usage when
>>> decompressing over 20x compared to bz2 (and over 100x while compressing).
>>
>> The memory use during compression isn't that interesting as long as it
>> isn't prohibitive for sourceware or the machines RMs use.
>> For decompression, I guess what matters is the memory actually needed
>> to decompress the -9 gcc tarball, compared to the minimal memory
>> requirements to compile (not bootstrap) the compiler using typical system
>> compilers.  If compilation of gcc takes more memory than the decompression,
>> then it should be fine; why would anyone decompress gcc and not build
>> it afterwards?
> 
> Ok, it doesn't really matter. With gcc-7.1 tarball:
> 
> size: 533084160 (uncompressed)
> 
> -9:
>  xz -d gcc.tar.xz
> 4.71user 0.26system 0:04.97elapsed 100%CPU (0avgtext+0avgdata 67804maxresident)k
> size: 60806928
> 
> -6 (default):
>  xz -d gcc.tar.xz
> 4.88user 0.28system 0:05.17elapsed 99%CPU (0avgtext+0avgdata 10324maxresident)k
> size: 65059664
> 
> So -9 is actually just fine.

OK, I updated the script to use xz --best by default, on trunk and the gcc-7-branch.

Matthias
Matthias Klose May 23, 2017, 11:22 p.m. UTC | #7
On 18.05.2017 03:34, Richard Biener wrote:
> On Mon, May 15, 2017 at 3:11 AM, Matthias Klose <doko@ubuntu.com> wrote:
>> As discussed on IRC with Jakub and Richard, here is a small patch which
>> builds xz compressed tarballs and diff files.
>>
>> Tested with
>>
>>   maintainer-scripts/gcc_release \
>>         -s snap:trunk -p <old bz2 tarball> diffs sources tarfiles
>>   maintainer-scripts/gcc_release \
>>         -s snap:trunk -p <old xz tarball> diffs sources tarfiles
>>
>> and checked that the new tarball and diff files are compressed using xz.
>>
>> Ok for the trunk and the gcc-7-branch?
> 
> Ok.  The version on trunk can have the bz2 old-tar support removed after the next
> snapshot generation, I think.  Likewise the branch version after 7.2
> is released.

Looks like the copy of the script on gcc.gnu.org affects all active branches.
The May 23 GCC 5 snapshot was created successfully.  Is this acceptable?  If yes,
then the patch should probably go to the 5 and 6 branches as well.

Please copy the script again to enable the xz --best compression.

Matthias
Richard Biener May 24, 2017, 7:17 a.m. UTC | #8
On May 24, 2017 1:22:42 AM GMT+02:00, Matthias Klose <doko@ubuntu.com> wrote:
>On 18.05.2017 03:34, Richard Biener wrote:
>> On Mon, May 15, 2017 at 3:11 AM, Matthias Klose <doko@ubuntu.com> wrote:
>>> As discussed on IRC with Jakub and Richard, here is a small patch which
>>> builds xz compressed tarballs and diff files.
>>>
>>> Tested with
>>>
>>>   maintainer-scripts/gcc_release \
>>>         -s snap:trunk -p <old bz2 tarball> diffs sources tarfiles
>>>   maintainer-scripts/gcc_release \
>>>         -s snap:trunk -p <old xz tarball> diffs sources tarfiles
>>>
>>> and checked that the new tarball and diff files are compressed using xz.
>>>
>>> Ok for the trunk and the gcc-7-branch?
>> 
>> Ok.  The version on trunk can have the bz2 old-tar support removed after the next
>> snapshot generation, I think.  Likewise the branch version after 7.2
>> is released.
>
>Looks like the copy of the script on gcc.gnu.org affects all active branches.

Yes.  Only the trunk script is actually used, so ...

>The May 23 GCC 5 snapshot was created successfully.  Is this acceptable?  If yes,
>then the patch should probably go to the 5 and 6 branches as well.

... this isn't really necessary.

>Please copy the script again to enable the xz --best compression.
>
>Matthias

Patch

Index: maintainer-scripts/gcc_release
===================================================================
--- maintainer-scripts/gcc_release	(revision 248041)
+++ maintainer-scripts/gcc_release	(working copy)
@@ -221,7 +221,7 @@ 
   # Create a "MD5SUMS" file to use for checking the validity of the release.
   echo \
 "# This file contains the MD5 checksums of the files in the 
-# gcc-"${RELEASE}".tar.bz2 tarball.
+# gcc-"${RELEASE}".tar.xz tarball.
 #
 # Besides verifying that all files in the tarball were correctly expanded,
 # it also can be used to determine if any files have changed since the
@@ -244,11 +244,11 @@ 
 
 build_tarfile() {
   # Get the name of the destination tar file.
-  TARFILE="$1.tar.bz2"
+  TARFILE="$1.tar.xz"
   shift
 
   # Build the tar file itself.
-  (${TAR} cf - "$@" | ${BZIP2} > ${TARFILE}) || \
+  (${TAR} cf - "$@" | ${XZ} > ${TARFILE}) || \
     error "Could not build tarfile"
   FILE_LIST="${FILE_LIST} ${TARFILE}"
 }
@@ -273,8 +273,8 @@ 
 # Build .gz files.
 build_gzip() {
   for f in ${FILE_LIST}; do
-    target=${f%.bz2}.gz
-    (${BZIP2} -d -c $f | ${GZIP} > ${target}) || error "Could not create ${target}"
+    target=${f%.xz}.gz
+    (${XZ} -d -c $f | ${GZIP} > ${target}) || error "Could not create ${target}"
   done
 }
 
@@ -282,12 +282,19 @@ 
 build_diffs() {
   old_dir=${1%/*}
   old_file=${1##*/}
-  old_vers=${old_file%.tar.bz2}
+  case "$old_file" in
+    *.tar.xz) old_vers=${old_file%.tar.xz};;
+    *) old_vers=${old_file%.tar.bz2};;
+  esac
   old_vers=${old_vers#gcc-}
   inform "Building diffs against version $old_vers"
   for f in gcc; do
-    old_tar=${old_dir}/${f}-${old_vers}.tar.bz2
-    new_tar=${WORKING_DIRECTORY}/${f}-${RELEASE}.tar.bz2
+    if [ -e ${old_dir}/${f}-${old_vers}.tar.xz ]; then
+      old_tar=${old_dir}/${f}-${old_vers}.tar.xz
+    else
+      old_tar=${old_dir}/${f}-${old_vers}.tar.bz2
+    fi
+    new_tar=${WORKING_DIRECTORY}/${f}-${RELEASE}.tar.xz
     if [ ! -e $old_tar ]; then
       inform "$old_tar not found; not generating diff file"
     elif [ ! -e $new_tar ]; then
@@ -294,7 +301,7 @@ 
       inform "$new_tar not found; not generating diff file"
     else
       build_diff $old_tar gcc-${old_vers} $new_tar gcc-${RELEASE} \
-        ${f}-${old_vers}-${RELEASE}.diff.bz2
+        ${f}-${old_vers}-${RELEASE}.diff.xz
     fi
   done
 }
@@ -305,13 +312,20 @@ 
   tmpdir=gccdiff.$$
   mkdir $tmpdir || error "Could not create directory $tmpdir"
   changedir $tmpdir
-  (${BZIP2} -d -c $1 | ${TAR} xf - ) || error "Could not unpack $1 for diffs"
-  (${BZIP2} -d -c $3 | ${TAR} xf - ) || error "Could not unpack $3 for diffs"
-  ${DIFF} $2 $4 > ../${5%.bz2}
+  case "$1" in
+    *.tar.bz2)
+      (${BZIP2} -d -c $1 | ${TAR} xf - ) || error "Could not unpack $1 for diffs"
+      ;;
+    *.tar.xz)
+      (${XZ} -d -c $1 | ${TAR} xf - ) || error "Could not unpack $1 for diffs"
+      ;;
+  esac
+  (${XZ} -d -c $3 | ${TAR} xf - ) || error "Could not unpack $3 for diffs"
+  ${DIFF} $2 $4 > ../${5%.xz}
   if [ $? -eq 2 ]; then
     error "Trouble making diffs from $1 to $3"
   fi
-  ${BZIP2} ../${5%.bz2} || error "Could not generate ../$5"
+  ${XZ} ../${5%.xz} || error "Could not generate ../$5"
   changedir ..
   rm -rf $tmpdir
   FILE_LIST="${FILE_LIST} $5"
@@ -335,7 +349,7 @@ 
   fi
 
   # Then copy files to their respective (sub)directories.
-  for x in gcc*.gz gcc*.bz2; do
+  for x in gcc*.gz gcc*.xz; do
     if [ -e ${x} ]; then
       # Make sure the file will be readable on the server.
       chmod a+r ${x}
@@ -410,7 +424,7 @@ 
 
 <table>" > ${SNAPSHOT_INDEX}
        
-  snapshot_print gcc-${RELEASE}.tar.bz2 "Complete GCC"
+  snapshot_print gcc-${RELEASE}.tar.xz "Complete GCC"
 
   echo \
 "Diffs from "${BRANCH}"-"${LAST_DATE}" are available in the diffs/ subdirectory.
@@ -528,12 +542,13 @@ 
 MODE_TARFILES=0
 MODE_UPLOAD=0
 
-# List of archive files generated; used to create .gz files from .bz2.
+# List of archive files generated; used to create .gz files from .xz.
 FILE_LIST=""
 
 # Programs we use.
 
 BZIP2="${BZIP2:-bzip2}"
+XZ="${XZ:-xz --best}"
 CVS="${CVS:-cvs -f -Q -z9}"
 DIFF="${DIFF:-diff -Nrcpad}"
 ENV="${ENV:-env}"
@@ -644,6 +659,9 @@ 
   if [ $MODE_DIFFS -ne 0 ] && [ $LOCAL -ne 0 ] && [ -z "${OLD_TARS}" ]; then
     LAST_DATE=`cat ~/.snapshot_date-${BRANCH}`
     OLD_TARS=${SNAPSHOTS_DIR}/${BRANCH}-${LAST_DATE}/gcc-${BRANCH}-${LAST_DATE}.tar.bz2
+    if [ ! -e $OLD_TARS ]; then
+      OLD_TARS=${SNAPSHOTS_DIR}/${BRANCH}-${LAST_DATE}/gcc-${BRANCH}-${LAST_DATE}.tar.xz
+    fi
   fi
 fi
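
In build_diff above, the case statement has no default branch, so an old tarball that matches neither pattern is silently left unpacked. One way to make the suffix handling explicit, as a sketch only: the helpers decompressor_for and version_of are hypothetical and not part of the patch, and XZ and BZIP2 default here to the bare commands.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the patch.

# Print the decompression command for a tarball, chosen by suffix.
# Fails with a message on any other suffix instead of doing nothing.
decompressor_for() {
  case "$1" in
    *.tar.xz)  echo "${XZ:-xz} -d -c" ;;
    *.tar.bz2) echo "${BZIP2:-bzip2} -d -c" ;;
    *) echo "unsupported archive: $1" >&2; return 1 ;;
  esac
}

# Print the version part of gcc-VERSION.tar.xz or gcc-VERSION.tar.bz2,
# using the same parameter expansions as build_diffs.
version_of() {
  f=${1##*/}
  case "$f" in
    *.tar.xz)  f=${f%.tar.xz} ;;
    *.tar.bz2) f=${f%.tar.bz2} ;;
  esac
  echo "${f#gcc-}"
}

# Example use:
#   cmd=$(decompressor_for "$old_tar") || exit 1
#   $cmd "$old_tar" | tar xf -
```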