From patchwork Tue Aug 6 06:16:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 1969325 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=I1/atc5/; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WdNQZ4DRjz1yZl for ; Tue, 6 Aug 2024 16:17:58 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6E207385B50B for ; Tue, 6 Aug 2024 06:17:56 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id 9A4903858401 for ; Tue, 6 Aug 2024 06:16:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9A4903858401 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 9A4903858401 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722924977; cv=none; b=jG8OYzRdBe9qL/+wdcgQXL6I9ANCwmpRqPqM0qDaO/2LrXlClStW3xjybzxmiwQ/AHqxXcqBEZO/wFaRO0dbba9ACz+5U6yncpndKuXRsucICib9efHCnWdmdkQ/iJ4arn6v1ZzrF55Qa4K0dd9dvIK6usTSNOJQesRTa6dn7YU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722924977; c=relaxed/simple; bh=BMn7zLVNPfG+1a3AC8itBynwfPVxcqOGrPU69R7svfk=; h=DKIM-Signature:From:To:Subject:Message-ID:Date:MIME-Version; b=kfvXAxaJbUM5kab+Ffcc346elHSHGpfZC12y8eVqKyLQv0ItuAjJ4Lk6+kdw+DV0lsrjCpNzDuaruwpvtsV0LpXupqXtuhYbDMtIVZuldRi/OwPblw9mN6+A4YiAJDobEueokLeOUFnPEIHOeEnbI076wi6l82eh8JcU/k2CImE= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722924975; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=yOTFDqNcnDNCPB+HZ0EhpRYoXOckAuIE7a2A6WO0bnA=; b=I1/atc5/FH3oT5696+jF5ulXDbaKrsQKR3jyGm3E9/sYs+OQZp/0ldsxq4PATfaNSw8Tko QAZdo4jFcBVbJcy495MFDCDLq4hCZAVL3EPKNXfb6yqYTMfucw9m2HwJjjDf1SBS5Szzj7 67gmTnfBi0jeFRA6De0H6miiuRpufNk= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-528-i1Hqm5PbMY-_WEjrKAnRVg-1; Tue, 06 Aug 2024 02:16:13 -0400 X-MC-Unique: i1Hqm5PbMY-_WEjrKAnRVg-1 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 0008A19560A1 for ; Tue, 6 Aug 2024 06:16:12 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.45.224.40]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 243D11955F40 for ; Tue, 6 Aug 2024 06:16:11 +0000 (UTC) From: Florian Weimer To: libc-alpha@sourceware.org Subject: [PATCH v2 1/7] iconv: Base tests for buffer management In-Reply-To: Message-ID: <4bd28169831b2376c6d89e17dd535b41ea32f4fa.1722924862.git.fweimer@redhat.com> References: X-From-Line: 4bd28169831b2376c6d89e17dd535b41ea32f4fa Mon Sep 17 00:00:00 2001 Date: Tue, 06 Aug 2024 08:16:08 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org --- iconv/Makefile | 12 ++- iconv/tst-iconv_prog-buffer.sh | 177 +++++++++++++++++++++++++++++++++ 2 files changed, 187 insertions(+), 2 deletions(-) create mode 100644 iconv/tst-iconv_prog-buffer.sh diff --git a/iconv/Makefile b/iconv/Makefile index 65b4a44ab8..b0fa550141 100644 --- a/iconv/Makefile +++ b/iconv/Makefile @@ -76,8 +76,11 @@ include $(patsubst %,$(..)libof-iterator.mk,$(cpp-srcs-left)) ifeq ($(run-built-tests),yes) xtests-special += $(objpfx)test-iconvconfig.out -tests-special += $(objpfx)tst-iconv_prog.out -tests-special += $(objpfx)tst-translit-mchar.out +tests-special += \ + $(objpfx)tst-iconv_prog-buffer.out \ + $(objpfx)tst-iconv_prog.out \ + $(objpfx)tst-translit-mchar.out \ + # tests-special endif # Make a copy of the file because gconv module names are constructed @@ -141,3 +144,8 @@ $(objpfx)tst-translit-mchar.out: tst-translit-mchar.sh \ '$(run-program-env)' '$(run-program-prefix-after-env)' \ $< > $@; \ $(evaluate-test) + +$(objpfx)tst-iconv_prog-buffer.out: \ + tst-iconv_prog-buffer.sh $(objpfx)iconv_prog + $(BASH) $< $(common-objdir) '$(test-program-prefix)' > $@; \ + $(evaluate-test) diff --git a/iconv/tst-iconv_prog-buffer.sh b/iconv/tst-iconv_prog-buffer.sh new file mode 100644 index 0000000000..a27107f02b --- /dev/null +++ b/iconv/tst-iconv_prog-buffer.sh @@ -0,0 +1,177 @@ +#!/bin/bash +# Test for iconv (the program) buffer management. +# Copyright (C) 2024 Free Software Foundation, Inc. +# This file is part of the GNU C Library. + +# The GNU C Library is free software; you can redistribute it and/or +# modify it under the terms of the GNU Lesser General Public +# License as published by the Free Software Foundation; either +# version 2.1 of the License, or (at your option) any later version. + +# The GNU C Library is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# Lesser General Public License for more details. + +# You should have received a copy of the GNU Lesser General Public +# License along with the GNU C Library; if not, see +# . + +exec 2>&1 +set -e + +exec {logfd}>&1 + +codir=$1 +test_program_prefix="$2" + +# Use internal converters to avoid issues with module loading. +iconv_args="-f ASCII -t UTF-8" + +failure=false + +tmp=`mktemp -d` +trap 'rm -rf "$tmp"' 0 +echo ABC > "$tmp/abc" +echo DEF > "$tmp/def" +echo GGG > "$tmp/ggg" +echo HH > "$tmp/hh" +echo XY > "$tmp/xy" +echo ZT > "$tmp/zt" +echo OUT > "$tmp/out-template" +printf '\xff' > "$tmp/0xff" +cat "$tmp/xy" "$tmp/0xff" "$tmp/zt" > "$tmp/0xff-wrapped" + +run_iconv () { + local c=0 + if test "${FUNCNAME[2]}" = main; then + c=1 + fi + echo "${BASH_SOURCE[$c]}:${BASH_LINENO[$c]}: iconv $iconv_args $@" >&$logfd + $test_program_prefix $codir/iconv/iconv_prog $iconv_args "$@" +} + +check_out_expected () { + if ! cmp -s "$tmp/out" "$tmp/expected" ; then + echo "error: iconv output difference" >&$logfd + echo "*** expected ***" >&$logfd + cat "$tmp/expected" >&$logfd + echo "*** actual ***" >&$logfd + cat "$tmp/out" >&$logfd + failure=true + fi +} + +expect_files () { + local f + ! test -z "$1" + cp "$tmp/$1" "$tmp/expected" + shift + for f in "$@" ; do + cat "$tmp/$f" >> "$tmp/expected" + done + check_out_expected +} + +check_out () { + cat > "$tmp/expected" + check_out_expected +} + +expect_exit () { + local expected=$1 + shift + # Prevent failure for stopping the script. + if "$@" ; then + actual=$? + else + actual=$? + fi + if test "$actual" -ne "$expected"; then + echo "error: expected exit status $expected, not $actual" >&$logfd + exit 1 + fi +} + +ignore_failure () { + set +e + "$@" + status=$? + set -e +} + +# Concatentation test. +run_iconv -o "$tmp/out" "$tmp/abc" "$tmp/def" +expect_files abc def + +# Single-file in-place conversion. +run_iconv -o "$tmp/out" "$tmp/out" +expect_files abc def + +# Multiple input files with in-place conversion. + +run_iconv -o "$tmp/out" "$tmp/out" "$tmp/abc" +expect_files abc def abc + +# But not if we are writing to standard output. + +cp "$tmp/out-template" "$tmp/out" +run_iconv >"$tmp/out" +expect_files out-template + +cp "$tmp/out-template" "$tmp/out" +run_iconv - >"$tmp/out" +expect_files out-template + +cp "$tmp/out-template" "$tmp/out" +run_iconv /dev/null >>"$tmp/out" +expect_files out-template + +# Conversion errors should avoid clobbering an existing file if +# it is also an input file. + +cp "$tmp/0xff" "$tmp/out" +expect_exit 1 run_iconv -o "$tmp/out" "$tmp/out" +expect_files 0xff + +cp "$tmp/0xff" "$tmp/out" +expect_exit 1 run_iconv -o "$tmp/out" < "$tmp/out" +expect_files 0xff + +cp "$tmp/0xff" "$tmp/out" +expect_exit 1 run_iconv -o "$tmp/out" - < "$tmp/out" +expect_files 0xff + +# If errors are ignored, the file should be overwritten. + +cp "$tmp/out-template" "$tmp/out" +expect_exit 1 \ + run_iconv -c -o "$tmp/out" "$tmp/abc" "$tmp/0xff" "$tmp/def" 2>"$tmp/err" +! test -s "$tmp/err" +expect_files abc def + +# FIXME: This is not correct, -c should not change the exit status. +cp "$tmp/out-template" "$tmp/out" +run_iconv -c -o "$tmp/out" \ + "$tmp/abc" "$tmp/0xff-wrapped" "$tmp/def" 2>"$tmp/err" +! test -s "$tmp/err" +expect_files abc xy zt def + +# If the file does not exist yet, it should not be created on error. + +rm "$tmp/out" +expect_exit 1 run_iconv -o "$tmp/out" "$tmp/0xff" +! test -e "$tmp/out" + +expect_exit 1 run_iconv -o "$tmp/out" < "$tmp/0xff" +! test -e "$tmp/out" + +expect_exit 1 run_iconv -o "$tmp/out" "$tmp/abc" "$tmp/0xff" "$tmp/def" +! test -e "$tmp/out" + +expect_exit 1 run_iconv -o "$tmp/out" "$tmp/abc" - < "$tmp/0xff" "$tmp/def" +! test -e "$tmp/out" + +if $failure ; then + exit 1 +fi From patchwork Tue Aug 6 06:16:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 1969323 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=bYn0TiYI; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WdNPK0gN6z1yZl for ; Tue, 6 Aug 2024 16:16:53 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E981B3857C4F for ; Tue, 6 Aug 2024 06:16:50 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTP id 220EA3857000 for ; Tue, 6 Aug 2024 06:16:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 220EA3857000 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 220EA3857000 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722924985; cv=none; b=HnpQZsYzA8DkMw7ijMOYJSRd3wlut2BTLES2udJosfDwt7iGLmEeX1lynp8DkRSp64tCrgyFW9r4hDvjv4Us5D99LGyZHAtACx5jj2L44s7FJpFYqm0M5T68kVWuj6X4JFUbyxDqY6OYaY+8+wfhN4hAcB8kkaJVj4unaBrI9hU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722924985; c=relaxed/simple; bh=0k0HZjnJyeXIqVluUWXtWRjaANzQ2jkBcwohl/TTZeE=; h=DKIM-Signature:From:To:Subject:Message-ID:Date:MIME-Version; b=dNGYPAAnhUd1ZZRqi9Rfo1V6nr3XjnU67W5KYpfHG0+S2zGBi2+xJtS1ianHHx5zzmglGYBcBc61I9qO6r8eJc+4j3UEbXu8OwkbLhFghy4diz4nKHmGjpLLCO+KUaJaTV6Q1AdPzSfJOaX2TvCzF6KCEWLglZo4UDfBg1W92zY= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722924981; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=zNq3rmR+2pnobwikdOUYj7G7b5vX0DY2crE801rbjlI=; b=bYn0TiYINVmRvASVC2WC2ByaSudHSPQEExMi+vGQhna8ZLxIgJhLibueWf1KhiH5Z4rG7I WfYeC6zDXUm65lOdEqJE5Fp9mtiK35ji8JCIBpx+mq4NAAsSkQ7u16Tz5KPhfTW4jgz2SJ A91CjNfTH6c/L9QJtGdDn1qOjPybdO0= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-303-mdlmcLRuM6-6C5PVtjba3w-1; Tue, 06 Aug 2024 02:16:19 -0400 X-MC-Unique: mdlmcLRuM6-6C5PVtjba3w-1 Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id CC12E195609F for ; Tue, 6 Aug 2024 06:16:18 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.45.224.40]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id EA6601955D42 for ; Tue, 6 Aug 2024 06:16:17 +0000 (UTC) From: Florian Weimer To: libc-alpha@sourceware.org Subject: [PATCH v2 2/7] iconv: Do not use mmap in iconv (the program) (bug 17703) In-Reply-To: Message-ID: <2a6a76d5d6d1fdf571f65e7e12f42eedb571d0a4.1722924862.git.fweimer@redhat.com> References: X-From-Line: 2a6a76d5d6d1fdf571f65e7e12f42eedb571d0a4 Mon Sep 17 00:00:00 2001 Date: Tue, 06 Aug 2024 08:16:14 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org On current systems, very large files are needed before mmap becomes beneficial. Simplify the implementation. This exposed that inptr was not initialized correctly in process_fd. Handling multiple input files resulted in EFAULT in read because a null pointer was passed. This could be observed previously if an input file was not mappable and was reported as bug 17703. --- iconv/iconv_prog.c | 42 +----------------------------------------- 1 file changed, 1 insertion(+), 41 deletions(-) diff --git a/iconv/iconv_prog.c b/iconv/iconv_prog.c index a765b1af21..88a928557e 100644 --- a/iconv/iconv_prog.c +++ b/iconv/iconv_prog.c @@ -31,9 +31,6 @@ #include #include #include -#ifdef _POSIX_MAPPED_FILES -# include -#endif #include #include #include "iconv_prog.h" @@ -253,10 +250,6 @@ conversions from `%s' and to `%s' are not supported"), else do { -#ifdef _POSIX_MAPPED_FILES - struct stat64 st; - char *addr; -#endif int fd, ret; if (verbose) @@ -276,39 +269,6 @@ conversions from `%s' and to `%s' are not supported"), } } -#ifdef _POSIX_MAPPED_FILES - /* We have possibilities for reading the input file. First try - to mmap() it since this will provide the fastest solution. */ - if (fstat64 (fd, &st) == 0 - && ((addr = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE, - fd, 0)) != MAP_FAILED)) - { - /* Yes, we can use mmap(). The descriptor is not needed - anymore. */ - if (close (fd) != 0) - error (EXIT_FAILURE, errno, - _("error while closing input `%s'"), - argv[remaining]); - - ret = process_block (cd, addr, st.st_size, &output, - output_file); - - /* We don't need the input data anymore. */ - munmap ((void *) addr, st.st_size); - - if (ret != 0) - { - status = EXIT_FAILURE; - - if (ret < 0) - /* We cannot go on with producing output since it might - lead to problem because the last output might leave - the output stream in an undefined state. */ - break; - } - } - else -#endif /* _POSIX_MAPPED_FILES */ { /* Read the file in pieces. */ ret = process_fd (cd, fd, &output, output_file); @@ -544,7 +504,7 @@ process_fd (iconv_t cd, int fd, FILE **output, const char *output_file) process it in one step. */ static char *inbuf = NULL; static size_t maxlen = 0; - char *inptr = NULL; + char *inptr = inbuf; size_t actlen = 0; while (actlen < maxlen) From patchwork Tue Aug 6 06:16:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 1969327 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Y/Srh3s1; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WdNRQ1jhhz1yZl for ; Tue, 6 Aug 2024 16:18:42 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 1A009385840F for ; Tue, 6 Aug 2024 06:18:40 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id A9162385DDDF for ; Tue, 6 Aug 2024 06:16:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A9162385DDDF Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A9162385DDDF Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722924992; cv=none; b=rSWp7rskVqzQSoFRsTkVvKku/uMHEf5/krRHTzjkfwog7gZ4f/jfwRYr6Ul6Lwf956S2vqs5aVGswDLBzJ5Nj7b/FL4dGfmulan9iSVnzYeaZ6k7sfINIBxIglipZT13LdO9RhXU/PF6+NE9RKXO6YXWg7WL4F0BgHBhRyq92Do= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722924992; c=relaxed/simple; bh=bAWqMZE1zaSz8VinSsFnP4l3GCAP4f5FNyk+07JO7Es=; h=DKIM-Signature:From:To:Subject:Message-ID:Date:MIME-Version; b=svmW5wyP8XDHn0GmKtCrTWrIVaAdPcglfLGJ3XBUvbZhFpqw7YNsoVfDaipUQ8h/bXynzjrQfWt6Z1QcWcSJBOKpAlnzyvhM5NJ9Ux8PaLyvEgLsyPvm5gf47l34Y396+mw8Iz49G/UQvp3gAYpOUTwKYIBtqJkukfQrJ/nPq20= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722924987; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=3ewcAO7IQJhHgdMSqy2nSu1HMrEkJ3jsKyYzzrCtvY4=; b=Y/Srh3s1Ctoo372A+GON+H5oUOTo6cicq0wYkQfqG9ef44NX+XWMm+DCVoDhEuKlpAxzDJ anZLykL6w5sHMvOYfc5OMC3uGMlZkq9PC/Dt/IzxzuXO9T/Esm3FOVzJp+tMUbcTPwLjmL 6235b7EdYyjCWjCjAzP98oumbVFNYpU= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-110-Y5HxT2j2PfWz7eG0t2jNQw-1; Tue, 06 Aug 2024 02:16:25 -0400 X-MC-Unique: Y5HxT2j2PfWz7eG0t2jNQw-1 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 160891955D48 for ; Tue, 6 Aug 2024 06:16:25 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.45.224.40]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 1FAC11955F40 for ; Tue, 6 Aug 2024 06:16:23 +0000 (UTC) From: Florian Weimer To: libc-alpha@sourceware.org Subject: [PATCH v2 3/7] manual: __is_last is no longer part of iconv internals In-Reply-To: Message-ID: <0e6f0284e9a58bb2497d8dc73080960029cc5e8a.1722924862.git.fweimer@redhat.com> References: X-From-Line: 0e6f0284e9a58bb2497d8dc73080960029cc5e8a Mon Sep 17 00:00:00 2001 Date: Tue, 06 Aug 2024 08:16:20 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org The __is_last field was replaced with a bitmask in commit 85830c4c4688b30d3d76111aa9a26745c7b141d6 in 2000, and multiple bits are in use today. --- manual/charset.texi | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/manual/charset.texi b/manual/charset.texi index 427db3bc80..3aaa62d088 100644 --- a/manual/charset.texi +++ b/manual/charset.texi @@ -2422,11 +2422,11 @@ written into the buffer to signal how much output is available. If this conversion step is not the last one, the element must not be modified. The @code{__outbufend} element must not be modified. -@item int __is_last -This element is nonzero if this conversion step is the last one. This -information is necessary for the recursion. See the description of the -conversion function internals below. This element must never be -modified. +@item int __flags +This field is a set of flags. The @code{__GCONV_IS_LAST} bit is set if +this conversion step is the last one. This information is necessary for +the recursion. See the description of the conversion function internals +below. This element must never be modified. @item int __invocation_counter The conversion function can use this element to see how many calls of @@ -2731,8 +2731,8 @@ Otherwise the function has to emit a byte sequence to bring the state object into the initial state. Once this all happened the other conversion modules in the chain of conversions have to get the same chance. Whether another step follows can be determined from the -@code{__is_last} element of the step data structure to which the first -parameter points. +@code{__GCONV_IS_LAST} flag in the @code{__flags} field of the step +data structure to which the first parameter points. The more interesting mode is when actual text has to be converted. The first step in this case is to convert as much text as possible from the @@ -2866,7 +2866,7 @@ gconv (struct __gconv_step *step, struct __gconv_step_data *data, /* @r{Call the steps down the chain if there are any but only} @r{if we successfully emitted the escape sequence.} */ - if (status == __GCONV_OK && ! data->__is_last) + if (status == __GCONV_OK && ! (data->__flags & __GCONV_IS_LAST)) status = fct (next_step, next_data, NULL, NULL, written, 1); @} @@ -2892,7 +2892,7 @@ gconv (struct __gconv_step *step, struct __gconv_step_data *data, /* @r{If this is the last step, leave the loop. There is} @r{nothing we can do.} */ - if (data->__is_last) + if (data->__flags & __GCONV_IS_LAST) @{ /* @r{Store information about how many bytes are} @r{available.} */ From patchwork Tue Aug 6 06:16:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 1969328 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=WdloymIM; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WdNSM60fQz1yZl for ; Tue, 6 Aug 2024 16:19:31 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 204FA385842A for ; Tue, 6 Aug 2024 06:19:30 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id 782F8385C6D1 for ; Tue, 6 Aug 2024 06:16:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 782F8385C6D1 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 782F8385C6D1 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722925002; cv=none; b=GWATAbFznhMBvHXoff+vRwrQxL3bXL5qsTFhEKYj7d1ih2EFepmkQCQY4qFVsvMreFqm31YHkUH+pCBmSWZvHFFYlIdMHK2NYvv7obq0pJZLBuHYxIWwYtYkbTNdoNUoTtAN6omNYzchT1grseD57mJ6cjxJ8pmfW/k5eiXybyU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722925002; c=relaxed/simple; bh=bpB6OtwuoO1oeDwcrzSJQEJsyu3w7CAfUDkN5fUSGzg=; h=DKIM-Signature:From:To:Subject:Message-ID:Date:MIME-Version; b=Hq5/fixJYvKUzNTxOQrJ+KRqxstA6AwQVIcijp332U9x8hymkm0lT2YkWg1/0Iaa653WJbA3jUwy85iZp43z0wTWU4tlzbLufQxG6KiwP8eIH3TmixnyYtLPKYxpy1nh5e/PAcgW054U4htx4UnZShCv2Z8PXlh1MjEe0Slsvqg= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722924994; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=XgkAbAXuU8U7eBQgbOfo9e1CQ2YdtEB7ty79VEep50o=; b=WdloymIMClOT/dLmR+poA1bP4y6nVXsev7BeBqnJg8WzKDET1X2bm5NqA7PDQtRMRUyLxB PuuDwLutdszEBcGY1e9d94GvNqgqmLKZVIV6Br5cbDf2sbcNj/LBvHM1glsLYQTDXLyG+Q 9X3zzgyAEQGVl9FFLODn+PUxv7qyzMg= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-244-PMYXW-64P-m-0f81xO2rRA-1; Tue, 06 Aug 2024 02:16:32 -0400 X-MC-Unique: PMYXW-64P-m-0f81xO2rRA-1 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id BBBE91955D42 for ; Tue, 6 Aug 2024 06:16:31 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.45.224.40]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 1D5E11955F40 for ; Tue, 6 Aug 2024 06:16:29 +0000 (UTC) From: Florian Weimer To: libc-alpha@sourceware.org Subject: [PATCH v2 4/7] iconv: Preserve iconv -c error exit on invalid inputs (bug 32046) In-Reply-To: Message-ID: References: X-From-Line: a4c95d1ac67fc42b17c8c1f8835ed563260a659e Mon Sep 17 00:00:00 2001 Date: Tue, 06 Aug 2024 08:16:26 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-9.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org In several converters, a __GCONV_ILLEGAL_INPUT result gets overwritten with __GCONV_FULL_OUTPUT. As a result, iconv (the function) returns E2BIG instead of EILSEQ. The iconv program does not see the original EILSEQ failure, does not recognize the invalid input, and may incorrectly exit successfully. To address this, a new __flags bit is used to indicate a sticky input error state. All __GCONV_ILLEGAL_INPUT results are replaced with a function call that sets this new __GCONV_ENCOUNTERED_ILLEGAL_INPUT and returns __GCONV_ILLEGAL_INPUT. The iconv program checks for __GCONV_ENCOUNTERED_ILLEGAL_INPUT and overrides the exit status. The converter changes introducing __gconv_mark_illegal_input are mostly mechanical, except for the res variable initialization in iconvdata/iso-2022-jp.c: this error gets overwritten with __GCONV_OK and other results in the following code. If res == __GCONV_ILLEGAL_INPUT afterwards, STANDARD_TO_LOOP_ERR_HANDLER below will handle it. The __gconv_mark_illegal_input changes do not alter the errno value set by the iconv function. This is simpler to implement than reviewing each __GCONV_FULL_OUTPUT result and adjust it not to override a previous __GCONV_ILLEGAL_INPUT result. Doing it that way would also change some E2BIG errors in to EILSEQ errors, so it had to be done conditionally (under a flag set by the iconv program only), to avoid confusing buffer management in other applications. --- iconv/Makefile | 4 + iconv/gconv_int.h | 30 ++++++ iconv/gconv_simple.c | 18 ++-- iconv/gconv_trans.c | 2 +- iconv/iconv_prog.c | 5 + iconv/loop.c | 5 +- iconv/tst-iconv-sticky-input-error.c | 135 +++++++++++++++++++++++ iconv/tst-iconv_prog-buffer.sh | 3 +- iconvdata/cp932.c | 6 +- iconvdata/euc-jp-ms.c | 8 +- iconvdata/gbbig5.c | 4 +- iconvdata/ibm1364.c | 8 +- iconvdata/iso646.c | 154 +++++++++++++-------------- iconvdata/unicode.c | 2 +- iconvdata/utf-16.c | 2 +- iconvdata/utf-32.c | 2 +- 16 files changed, 280 insertions(+), 108 deletions(-) create mode 100644 iconv/tst-iconv-sticky-input-error.c diff --git a/iconv/Makefile b/iconv/Makefile index b0fa550141..29e4f280ec 100644 --- a/iconv/Makefile +++ b/iconv/Makefile @@ -61,6 +61,10 @@ test-srcs := \ tst-translit-mchar \ # test-srcs +tests-internal = \ + tst-iconv-sticky-input-error \ + # tests-internal + others = iconv_prog iconvconfig install-others-programs = $(inst_bindir)/iconv install-sbin = iconvconfig diff --git a/iconv/gconv_int.h b/iconv/gconv_int.h index 9fece3ea14..cd452d94cc 100644 --- a/iconv/gconv_int.h +++ b/iconv/gconv_int.h @@ -331,4 +331,34 @@ extern wint_t __gconv_btwoc_ascii (struct __gconv_step *step, unsigned char c); __END_DECLS +/* Internal extensions for . */ + +/* Internal flags for __flags in struct __gconv_step_data. Overlaps + with flags for __gconv_open. */ +enum + { + /* The conversion encountered an illegal input character at one + point. */ + __GCONV_ENCOUNTERED_ILLEGAL_INPUT = 1U << 30, + }; + +/* Mark *STEP_DATA as having seen illegal input, and return + __GCONV_ILLEGAL_INPUT. */ +static inline int +__gconv_mark_illegal_input (struct __gconv_step_data *step_data) +{ + step_data->__flags |= __GCONV_ENCOUNTERED_ILLEGAL_INPUT; + return __GCONV_ILLEGAL_INPUT; +} + +/* Returns true if any of the conversion steps encountered illegal input. */ +static _Bool __attribute__ ((unused)) +__gconv_has_illegal_input (__gconv_t cd) +{ + for (size_t i = 0; i < cd->__nsteps; ++i) + if (cd->__data[i].__flags & __GCONV_ENCOUNTERED_ILLEGAL_INPUT) + return true; + return false; +} + #endif /* gconv_int.h */ diff --git a/iconv/gconv_simple.c b/iconv/gconv_simple.c index 257be2f8ff..f22002cf81 100644 --- a/iconv/gconv_simple.c +++ b/iconv/gconv_simple.c @@ -207,7 +207,7 @@ ucs4_internal_loop (struct __gconv_step *step, UCS4 does not allow such values. */ if (irreversible == NULL) /* We are transliterating, don't try to correct anything. */ - return __GCONV_ILLEGAL_INPUT; + return __gconv_mark_illegal_input (step_data); if (flags & __GCONV_IGNORE_ERRORS) { @@ -218,7 +218,7 @@ ucs4_internal_loop (struct __gconv_step *step, *inptrp = inptr; *outptrp = outptr; - return __GCONV_ILLEGAL_INPUT; + return __gconv_mark_illegal_input (step_data); } put32 (outptr, inval); @@ -276,7 +276,7 @@ ucs4_internal_loop_single (struct __gconv_step *step, if (!(flags & __GCONV_IGNORE_ERRORS)) { *inptrp -= cnt - (state->__count & 7); - return __GCONV_ILLEGAL_INPUT; + return __gconv_mark_illegal_input (step_data); } } else @@ -453,7 +453,7 @@ ucs4le_internal_loop (struct __gconv_step *step, UCS4 does not allow such values. */ if (irreversible == NULL) /* We are transliterating, don't try to correct anything. */ - return __GCONV_ILLEGAL_INPUT; + return __gconv_mark_illegal_input (step_data); if (flags & __GCONV_IGNORE_ERRORS) { @@ -464,7 +464,7 @@ ucs4le_internal_loop (struct __gconv_step *step, *inptrp = inptr; *outptrp = outptr; - return __GCONV_ILLEGAL_INPUT; + return __gconv_mark_illegal_input (step_data); } put32 (outptr, inval); @@ -523,7 +523,7 @@ ucs4le_internal_loop_single (struct __gconv_step *step, represent the result. This is a genuine bug in the input since UCS4 does not allow such values. */ if (!(flags & __GCONV_IGNORE_ERRORS)) - return __GCONV_ILLEGAL_INPUT; + return __gconv_mark_illegal_input (step_data); } else { @@ -969,7 +969,7 @@ ucs4le_internal_loop_single (struct __gconv_step *step, surrogates pass through, attackers could make a security \ hole exploit by synthesizing any desired plane 1-16 \ character. */ \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ if (! ignore_errors_p ()) \ break; \ inptr += 4; \ @@ -1012,7 +1012,7 @@ ucs4le_internal_loop_single (struct __gconv_step *step, them. (Catching this here is not security relevant.) */ \ if (! ignore_errors_p ()) \ { \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ inptr += 2; \ @@ -1061,7 +1061,7 @@ ucs4le_internal_loop_single (struct __gconv_step *step, character. */ \ if (! ignore_errors_p ()) \ { \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ inptr += 4; \ diff --git a/iconv/gconv_trans.c b/iconv/gconv_trans.c index 44f0fd849a..54c4f3a100 100644 --- a/iconv/gconv_trans.c +++ b/iconv/gconv_trans.c @@ -232,6 +232,6 @@ __gconv_transliterate (struct __gconv_step *step, } /* Haven't found a match. */ - return __GCONV_ILLEGAL_INPUT; + return __gconv_mark_illegal_input (step_data); } libc_hidden_def (__gconv_transliterate) diff --git a/iconv/iconv_prog.c b/iconv/iconv_prog.c index 88a928557e..5fe4fe7a6c 100644 --- a/iconv/iconv_prog.c +++ b/iconv/iconv_prog.c @@ -291,6 +291,11 @@ conversions from `%s' and to `%s' are not supported"), } while (++remaining < argc); + /* Ensure that iconv -c still exits with failure if iconv (the + function) has failed with E2BIG instead of EILSEQ. */ + if (__gconv_has_illegal_input (cd)) + status = EXIT_FAILURE; + /* Close the output file now. */ if (output != NULL && fclose (output)) error (EXIT_FAILURE, errno, _("error while closing output file")); diff --git a/iconv/loop.c b/iconv/loop.c index 5340dafc70..199fb28326 100644 --- a/iconv/loop.c +++ b/iconv/loop.c @@ -123,8 +123,7 @@ `continue' must reach certain points. */ #define STANDARD_FROM_LOOP_ERR_HANDLER(Incr) \ { \ - result = __GCONV_ILLEGAL_INPUT; \ - \ + result = __gconv_mark_illegal_input (step_data); \ if (! ignore_errors_p ()) \ break; \ \ @@ -142,7 +141,7 @@ points. */ #define STANDARD_TO_LOOP_ERR_HANDLER(Incr) \ { \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ \ if (irreversible == NULL) \ /* This means we are in call from __gconv_transliterate. In this \ diff --git a/iconv/tst-iconv-sticky-input-error.c b/iconv/tst-iconv-sticky-input-error.c new file mode 100644 index 0000000000..c93b5b5160 --- /dev/null +++ b/iconv/tst-iconv-sticky-input-error.c @@ -0,0 +1,135 @@ +/* Test __GCONV_ENCOUNTERED_ILLEGAL_INPUT, as used by iconv -c (bug 32046). + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* FROM is the input character set, TO the output character set. If + IGNORE is true, the iconv descriptor is set up in the same way as + iconv -c would. INPUT is the input string, EXPECTED_OUTPUT the + output. OUTPUT_LIMIT is a byte count, specifying how many input + bytes are passed to the iconv function on each invocation. */ +static void +one_direction (const char *from, const char *to, bool ignore, + const char *input, const char *expected_output, + size_t output_limit) +{ + if (test_verbose) + { + printf ("info: testing from=\"%s\" to=\"%s\" ignore=%d input=\"", + from, to, (int) ignore); + for (const char *p = input; *p != '\0'; ++p) + if (*p >= ' ' && *p <= '~' && *p != '\\' && *p != '"') + putchar (*p); + else + printf ("\\x%02x", *p & 0xff); + printf (" \" expected_output=\"%s\" output_limit=%zu\n", + expected_output, output_limit); + } + + __gconv_t cd; + if (ignore) + { + struct gconv_spec conv_spec; + TEST_VERIFY_EXIT (__gconv_create_spec (&conv_spec, from, to) + == &conv_spec); + conv_spec.ignore = true; + cd = (iconv_t) -1; + TEST_COMPARE (__gconv_open (&conv_spec, &cd, 0), __GCONV_OK); + } + else + cd = iconv_open (to, from); + TEST_VERIFY_EXIT (cd != (iconv_t) -1); + + char *input_ptr = (char *) input; + size_t input_len = strlen (input); + char output_buf[20]; + char *output_ptr = output_buf; + size_t output_len; + do + { + output_len = array_end (output_buf) - output_ptr; + if (output_len > output_limit) + /* Limit the buffer size as requested by the caller. */ + output_len = output_limit; + TEST_VERIFY_EXIT (output_len > 0); + if (input_len == 0) + /* Trigger final flush. */ + input_ptr = NULL; + char *old_input_ptr = input_ptr; + size_t ret = iconv (cd, &input_ptr, &input_len, + &output_ptr, &output_len); + if (ret == (size_t) -1) + { + if (errno != EILSEQ) + TEST_COMPARE (errno, E2BIG); + } + + if (input_ptr == old_input_ptr) + /* Avoid endless loop if stuck on an invalid input character. */ + break; + } + while (input_ptr != NULL); + + /* Test the sticky illegal input bit. */ + TEST_VERIFY (__gconv_has_illegal_input (cd)); + + TEST_COMPARE_BLOB (expected_output, strlen (expected_output), + output_buf, output_ptr - output_buf); + + TEST_COMPARE (iconv_close (cd), 0); +} + +static int +do_test (void) +{ + static const char charsets[][14] = + { + "ASCII", + "ASCII//IGNORE", + "UTF-8", + "UTF-8//IGNORE", + }; + + for (size_t from_idx = 0; from_idx < array_length (charsets); ++from_idx) + for (size_t to_idx = 0; to_idx < array_length (charsets); ++to_idx) + for (int do_ignore = 0; do_ignore < 2; ++do_ignore) + for (int limit = 1; limit < 5; ++limit) + for (int skip = 0; skip < 3; ++skip) + { + const char *expected_output; + if (do_ignore || strstr (charsets[to_idx], "//IGNORE") != NULL) + expected_output = "ABXY" + skip; + else + expected_output = "AB" + skip; + one_direction (charsets[from_idx], charsets[to_idx], do_ignore, + "AB\xffXY" + skip, expected_output, limit); + } + + return 0; +} + +#include diff --git a/iconv/tst-iconv_prog-buffer.sh b/iconv/tst-iconv_prog-buffer.sh index a27107f02b..5ff99a02a3 100644 --- a/iconv/tst-iconv_prog-buffer.sh +++ b/iconv/tst-iconv_prog-buffer.sh @@ -150,9 +150,8 @@ expect_exit 1 \ ! test -s "$tmp/err" expect_files abc def -# FIXME: This is not correct, -c should not change the exit status. cp "$tmp/out-template" "$tmp/out" -run_iconv -c -o "$tmp/out" \ +expect_exit 1 run_iconv -c -o "$tmp/out" \ "$tmp/abc" "$tmp/0xff-wrapped" "$tmp/def" 2>"$tmp/err" ! test -s "$tmp/err" expect_files abc xy zt def diff --git a/iconvdata/cp932.c b/iconvdata/cp932.c index bf7297b114..3def70a70b 100644 --- a/iconvdata/cp932.c +++ b/iconvdata/cp932.c @@ -4559,7 +4559,7 @@ static const char from_ucs4_extra[229][2] = if (! ignore_errors_p ()) \ { \ /* This is an illegal character. */ \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ \ @@ -4599,7 +4599,7 @@ static const char from_ucs4_extra[229][2] = if (! ignore_errors_p ()) \ { \ /* This is an illegal character. */ \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ \ @@ -4634,7 +4634,7 @@ static const char from_ucs4_extra[229][2] = if (! ignore_errors_p ()) \ { \ /* This is an illegal character. */ \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ \ diff --git a/iconvdata/euc-jp-ms.c b/iconvdata/euc-jp-ms.c index d03a0e05bb..96c5325e9d 100644 --- a/iconvdata/euc-jp-ms.c +++ b/iconvdata/euc-jp-ms.c @@ -4659,7 +4659,7 @@ static const unsigned char from_ucs4_extra[229][2] = /* This is illegal. */ \ if (! ignore_errors_p ()) \ { \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ \ @@ -4689,7 +4689,7 @@ static const unsigned char from_ucs4_extra[229][2] = /* This is an illegal character. */ \ if (! ignore_errors_p ()) \ { \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ \ @@ -4709,7 +4709,7 @@ static const unsigned char from_ucs4_extra[229][2] = if (! ignore_errors_p ()) \ { \ /* This is an illegal character. */ \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ } \ @@ -4820,7 +4820,7 @@ static const unsigned char from_ucs4_extra[229][2] = if (! ignore_errors_p ()) \ { \ /* This is an illegal character. */ \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ \ diff --git a/iconvdata/gbbig5.c b/iconvdata/gbbig5.c index f05deeeb1a..b15a2ef932 100644 --- a/iconvdata/gbbig5.c +++ b/iconvdata/gbbig5.c @@ -4831,7 +4831,7 @@ const char __from_big5_to_gb2312 [13973][2] = { \ /* We do not have a mapping for this character. \ If ignore errors, map it to 0xa1bc - big5 box character */ \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ if (! ignore_errors_p ()) \ break; \ \ @@ -4922,7 +4922,7 @@ const char __from_big5_to_gb2312 [13973][2] = { \ /* We do not have a mapping for this character. \ If ignore errors, map it to 0xa1f5 - gb box character */ \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ if (! ignore_errors_p ()) \ break; \ \ diff --git a/iconvdata/ibm1364.c b/iconvdata/ibm1364.c index 4c37f30e98..d6c8ce7f68 100644 --- a/iconvdata/ibm1364.c +++ b/iconvdata/ibm1364.c @@ -179,7 +179,7 @@ enum /* This is an illegal character. */ \ if (! ignore_errors_p ()) \ { \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ ++*irreversible; \ @@ -219,7 +219,7 @@ enum /* This is an illegal character. */ \ if (! ignore_errors_p ()) \ { \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ ++*irreversible; \ @@ -300,7 +300,7 @@ enum \ if (! ignore_errors_p ()) \ { \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ ++*irreversible; \ @@ -332,7 +332,7 @@ enum /* This is an illegal character. */ \ if (! ignore_errors_p ()) \ { \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ break; \ } \ ++*irreversible; \ diff --git a/iconvdata/iso646.c b/iconvdata/iso646.c index d96e5f86b9..ba7e23f8ac 100644 --- a/iconvdata/iso646.c +++ b/iconvdata/iso646.c @@ -313,7 +313,7 @@ gconv_end (struct __gconv_step *data) ch = 0xf9; \ else if (var == JP_OCR_B) \ /* Illegal character. */ \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ else if (var == YU) \ ch = 0x17e; \ else if (var == HU) \ @@ -387,7 +387,7 @@ gconv_end (struct __gconv_step *data) ch = 0xec; \ else if (var == JP_OCR_B) \ /* Illegal character. */ \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ else if (var == YU) \ ch = 0x10d; \ else if (var == HU) \ @@ -403,7 +403,7 @@ gconv_end (struct __gconv_step *data) break; \ case 0x80 ... 0xff: \ /* Illegal character. */ \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ } \ \ @@ -440,17 +440,17 @@ gconv_end (struct __gconv_step *data) case 0x23: \ if (var == GB || var == ES || var == IT || var == FR || var == FR1 \ || var == NO2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0x24: \ if (var == CN || var == HU || var == CU || var == SE || var == SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0x40: \ if (var == CA || var == CA2 || var == DE || var == ES || var == ES2 \ || var == IT || var == YU || var == HU || var == FR || var == FR1 \ || var == PT || var == PT2 || var == SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0x5b: \ if (var == CA || var == CA2 || var == DE || var == DK || var == ES \ @@ -458,7 +458,7 @@ gconv_end (struct __gconv_step *data) || var == HU || var == FR || var == FR1 || var == NO \ || var == NO2 || var == PT || var == PT2 || var == SE \ || var == SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ else if (var == CU) \ ch = 0x7d; \ break; \ @@ -468,7 +468,7 @@ gconv_end (struct __gconv_step *data) || var == YU || var == KR || var == HU || var == CU || var == FR \ || var == FR1 || var == NO || var == NO2 || var == PT \ || var == PT2 || var == SE || var == SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0x5d: \ if (var == CA || var == CA2 || var == DE || var == DK || var == ES \ @@ -476,17 +476,17 @@ gconv_end (struct __gconv_step *data) || var == HU || var == FR || var == FR1 || var == NO \ || var == NO2 || var == PT || var == PT2 || var == SE \ || var == SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0x5e: \ if (var == CA || var == CA2 || var == ES2 || var == YU || var == CU \ || var == SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0x60: \ if (var == CA || var == CA2 || var == IT || var == JP_OCR_B \ || var == YU || var == HU || var == FR || var == SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0x7b: \ if (var == CA || var == CA2 || var == DE || var == DK || var == ES \ @@ -494,14 +494,14 @@ gconv_end (struct __gconv_step *data) || var == CU || var == FR || var == FR1 || var == NO \ || var == NO2 || var == PT || var == PT2 || var == SE \ || var == SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0x7c: \ if (var == CA || var == CA2 || var == DE || var == DK || var == ES \ || var == ES2 || var == IT || var == YU || var == HU || var == CU \ || var == FR || var == FR1 || var == NO || var == PT \ || var == PT2 || var == SE || var == SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ else if (var == NO2) \ ch = 0x7e; \ break; \ @@ -510,7 +510,7 @@ gconv_end (struct __gconv_step *data) || var == ES2 || var == IT || var == YU || var == HU || var == CU \ || var == FR || var == FR1 || var == NO || var == NO2 \ || var == PT || var == PT2 || var == SE || var == SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0x7e: \ if (var == GB || var == CA || var == CA2 || var == DE || var == ES2 \ @@ -518,21 +518,21 @@ gconv_end (struct __gconv_step *data) || var == YU || var == HU || var == CU || var == FR || var == FR1 \ || var == NO || var == NO2 || var == PT || var == SE \ || var == SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xa1: \ if (var != ES && var != ES2 && var != CU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5b; \ break; \ case 0xa3: \ if (var != GB && var != ES && var != IT && var != FR && var != FR1) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x23; \ break; \ case 0xa4: \ if (var != HU && var != CU && var != SE && var != SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x24; \ break; \ case 0xa5: \ @@ -541,7 +541,7 @@ gconv_end (struct __gconv_step *data) else if (var == JP || var == JP_OCR_B) \ ch = 0x5c; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xa7: \ if (var == DE || var == ES || var == IT || var == PT) \ @@ -551,11 +551,11 @@ gconv_end (struct __gconv_step *data) else if (var == NO2) \ ch = 0x23; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xa8: \ if (var != ES2 && var != CU && var != FR && var != FR1) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7e; \ break; \ case 0xb0: \ @@ -566,7 +566,7 @@ gconv_end (struct __gconv_step *data) else if (var == PT) \ ch = 0x7e; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xb4: \ if (var == ES2 || var == CU) \ @@ -574,11 +574,11 @@ gconv_end (struct __gconv_step *data) else if (var == PT2) \ ch = 0x40; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xb5: \ if (var != FR) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x60; \ break; \ case 0xbf: \ @@ -587,31 +587,31 @@ gconv_end (struct __gconv_step *data) else if (var == ES2 || var == CU) \ ch = 0x5e; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xc1: \ if (var != HU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x40; \ break; \ case 0xc3: \ if (var != PT && var != PT2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5b; \ break; \ case 0xc4: \ if (var != DE && var != SE && var != SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5b; \ break; \ case 0xc5: \ if (var != DK && var != NO && var != NO2 && var != SE && var != SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5d; \ break; \ case 0xc6: \ if (var != DK && var != NO && var != NO2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5b; \ break; \ case 0xc7: \ @@ -620,7 +620,7 @@ gconv_end (struct __gconv_step *data) else if (var == PT || var == PT2) \ ch = 0x5c; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xc9: \ if (var == CA2) \ @@ -630,26 +630,26 @@ gconv_end (struct __gconv_step *data) else if (var == SE2) \ ch = 0x40; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xd1: \ if (var != ES && var != ES2 && var != CU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5c; \ break; \ case 0xd5: \ if (var != PT && var != PT2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5d; \ break; \ case 0xd6: \ if (var != DE && var != HU && var != SE && var != SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5c; \ break; \ case 0xd8: \ if (var != DK && var != NO && var != NO2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5c; \ break; \ case 0xdc: \ @@ -658,11 +658,11 @@ gconv_end (struct __gconv_step *data) else if (var == SE2) \ ch = 0x5e; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xdf: \ if (var != DE) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7e; \ break; \ case 0xe0: \ @@ -671,36 +671,36 @@ gconv_end (struct __gconv_step *data) else if (var == IT) \ ch = 0x7b; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xe1: \ if (var != HU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x60; \ break; \ case 0xe2: \ if (var != CA && var != CA2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5b; \ break; \ case 0xe3: \ if (var != PT && var != PT2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7b; \ break; \ case 0xe4: \ if (var != DE && var != SE && var != SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7b; \ break; \ case 0xe5: \ if (var != DK && var != NO && var != NO2 && var != SE && var != SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7d; \ break; \ case 0xe6: \ if (var != DK && var != NO && var != NO2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7b; \ break; \ case 0xe7: \ @@ -711,11 +711,11 @@ gconv_end (struct __gconv_step *data) else if (var == PT || var == PT2) \ ch = 0x7c; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xe8: \ if (var != CA && var != CA2 && var != IT && var != FR && var != FR1) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7d; \ break; \ case 0xe9: \ @@ -726,51 +726,51 @@ gconv_end (struct __gconv_step *data) else if (var == SE2) \ ch = 0x60; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xea: \ if (var != CA && var != CA2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5d; \ break; \ case 0xec: \ if (var != IT) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7e; \ break; \ case 0xee: \ if (var != CA) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5e; \ break; \ case 0xf1: \ if (var != ES && var != ES2 && var != CU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7c; \ break; \ case 0xf2: \ if (var != IT) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7c; \ break; \ case 0xf4: \ if (var != CA && var != CA2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x60; \ break; \ case 0xf5: \ if (var != PT && var != PT2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7d; \ break; \ case 0xf6: \ if (var != DE && var != HU && var != SE && var != SE2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7c; \ break; \ case 0xf8: \ if (var != DK && var != NO && var != NO2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7c; \ break; \ case 0xf9: \ @@ -779,11 +779,11 @@ gconv_end (struct __gconv_step *data) else if (var == IT) \ ch = 0x60; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0xfb: \ if (var != CA && var != CA2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7e; \ break; \ case 0xfc: \ @@ -792,93 +792,93 @@ gconv_end (struct __gconv_step *data) else if (var == SE2) \ ch = 0x7e; \ else \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ break; \ case 0x160: \ if (var != YU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5b; \ break; \ case 0x106: \ if (var != YU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5d; \ break; \ case 0x107: \ if (var != YU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7d; \ break; \ case 0x10c: \ if (var != YU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5e; \ break; \ case 0x10d: \ if (var != YU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7e; \ break; \ case 0x110: \ if (var != YU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5c; \ break; \ case 0x111: \ if (var != YU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7c; \ break; \ case 0x161: \ if (var != YU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7b; \ break; \ case 0x17d: \ if (var != YU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x40; \ break; \ case 0x17e: \ if (var != YU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x60; \ break; \ case 0x2dd: \ if (var != HU) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7e; \ break; \ case 0x2022: \ if (var != ES2) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x40; \ break; \ case 0x203e: \ if (var != GB && var != CN && var != JP && var != NO && var != SE) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x7e; \ break; \ case 0x20a9: \ if (var != KR) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5c; \ break; \ case 0x2329: \ if (var != JP_OCR_B) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5b; \ break; \ case 0x232a: \ if (var != JP_OCR_B) \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ ch = 0x5d; \ break; \ default: \ if (__glibc_unlikely (ch > 0x7f)) \ { \ UNICODE_TAG_HANDLER (ch, 4); \ - failure = __GCONV_ILLEGAL_INPUT; \ + failure = __gconv_mark_illegal_input (step_data); \ } \ break; \ } \ diff --git a/iconvdata/unicode.c b/iconvdata/unicode.c index d69c9887a1..79bb9886e5 100644 --- a/iconvdata/unicode.c +++ b/iconvdata/unicode.c @@ -163,7 +163,7 @@ gconv_end (struct __gconv_step *data) surrogates pass through, attackers could make a security \ hole exploit by synthesizing any desired plane 1-16 \ character. */ \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ if (! ignore_errors_p ()) \ break; \ inptr += 4; \ diff --git a/iconvdata/utf-16.c b/iconvdata/utf-16.c index a869353f20..9d9fd2a2a3 100644 --- a/iconvdata/utf-16.c +++ b/iconvdata/utf-16.c @@ -206,7 +206,7 @@ gconv_end (struct __gconv_step *data) We must catch this. If we let surrogates pass through, \ attackers could make a security hole exploit by \ synthesizing any desired plane 1-16 character. */ \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ if (! ignore_errors_p ()) \ break; \ inptr += 4; \ diff --git a/iconvdata/utf-32.c b/iconvdata/utf-32.c index 5693b48975..139eefb6d8 100644 --- a/iconvdata/utf-32.c +++ b/iconvdata/utf-32.c @@ -207,7 +207,7 @@ gconv_end (struct __gconv_step *data) We must catch this. If we let surrogates pass through, \ attackers could make a security hole exploit by \ generating "irregular UTF-32" sequences. */ \ - result = __GCONV_ILLEGAL_INPUT; \ + result = __gconv_mark_illegal_input (step_data); \ if (! ignore_errors_p ()) \ break; \ inptr += 4; \ From patchwork Tue Aug 6 06:16:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 1969329 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=JUgMku68; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WdNTG41pLz1yXs for ; Tue, 6 Aug 2024 16:20:18 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 664E8385DDE4 for ; Tue, 6 Aug 2024 06:20:16 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id E760D385840F for ; Tue, 6 Aug 2024 06:16:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E760D385840F Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E760D385840F Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722925006; cv=none; b=PjbrC5XjGVP6RcIG2Pe1Efq3HgCvYm4ICs6l6MIptGIQxnDRlbBgYOvtKXfCSB/i80Di1JuMf0KIxi29PX34eUplFQyhmRgO2cnGcvVgbvF4bdEXCnBkVV+MtgryXLcoP5yL08JZmnG7XA9v3Fdmi24gNO5kqrlu6puV6l32HdY= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722925006; c=relaxed/simple; bh=U59h98rtN1iHlOZdzcdubEOR3Mko9R9SymXyEx01sNM=; h=DKIM-Signature:From:To:Subject:Message-ID:Date:MIME-Version; b=v0S6On2qKmn60d7HXM0kS+BZ6W39lWFmr6vSQWX0ws10nl9cMD5aiZLT7fUk61FDXGXBxumukCw9ECqo3AcuT/h9MYKMB4NRdJjN2fxV9g7HlcymEjFA6I+23oXI2e2U59QY5OWzEb9Eo857TAB6o2+XkVe28w08PV7kWfAPfLQ= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722925000; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=g2hTPIJRIdwSKoq1pEQx0jI9A83S4VryMo6xt98h/68=; b=JUgMku68hLEilEr2z5CGzCp0rUvl5A+zqbeogZEhuEazxL+JIKit36b4PiIqpNlj1jnBvX BtwrVZ1U6sQUOZ4FY+6xiAsvjMGD6Cf+I6D2HAPEN+J5HG2kp7wdVKZYWhyPZHHJFs33K1 Q4zgaNDECtxOLtnAeNc6xg9xgwRVQlM= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-524-UpFxJWppN2yYr5wxDfr_Jw-1; Tue, 06 Aug 2024 02:16:38 -0400 X-MC-Unique: UpFxJWppN2yYr5wxDfr_Jw-1 Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 073E219560B3 for ; Tue, 6 Aug 2024 06:16:38 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.45.224.40]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 98CCE1955D42 for ; Tue, 6 Aug 2024 06:16:36 +0000 (UTC) From: Florian Weimer To: libc-alpha@sourceware.org Subject: [PATCH v2 5/7] iconv: Support in-place conversions (bug 10460, bug 32033) In-Reply-To: Message-ID: References: X-From-Line: ebafd2a79afcdd5626d84fce083f13d98cd9b2d8 Mon Sep 17 00:00:00 2001 Date: Tue, 06 Aug 2024 08:16:33 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org Check if any of the input files overlaps with the output file, and use a temporary file in this case, so that the input is no clobbered before it is read. This fixes bug 10460. It allows to use iconv more easily as a functional replacement for GNU recode. The updated output buffer management truncates the output file if there is no input, fixing bug 32033. --- NEWS | 3 + iconv/Makefile | 11 ++ iconv/iconv_prog.c | 351 ++++++++++++++++++++++++++------- iconv/tst-iconv_prog-buffer.sh | 96 ++++++++- 4 files changed, 386 insertions(+), 75 deletions(-) diff --git a/NEWS b/NEWS index d488874aba..34428e0bfe 100644 --- a/NEWS +++ b/NEWS @@ -25,6 +25,9 @@ Major new features: which is why this mode is not enabled by default. A future version of the library may turn it on by default, however. +* The iconv program now supports converting files in place. The program + automatically uses a temporary file if required. + Deprecated and removed features, and other changes affecting compatibility: [Add deprecations, removals and changes affecting compatibility here] diff --git a/iconv/Makefile b/iconv/Makefile index 29e4f280ec..c9af0c4d44 100644 --- a/iconv/Makefile +++ b/iconv/Makefile @@ -81,6 +81,8 @@ include $(patsubst %,$(..)libof-iterator.mk,$(cpp-srcs-left)) ifeq ($(run-built-tests),yes) xtests-special += $(objpfx)test-iconvconfig.out tests-special += \ + $(objpfx)tst-iconv_prog-buffer-large.out \ + $(objpfx)tst-iconv_prog-buffer-tiny.out \ $(objpfx)tst-iconv_prog-buffer.out \ $(objpfx)tst-iconv_prog.out \ $(objpfx)tst-translit-mchar.out \ @@ -153,3 +155,12 @@ $(objpfx)tst-iconv_prog-buffer.out: \ tst-iconv_prog-buffer.sh $(objpfx)iconv_prog $(BASH) $< $(common-objdir) '$(test-program-prefix)' > $@; \ $(evaluate-test) +$(objpfx)tst-iconv_prog-buffer-tiny.out: \ + tst-iconv_prog-buffer.sh $(objpfx)iconv_prog + $(BASH) $< $(common-objdir) '$(test-program-prefix)' \ + '--buffer-size=1' > $@; \ + $(evaluate-test) +$(objpfx)tst-iconv_prog-buffer-large.out: \ + tst-iconv_prog-buffer.sh $(objpfx)iconv_prog + $(BASH) $< $(common-objdir) '$(test-program-prefix)' '' '22' > $@; \ + $(evaluate-test) diff --git a/iconv/iconv_prog.c b/iconv/iconv_prog.c index 5fe4fe7a6c..3e02db7319 100644 --- a/iconv/iconv_prog.c +++ b/iconv/iconv_prog.c @@ -47,7 +47,11 @@ static void print_version (FILE *stream, struct argp_state *state); void (*argp_program_version_hook) (FILE *, struct argp_state *) = print_version; -#define OPT_VERBOSE 1000 +enum + { + OPT_VERBOSE = 1000, + OPT_BUFFER_SIZE, + }; #define OPT_LIST 'l' /* Definitions of arguments for argp functions. */ @@ -63,6 +67,10 @@ static const struct argp_option options[] = { "output", 'o', N_("FILE"), 0, N_("output file") }, { "silent", 's', NULL, 0, N_("suppress warnings") }, { "verbose", OPT_VERBOSE, NULL, 0, N_("print progress information") }, + /* This is an internal option intended for testing only. Very small + buffers do not work with all character sets. */ + { "buffer-size", OPT_BUFFER_SIZE, N_("BYTE-COUNT"), OPTION_HIDDEN, + N_("size of in-memory scratch buffer") }, { NULL, 0, NULL, 0, NULL } }; @@ -100,13 +108,20 @@ static int list; /* If nonzero omit invalid character from output. */ int omit_invalid; +/* Current index in argv (after command line processing) with the + input file name. */ +static int current_input_file_index; + +/* Size of the temporary, in-memory buffer. Exceeding it needs + spooling to disk in a temporary file. Controlled by --buffer_size. */ +static size_t output_buffer_size = 1024 * 1024; + /* Prototypes for the functions doing the actual work. */ -static int process_block (iconv_t cd, char *addr, size_t len, FILE **output, - const char *output_file); -static int process_fd (iconv_t cd, int fd, FILE **output, - const char *output_file); -static int process_file (iconv_t cd, FILE *input, FILE **output, - const char *output_file); +static void prepare_output_file (char **argv); +static void close_output_file (int status); +static int process_block (iconv_t cd, char *addr, size_t len); +static int process_fd (iconv_t cd, int fd); +static int process_file (iconv_t cd, FILE *input); static void print_known_names (void); @@ -114,7 +129,6 @@ int main (int argc, char *argv[]) { int status = EXIT_SUCCESS; - int remaining; __gconv_t cd; struct charmap_t *from_charmap = NULL; struct charmap_t *to_charmap = NULL; @@ -126,7 +140,7 @@ main (int argc, char *argv[]) textdomain (_libc_intl_domainname); /* Parse and process arguments. */ - argp_parse (&argp, argc, argv, 0, &remaining, NULL); + argp_parse (&argp, argc, argv, 0, ¤t_input_file_index, NULL); /* List all coded character sets if wanted. */ if (list) @@ -161,7 +175,8 @@ main (int argc, char *argv[]) if (from_charmap != NULL || to_charmap != NULL) /* Construct the conversion table and do the conversion. */ status = charmap_conversion (from_code, from_charmap, to_code, to_charmap, - argc, remaining, argv, output_file); + argc, current_input_file_index, argv, + output_file); else { struct gconv_spec conv_spec; @@ -235,16 +250,14 @@ conversions from `%s' and to `%s' are not supported"), _("failed to start conversion processing")); } - /* The output file. Will be opened when we are ready to produce - output. */ - FILE *output = NULL; + prepare_output_file (argv); /* Now process the remaining files. Write them to stdout or the file specified with the `-o' parameter. If we have no file given as the parameter process all from stdin. */ - if (remaining == argc) + if (current_input_file_index == argc) { - if (process_file (cd, stdin, &output, output_file) != 0) + if (process_file (cd, stdin) != 0) status = EXIT_FAILURE; } else @@ -253,17 +266,17 @@ conversions from `%s' and to `%s' are not supported"), int fd, ret; if (verbose) - fprintf (stderr, "%s:\n", argv[remaining]); - if (strcmp (argv[remaining], "-") == 0) - fd = 0; + fprintf (stderr, "%s:\n", argv[current_input_file_index]); + if (strcmp (argv[current_input_file_index], "-") == 0) + fd = STDIN_FILENO; else { - fd = open (argv[remaining], O_RDONLY); + fd = open (argv[current_input_file_index], O_RDONLY); if (fd == -1) { error (0, errno, _("cannot open input file `%s'"), - argv[remaining]); + argv[current_input_file_index]); status = EXIT_FAILURE; continue; } @@ -271,7 +284,7 @@ conversions from `%s' and to `%s' are not supported"), { /* Read the file in pieces. */ - ret = process_fd (cd, fd, &output, output_file); + ret = process_fd (cd, fd); /* Now close the file. */ close (fd); @@ -289,7 +302,7 @@ conversions from `%s' and to `%s' are not supported"), } } } - while (++remaining < argc); + while (++current_input_file_index < argc); /* Ensure that iconv -c still exits with failure if iconv (the function) has failed with E2BIG instead of EILSEQ. */ @@ -297,8 +310,7 @@ conversions from `%s' and to `%s' are not supported"), status = EXIT_FAILURE; /* Close the output file now. */ - if (output != NULL && fclose (output)) - error (EXIT_FAILURE, errno, _("error while closing output file")); + close_output_file (status); } return status; @@ -328,6 +340,14 @@ parse_opt (int key, char *arg, struct argp_state *state) /* Omit invalid characters from output. */ omit_invalid = 1; break; + case OPT_BUFFER_SIZE: + { + int i = atoi (arg); + if (i <= 0) + error (EXIT_FAILURE, 0, _("invalid buffer size: %s"), arg); + output_buffer_size = i; + } + break; case OPT_VERBOSE: verbose = 1; break; @@ -374,59 +394,247 @@ warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\ fprintf (stream, gettext ("Written by %s.\n"), "Ulrich Drepper"); } +/* Command line index of the last input file that overlaps with the + output file. Zero means no temporary file is ever required. */ +static int last_overlapping_file_index; -static int -write_output (const char *outbuf, const char *outptr, FILE **output, - const char *output_file) +/* This is set to true if the output is written to a temporary file. */ +static bool output_using_temporary_file; + +/* This is the file descriptor that will be used by write_output. */ +static int output_fd = -1; + +/* Pointers at the start and end of the fixed-size output buffer. */ +static char *output_buffer_start; + +/* Current write position in the output buffer. */ +static char *output_buffer_current; + +/* Remaining bytes after output_buffer_current in the output buffer. */ +static size_t output_buffer_remaining; + + +/* Reduce the buffer size when writing directly to the output file, to + reduce cache utilization. */ +static size_t copy_buffer_size = BUFSIZ; + +static void +output_error (void) +{ + error (EXIT_FAILURE, errno, _("cannot open output file")); +} + +static void +input_error (const char *path) { - /* We have something to write out. */ - int errno_save = errno; + error (0, errno, _("cannot open input file `%s'"), path); +} - if (*output == NULL) +/* Opens output_file for writing, truncating it. */ +static void +open_output_direct (void) +{ + output_fd = open64 (output_file, O_WRONLY | O_CREAT | O_TRUNC, 0777); + if (output_fd < 0) + output_error (); +} + +static void +prepare_output_file (char **argv) +{ + if (copy_buffer_size > output_buffer_size) + copy_buffer_size = output_buffer_size; + + if (output_file == NULL || strcmp (output_file, "-") == 0) { - /* Determine output file. */ - if (output_file != NULL && strcmp (output_file, "-") != 0) + /* No buffering is required when writing to standard output + because input overlap is expected to be solved externally. */ + output_fd = STDOUT_FILENO; + output_buffer_size = copy_buffer_size; + } + else + { + /* If iconv creates the output file, no overlap is possible. */ + output_fd = open64 (output_file, O_WRONLY | O_CREAT | O_EXCL, 0777); + if (output_fd >= 0) + output_buffer_size = copy_buffer_size; + else { - *output = fopen (output_file, "w"); - if (*output == NULL) - error (EXIT_FAILURE, errno, _("cannot open output file")); + /* Otherwise, check if any of the input files overlap with the + output file. */ + struct statx st; + if (statx (AT_FDCWD, output_file, 0, STATX_INO | STATX_MODE, &st) + != 0) + output_error (); + uint32_t out_dev_minor = st.stx_dev_minor; + uint32_t out_dev_major = st.stx_dev_major; + uint64_t out_ino = st.stx_ino; + + int idx = current_input_file_index; + while (true) + { + /* Special case: no input files means standard input. */ + if (argv[idx] == NULL && idx != current_input_file_index) + break; + + int ret; + if (argv[idx] == NULL || strcmp (argv[idx], "-") == 0) + ret = statx (STDIN_FILENO, "", AT_EMPTY_PATH, STATX_INO, &st); + else + ret = statx (AT_FDCWD, argv[idx], 0, STATX_INO, &st); + if (ret != 0) + { + input_error (argv[idx]); + exit (EXIT_FAILURE); + } + if (out_dev_minor == st.stx_dev_minor + && out_dev_major == st.stx_dev_major + && out_ino == st.stx_ino) + { + if (argv[idx] == NULL) + /* Corner case: index of NULL would be larger than + idx while converting, triggering a switch away + from the temporary file. */ + last_overlapping_file_index = INT_MAX; + else + last_overlapping_file_index = idx; + } + + if (argv[idx] == NULL) + break; + ++idx; + } + + /* If there is no overlap, avoid using a temporary file. */ + if (last_overlapping_file_index == 0) + { + open_output_direct (); + output_buffer_size = copy_buffer_size; + } } - else - *output = stdout; } - if (fwrite (outbuf, 1, outptr - outbuf, *output) < (size_t) (outptr - outbuf) - || ferror (*output)) + output_buffer_start = malloc (output_buffer_size); + if (output_buffer_start == NULL) + output_error (); + output_buffer_current = output_buffer_start; + output_buffer_remaining = output_buffer_size; +} + +/* Write out the range [first, last), terminating the process on write + error. */ +static void +write_fully (int fd, const char *first, const char *last) +{ + while (first < last) { - /* Error occurred while printing the result. */ - error (0, 0, _("\ + ssize_t ret = write (fd, first, last - first); + if (ret == 0) + { + errno = ENOSPC; + output_error (); + } + if (ret < 0) + error (EXIT_FAILURE, errno, _("\ conversion stopped due to problem in writing the output")); - return -1; + first += ret; + } +} + +static void +flush_output (void) +{ + bool temporary_file_not_needed + = current_input_file_index > last_overlapping_file_index; + if (output_fd < 0) + { + if (temporary_file_not_needed) + open_output_direct (); + else + { + /* Create an anonymous temporary file. */ + FILE *fp = tmpfile (); + if (fp == NULL) + output_error (); + output_fd = dup (fileno (fp)); + if (output_fd < 0) + output_error (); + fclose (fp); + output_using_temporary_file = true; + } + /* Either way, no longer use a memory-only staging buffer. */ + output_buffer_size = copy_buffer_size; } + else if (output_using_temporary_file && temporary_file_not_needed) + { + /* The temporary file is no longer needed. Switch to direct + output, replacing output_fd. */ + int temp_fd = output_fd; + open_output_direct (); + + /* Copy over the data spooled to the temporary file. */ + if (lseek (temp_fd, 0, SEEK_SET) < 0) + output_error (); + while (true) + { + char buf[BUFSIZ]; + ssize_t ret = read (temp_fd, buf, sizeof (buf)); + if (ret < 0) + output_error (); + if (ret == 0) + break; + write_fully (output_fd, buf, buf + ret); + } + close (temp_fd); - errno = errno_save; + /* No longer using a temporary file from now on. */ + output_using_temporary_file = false; + output_buffer_size = copy_buffer_size; + } - return 0; + write_fully (output_fd, output_buffer_start, output_buffer_current); + output_buffer_current = output_buffer_start; + output_buffer_remaining = output_buffer_size; } +static void +close_output_file (int status) +{ + /* Do not perform a flush if a temporary file or the in-memory + buffer is in use and there was an error. It would clobber the + overlapping input file. */ + if (status != EXIT_SUCCESS && !omit_invalid && + (output_using_temporary_file || output_fd < 0)) + return; + + /* The current_input_file_index variable is now larger than + last_overlapping_file_index, so the flush_output call switches + away from the temporary file. */ + flush_output (); + + if (output_fd == STDOUT_FILENO) + { + /* Close standard output in safe manner, to report certain + ENOSPC errors. */ + output_fd = dup (output_fd); + if (output_fd < 0) + output_error (); + } + if (close (output_fd) < 0) + output_error (); +} static int -process_block (iconv_t cd, char *addr, size_t len, FILE **output, - const char *output_file) +process_block (iconv_t cd, char *addr, size_t len) { -#define OUTBUF_SIZE 32768 const char *start = addr; - char outbuf[OUTBUF_SIZE]; - char *outptr; - size_t outlen; size_t n; int ret = 0; while (len > 0) { - outptr = outbuf; - outlen = OUTBUF_SIZE; - n = iconv (cd, &addr, &len, &outptr, &outlen); + n = iconv (cd, &addr, &len, + &output_buffer_current, &output_buffer_remaining); if (n == (size_t) -1 && omit_invalid && errno == EILSEQ) { @@ -437,39 +645,34 @@ process_block (iconv_t cd, char *addr, size_t len, FILE **output, errno = E2BIG; } - if (outptr != outbuf) - { - ret = write_output (outbuf, outptr, output, output_file); - if (ret != 0) - break; - } - if (n != (size_t) -1) { /* All the input test is processed. For state-dependent character sets we have to flush the state now. */ - outptr = outbuf; - outlen = OUTBUF_SIZE; - n = iconv (cd, NULL, NULL, &outptr, &outlen); - - if (outptr != outbuf) + n = iconv (cd, NULL, NULL, + &output_buffer_current, &output_buffer_remaining); + if (n == (size_t) -1 && errno == E2BIG) { - ret = write_output (outbuf, outptr, output, output_file); - if (ret != 0) - break; + /* Try again if the state flush exceeded the buffer space. */ + flush_output (); + n = iconv (cd, NULL, NULL, + &output_buffer_current, &output_buffer_remaining); } + bool errno_is_EILSEQ = errno == EILSEQ; if (n != (size_t) -1) break; - if (omit_invalid && errno == EILSEQ) + if (omit_invalid && errno_is_EILSEQ) { ret = 1; break; } } - if (errno != E2BIG) + if (errno == E2BIG) + flush_output (); + else { /* iconv() ran into a problem. */ switch (errno) @@ -500,7 +703,7 @@ incomplete character or shift sequence at end of buffer")); static int -process_fd (iconv_t cd, int fd, FILE **output, const char *output_file) +process_fd (iconv_t cd, int fd) { /* we have a problem with reading from a descriptor since we must not provide the iconv() function an incomplete character or shift @@ -574,16 +777,16 @@ process_fd (iconv_t cd, int fd, FILE **output, const char *output_file) } /* Now we have all the input in the buffer. Process it in one run. */ - return process_block (cd, inbuf, actlen, output, output_file); + return process_block (cd, inbuf, actlen); } static int -process_file (iconv_t cd, FILE *input, FILE **output, const char *output_file) +process_file (iconv_t cd, FILE *input) { /* This should be safe since we use this function only for `stdin' and we haven't read anything so far. */ - return process_fd (cd, fileno (input), output, output_file); + return process_fd (cd, fileno (input)); } diff --git a/iconv/tst-iconv_prog-buffer.sh b/iconv/tst-iconv_prog-buffer.sh index 5ff99a02a3..54ff871d32 100644 --- a/iconv/tst-iconv_prog-buffer.sh +++ b/iconv/tst-iconv_prog-buffer.sh @@ -17,6 +17,12 @@ # License along with the GNU C Library; if not, see # . +# Arguments: +# root of the build tree ($(objpfx-common)) +# test command wrapper (for running on the board/with new ld.so) +# extra flags to pass to iconv +# number of times to double the input files in size (default: 0) + exec 2>&1 set -e @@ -26,7 +32,9 @@ codir=$1 test_program_prefix="$2" # Use internal converters to avoid issues with module loading. -iconv_args="-f ASCII -t UTF-8" +iconv_args="-f ASCII -t UTF-8 $3" + +file_size_doublings=${4-0} failure=false @@ -39,7 +47,19 @@ echo HH > "$tmp/hh" echo XY > "$tmp/xy" echo ZT > "$tmp/zt" echo OUT > "$tmp/out-template" +: > "$tmp/empty" printf '\xff' > "$tmp/0xff" + +# Double all files to produce larger buffers. +for p in "$tmp"/* ; do + i=0 + while test $i -lt $file_size_doublings; do + cat "$p" "$p" > "$tmp/scratch" + mv "$tmp/scratch" "$p" + i=$(($i + 1)) + done +done + cat "$tmp/xy" "$tmp/0xff" "$tmp/zt" > "$tmp/0xff-wrapped" run_iconv () { @@ -113,6 +133,38 @@ expect_files abc def run_iconv -o "$tmp/out" "$tmp/out" "$tmp/abc" expect_files abc def abc +run_iconv -o "$tmp/out" "$tmp/ggg" "$tmp/out" +expect_files ggg abc def abc + +run_iconv -o "$tmp/out" "$tmp/hh" "$tmp/out" "$tmp/hh" +expect_files hh ggg abc def abc hh + +cp "$tmp/out-template" "$tmp/out" +run_iconv -o "$tmp/out" "$tmp/ggg" "$tmp/out" "$tmp/out" "$tmp/ggg" +expect_files ggg out-template out-template ggg + +cp "$tmp/out-template" "$tmp/out" +run_iconv -o "$tmp/out" "$tmp/ggg" "$tmp/out" "$tmp/hh" "$tmp/out" "$tmp/ggg" +expect_files ggg out-template hh out-template ggg + +# Empty output should truncate the output file if exists. + +cp "$tmp/out-template" "$tmp/out" +run_iconv -o "$tmp/out" "$tmp/err" @@ -156,6 +236,20 @@ expect_exit 1 run_iconv -c -o "$tmp/out" \ ! test -s "$tmp/err" expect_files abc xy zt def +cp "$tmp/0xff-wrapped" "$tmp/out" +expect_exit 1 run_iconv -c -o "$tmp/out" "$tmp/out" "$tmp/abc" "$tmp/out" "$tmp/def" +expect_files xy zt abc xy zt def + +cp "$tmp/0xff-wrapped" "$tmp/out" +expect_exit 1 run_iconv -o "$tmp/out" \ + "$tmp/out" "$tmp/abc" "$tmp/out" "$tmp/def" +expect_files 0xff-wrapped + +cp "$tmp/0xff-wrapped" "$tmp/out" +expect_exit 1 run_iconv -c -o "$tmp/out" \ + "$tmp/abc" "$tmp/out" "$tmp/def" "$tmp/out" +expect_files abc xy zt def xy zt + # If the file does not exist yet, it should not be created on error. rm "$tmp/out" From patchwork Tue Aug 6 06:16:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 1969324 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=ED6QfwDX; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WdNQ407d9z1yZl for ; Tue, 6 Aug 2024 16:17:32 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id BCC51385C6C8 for ; Tue, 6 Aug 2024 06:17:29 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTP id 0C032385C6C7 for ; Tue, 6 Aug 2024 06:16:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0C032385C6C7 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 0C032385C6C7 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722925007; cv=none; b=kDeSArVIZl3ZDX4QQw7rUxT00KO32eQul3X0mLh1uw/p34aLdR6swqWLaYjdpNPdTcEIOMAgY/i8PMISjhNTq3yOY+J2JMH9LrY1daY4l1nRO36BLT1irCFTULKUOrJr1Y/zKhY0mswnk/e5yJYB01XIvHjYldTdJst0oPsrrkM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722925007; c=relaxed/simple; bh=8UT+c2d2A1mMfi83QAzvOzuhjdx23k3kO8SjfEakuz0=; h=DKIM-Signature:From:To:Subject:Message-ID:Date:MIME-Version; b=ZY/eQW4T7pFCO3+kjN4TWVyTgIin78tLuA3PPJBPe0jO1mNzgCthzXfcacpwCGdULp3Nk3wvzJvKiEOdTaQGo7ldhk/kxmIQ7sQqsk6QTYpjoj0metT+GEm9I+2dy7duySK/3SffmAI/viAZ9EXRqoBylB3QwoYAnIy0TraKG/4= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722925005; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=vJ9Lx1p0CMBVuOX+nSB3ppxWjoRbm8v51YQTJMJCKz8=; b=ED6QfwDX06AOpAvIVbHy8nsJ4E0byGu8XFloMx8VGORgTH5O7lAE3XZVTHEVCkugYHnmXi Xm/mW3VloXqNYW8YxgWyTtSenNcSq3c2TjrY7VgpmiqNfa4sOE1Glv8v2ax94mWk8EMBnG zQTKhAeJtDfXThdTQD4xXinosyanF44= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-597-zUUOTU5ANkerz5LmPJggeg-1; Tue, 06 Aug 2024 02:16:44 -0400 X-MC-Unique: zUUOTU5ANkerz5LmPJggeg-1 Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C7D9219560AD for ; Tue, 6 Aug 2024 06:16:43 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.45.224.40]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id D5E0E1955D42 for ; Tue, 6 Aug 2024 06:16:42 +0000 (UTC) From: Florian Weimer To: libc-alpha@sourceware.org Subject: [PATCH v2 6/7] iconv: Multiple - on command line should not fail (bug 32050) In-Reply-To: Message-ID: References: X-From-Line: ae480c4e705870913177a9bb21ef8eb384f5479e Mon Sep 17 00:00:00 2001 Date: Tue, 06 Aug 2024 08:16:39 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org Usually, the second and subsequent - return EOF immediately and do not contribute to the output, but this is not an error. --- iconv/iconv_prog.c | 3 ++- iconv/tst-iconv_prog-buffer.sh | 5 +++++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/iconv/iconv_prog.c b/iconv/iconv_prog.c index 3e02db7319..dd4bc3a59a 100644 --- a/iconv/iconv_prog.c +++ b/iconv/iconv_prog.c @@ -287,7 +287,8 @@ conversions from `%s' and to `%s' are not supported"), ret = process_fd (cd, fd); /* Now close the file. */ - close (fd); + if (fd != STDIN_FILENO) + close (fd); if (ret != 0) { diff --git a/iconv/tst-iconv_prog-buffer.sh b/iconv/tst-iconv_prog-buffer.sh index 54ff871d32..a9c3729d94 100644 --- a/iconv/tst-iconv_prog-buffer.sh +++ b/iconv/tst-iconv_prog-buffer.sh @@ -265,6 +265,11 @@ expect_exit 1 run_iconv -o "$tmp/out" "$tmp/abc" "$tmp/0xff" "$tmp/def" expect_exit 1 run_iconv -o "$tmp/out" "$tmp/abc" - < "$tmp/0xff" "$tmp/def" ! test -e "$tmp/out" +# Listing standard input multiple times should not fail (bug 32050). + +run_iconv -o "$tmp/out" "$tmp/xy" - - "$tmp/zt" < "$tmp/abc" +expect_files xy abc zt + if $failure ; then exit 1 fi From patchwork Tue Aug 6 06:16:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 1969326 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=LCFs666D; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WdNQh0XgWz1yZl for ; Tue, 6 Aug 2024 16:18:04 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id EB025385842D for ; Tue, 6 Aug 2024 06:18:01 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTP id A6E3F385841D for ; Tue, 6 Aug 2024 06:16:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A6E3F385841D Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A6E3F385841D Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722925021; cv=none; b=GVSlLzyvYFON2/czGFPUKk4wD6JoleaKk2nuTDSRRyGAenpRAnH/PlXhXA+3b4TZJTi6cgzpLpK8bfncH7WnuYKPlIsEyC9iKWNtrjfrAmghZB8Yj/ddWvyIv0JJ1mMeSvvyvJMpZwcp1FY9qHar+D2A38HZZXcSPml45msJxy0= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722925021; c=relaxed/simple; bh=HOslAdy5oJ/CpLjlFH/XMYEoW8VATXHjM1MjNZVruFE=; h=DKIM-Signature:From:To:Subject:Message-ID:Date:MIME-Version; b=sUDxWVzfHN6x4KVRN2Bi2xtJEMHekm9Z8WkYtQ0qBtcVUiNfktLz8DyifZ9Iea0t6zJ0zgHRS/nzlIqUNPIgA1rHm36sQB+S8+TLtROYF2447/TSCrUp1wXVBPxLCVDKqKMOKr6dj9bsekwBB6c4+eKCRX+MfH3to3pdNZVMsR8= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722925014; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Zq8OfA1p8OhmS3UIDWT+bvyq7EgmVuha1uICot8R+cM=; b=LCFs666DCoZgnqki1ctZZmMOC6/Fkfz0ZEufoY2CXwNlk7+9vGAscanMGLO4hG1seoBATP MT7e3zjQ6M6elWBLXjzAqh/gyDYWP5y36TkFvaLRsXaH1oFunpF8wrhnvKFECy2CSUn3kR sh+RstD8vf2HkTrHjJ0uqbe0WZZ9sug= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-367-GIWaWZ20OJ-Pub1qcnwf8Q-1; Tue, 06 Aug 2024 02:16:53 -0400 X-MC-Unique: GIWaWZ20OJ-Pub1qcnwf8Q-1 Received: from mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.40]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 533881955D42 for ; Tue, 6 Aug 2024 06:16:52 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.45.224.40]) by mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 414BC19560AA for ; Tue, 6 Aug 2024 06:16:50 +0000 (UTC) From: Florian Weimer To: libc-alpha@sourceware.org Subject: [PATCH v2 7/7] iconv: Input buffering for the iconv program (bug 32050) In-Reply-To: Message-ID: <592986faab2dc259ae5f537ff6b4be6793fd7fdf.1722924862.git.fweimer@redhat.com> References: X-From-Line: 592986faab2dc259ae5f537ff6b4be6793fd7fdf Mon Sep 17 00:00:00 2001 Date: Tue, 06 Aug 2024 08:16:48 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.40 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org Do not read the entire input file into memory. --- iconv/iconv_prog.c | 184 ++++++++++++++------------------- iconv/tst-iconv_prog-buffer.sh | 31 ++++++ 2 files changed, 109 insertions(+), 106 deletions(-) diff --git a/iconv/iconv_prog.c b/iconv/iconv_prog.c index dd4bc3a59a..fcb2c2f5f8 100644 --- a/iconv/iconv_prog.c +++ b/iconv/iconv_prog.c @@ -118,8 +118,9 @@ static size_t output_buffer_size = 1024 * 1024; /* Prototypes for the functions doing the actual work. */ static void prepare_output_file (char **argv); -static void close_output_file (int status); -static int process_block (iconv_t cd, char *addr, size_t len); +static void close_output_file (__gconv_t cd, int status); +static int process_block (iconv_t cd, char **addr, size_t *len, + off64_t file_offset, bool *incomplete); static int process_fd (iconv_t cd, int fd); static int process_file (iconv_t cd, FILE *input); static void print_known_names (void); @@ -311,7 +312,7 @@ conversions from `%s' and to `%s' are not supported"), status = EXIT_FAILURE; /* Close the output file now. */ - close_output_file (status); + close_output_file (cd, status); } return status; @@ -599,7 +600,7 @@ flush_output (void) } static void -close_output_file (int status) +close_output_file (__gconv_t cd, int status) { /* Do not perform a flush if a temporary file or the in-memory buffer is in use and there was an error. It would clobber the @@ -608,10 +609,28 @@ close_output_file (int status) (output_using_temporary_file || output_fd < 0)) return; - /* The current_input_file_index variable is now larger than - last_overlapping_file_index, so the flush_output call switches + /* All the input test is processed. For state-dependent character + sets we have to flush the state now. + + The current_input_file_index variable is now larger than + last_overlapping_file_index, so the flush_output calls switch away from the temporary file. */ + size_t n = iconv (cd, NULL, NULL, + &output_buffer_current, &output_buffer_remaining); + if (n == (size_t) -1 && errno == E2BIG) + { + /* Try again if the state flush exceeded the buffer space. */ + flush_output (); + n = iconv (cd, NULL, NULL, + &output_buffer_current, &output_buffer_remaining); + } + int saved_errno = errno; flush_output (); + if (n == (size_t) -1 && !omit_invalid) + { + errno = saved_errno; + output_error (); + } if (output_fd == STDOUT_FILENO) { @@ -625,51 +644,35 @@ close_output_file (int status) output_error (); } +/* CD is the iconv handle. Input processing starts at *ADDR, and + consumes upto *LEN bytes. *ADDR and *LEN are updated. FILE_OFFSET + is the file offset of the data initially at ADDR. *INCOMPLETE is + set to true if conversion stops due to an incomplete input + sequence. */ static int -process_block (iconv_t cd, char *addr, size_t len) +process_block (iconv_t cd, char **addr, size_t *len, off64_t file_offset, + bool *incomplete) { - const char *start = addr; + const char *start = *addr; size_t n; int ret = 0; - while (len > 0) + while (*len > 0) { - n = iconv (cd, &addr, &len, + n = iconv (cd, addr, len, &output_buffer_current, &output_buffer_remaining); if (n == (size_t) -1 && omit_invalid && errno == EILSEQ) { ret = 1; - if (len == 0) + if (*len == 0) n = 0; else errno = E2BIG; } if (n != (size_t) -1) - { - /* All the input test is processed. For state-dependent - character sets we have to flush the state now. */ - n = iconv (cd, NULL, NULL, - &output_buffer_current, &output_buffer_remaining); - if (n == (size_t) -1 && errno == E2BIG) - { - /* Try again if the state flush exceeded the buffer space. */ - flush_output (); - n = iconv (cd, NULL, NULL, - &output_buffer_current, &output_buffer_remaining); - } - bool errno_is_EILSEQ = errno == EILSEQ; - - if (n != (size_t) -1) - break; - - if (omit_invalid && errno_is_EILSEQ) - { - ret = 1; - break; - } - } + break; if (errno == E2BIG) flush_output (); @@ -680,13 +683,12 @@ process_block (iconv_t cd, char *addr, size_t len) { case EILSEQ: if (! omit_invalid) - error (0, 0, _("illegal input sequence at position %ld"), - (long int) (addr - start)); + error (0, 0, _("illegal input sequence at position %lld"), + (long long int) (file_offset + (*addr - start))); break; case EINVAL: - error (0, 0, _("\ -incomplete character or shift sequence at end of buffer")); - break; + *incomplete = true; + return ret; case EBADF: error (0, 0, _("internal error (illegal descriptor)")); break; @@ -706,79 +708,49 @@ incomplete character or shift sequence at end of buffer")); static int process_fd (iconv_t cd, int fd) { - /* we have a problem with reading from a descriptor since we must not - provide the iconv() function an incomplete character or shift - sequence at the end of the buffer. Since we have to deal with - arbitrary encodings we must read the whole text in a buffer and - process it in one step. */ - static char *inbuf = NULL; - static size_t maxlen = 0; - char *inptr = inbuf; - size_t actlen = 0; - - while (actlen < maxlen) + char inbuf[BUFSIZ]; + char *inbuf_end = inbuf + sizeof (inbuf); + size_t inbuf_used = 0; + off64_t file_offset = 0; + int status = 0; + bool incomplete = false; + + while (true) { - ssize_t n = read (fd, inptr, maxlen - actlen); - - if (n == 0) - /* No more text to read. */ - break; - - if (n == -1) + char *p = inbuf + inbuf_used; + ssize_t read_ret = read (fd, p, inbuf_end - p); + if (read_ret == 0) + { + /* On EOF, check if the previous iconv invocation saw an + incomplete sequence. */ + if (incomplete) + { + error (0, 0, _("\ +incomplete character or shift sequence at end of buffer")); + return 1; + } + return 0; + } + if (read_ret < 0) { - /* Error while reading. */ error (0, errno, _("error while reading the input")); return -1; } - - inptr += n; - actlen += n; + inbuf_used += read_ret; + incomplete = false; + p = inbuf; + int ret = process_block (cd, &p, &inbuf_used, file_offset, &incomplete); + if (ret != 0) + { + status = ret; + if (ret < 0) + break; + } + /* The next loop iteration consumes the leftover bytes. */ + memmove (inbuf, p, inbuf_used); + file_offset += read_ret - inbuf_used; } - - if (actlen == maxlen) - while (1) - { - ssize_t n; - char *new_inbuf; - - /* Increase the buffer. */ - new_inbuf = (char *) realloc (inbuf, maxlen + 32768); - if (new_inbuf == NULL) - { - error (0, errno, _("unable to allocate buffer for input")); - return -1; - } - inbuf = new_inbuf; - maxlen += 32768; - inptr = inbuf + actlen; - - do - { - n = read (fd, inptr, maxlen - actlen); - - if (n == 0) - /* No more text to read. */ - break; - - if (n == -1) - { - /* Error while reading. */ - error (0, errno, _("error while reading the input")); - return -1; - } - - inptr += n; - actlen += n; - } - while (actlen < maxlen); - - if (n == 0) - /* Break again so we leave both loops. */ - break; - } - - /* Now we have all the input in the buffer. Process it in one run. */ - return process_block (cd, inbuf, actlen); + return status; } diff --git a/iconv/tst-iconv_prog-buffer.sh b/iconv/tst-iconv_prog-buffer.sh index a9c3729d94..23098ac56a 100644 --- a/iconv/tst-iconv_prog-buffer.sh +++ b/iconv/tst-iconv_prog-buffer.sh @@ -50,6 +50,9 @@ echo OUT > "$tmp/out-template" : > "$tmp/empty" printf '\xff' > "$tmp/0xff" +# Length should be a prime number, to help with buffer alignment testing. +printf '\xc3\xa4\xe2\x80\x94\xe2\x80\x94\xc3\xa4\n' > "$tmp/utf8-sequence" + # Double all files to produce larger buffers. for p in "$tmp"/* ; do i=0 @@ -270,6 +273,34 @@ expect_exit 1 run_iconv -o "$tmp/out" "$tmp/abc" - < "$tmp/0xff" "$tmp/def" run_iconv -o "$tmp/out" "$tmp/xy" - - "$tmp/zt" < "$tmp/abc" expect_files xy abc zt +# NB: Extra iconv args are ignored after this point. Actual +# multi-byte conversion does not work with tiny buffers. +iconv_args="-f UTF-8 -t ASCII" + +printf 'x\n\xc3' > "$tmp/incomplete" +expect_exit 1 run_iconv -o "$tmp/out" "$tmp/incomplete" +check_out <&$logfd + printf "%s" "$prefix" > "$tmp/prefix" + cat "$tmp/prefix" "$tmp/utf8-sequence" > "$tmp/tmp" + iconv_args="-f UTF-8 -t UCS-4" + run_iconv -o "$tmp/out1" "$tmp/tmp" + iconv_args="-f UCS-4 -t UTF-8" + run_iconv -o "$tmp/out" "$tmp/out1" + expect_files prefix utf8-sequence + + prefix="$prefix@" + prefix_length=$(($prefix_length + 1)) +done + if $failure ; then exit 1 fi