From patchwork Thu Jun 27 11:07:07 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Wakely X-Patchwork-Id: 1953167 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=QMHGAikJ; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W8wmx1j7Pz20Xg for ; Thu, 27 Jun 2024 21:09:05 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 726C83832E41 for ; Thu, 27 Jun 2024 11:09:03 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 2C1123835C0D for ; Thu, 27 Jun 2024 11:08:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2C1123835C0D Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 2C1123835C0D Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719486520; cv=none; b=g9FJ5JomVDLSEP9AgHT1FSP4JCzZ7qzqerv47EAWYgcm9aV3G2fRmatey76/ULaNDqnvaG8j5r/PomYuVkxuBRRpe5e9qg+kaxY4p57m7LkRi5yvxvnPQikM8De7vx44VirYhSLznfJ0rmmQxlUWa0vN66Ji1lzBIx+7H9Y//TY= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719486520; c=relaxed/simple; bh=JQzqOWj27NTg5yldBqjen+DWSN+WoxcnYoigccQzJPY=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=JapIQoBKqZ+yC3wMU7mpY9r720nJr+x/u/R6qhEE/wllP+gKcE4odxc8OeMLzWKp3+rqL9tp+Eu1ofruKxBB99pj8hPQWf65IZAIo2P5iJOjF08A9mbl50qTdsSDrxyRNsTYddP21kxKrcZ72PSptZBkzLUpp8UtNsc/vdHCQfo= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1719486517; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=I4q/PIYkX+Zx72MrNCaz+GF0pmaQBR1wMc8BVOk4n9c=; b=QMHGAikJ9JrmZUVUe/Y+R5Dma86nBYyEjzh/yqbO59KNjqCUICp0q1kZWqEVEJPzo23Y8D 9TE5YaRMUZtxqBzYwbpFxBNlLtDg5UMt/bJmert0pxiMNpSWkSYQbX8l1E3fVSbSh2EX+u m49921kWUJqMvXg4Ax3d/lg4x54QSUA= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-96-0HgmS-J1NeWGGnOGvZOUKA-1; Thu, 27 Jun 2024 07:08:35 -0400 X-MC-Unique: 0HgmS-J1NeWGGnOGvZOUKA-1 Received: from mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.40]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 878F919560B2; Thu, 27 Jun 2024 11:08:34 +0000 (UTC) Received: from localhost (unknown [10.42.28.171]) by mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id CE1811956054; Thu, 27 Jun 2024 11:08:33 +0000 (UTC) From: Jonathan Wakely To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: [committed v2] libstdc++: Fix std::codecvt for empty dest [PR37475] Date: Thu, 27 Jun 2024 12:07:07 +0100 Message-ID: <20240627110832.119192-1-jwakely@redhat.com> In-Reply-To: <20240612203324.879477-1-jwakely@redhat.com> References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.40 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Here's what I've pushed, with a typo fixed as spotted by Kristian in the PR comments. Tested x86_64-linux. Pushed to trunk. -- >8 -- For the GNU locale model, codecvt::do_out and codecvt::do_in incorrectly return 'ok' when the destination range is empty. That happens because detecting incomplete output is done in the loop body, and the loop is never even entered if to == to_end. By restructuring the loop condition so that we check the output range separately, we can ensure that for a non-empty source range, we always enter the loop at least once, and detect if the destination range is too small. The loops also seem easier to reason about if we return immediately on any error, instead of checking the result twice on every iteration. We can use an RAII type to restore the locale before returning, which also simplifies all the other member functions. libstdc++-v3/ChangeLog: PR libstdc++/37475 * config/locale/gnu/codecvt_members.cc (Guard): New RAII type. (do_out, do_in): Return partial if the destination is empty but the source is not. Use Guard to restore locale on scope exit. Return immediately on any conversion error. (do_encoding, do_max_length, do_length): Use Guard. * testsuite/22_locale/codecvt/in/char/37475.cc: New test. * testsuite/22_locale/codecvt/in/wchar_t/37475.cc: New test. * testsuite/22_locale/codecvt/out/char/37475.cc: New test. * testsuite/22_locale/codecvt/out/wchar_t/37475.cc: New test. --- .../config/locale/gnu/codecvt_members.cc | 117 ++++++++---------- .../22_locale/codecvt/in/char/37475.cc | 23 ++++ .../22_locale/codecvt/in/wchar_t/37475.cc | 23 ++++ .../22_locale/codecvt/out/char/37475.cc | 23 ++++ .../22_locale/codecvt/out/wchar_t/37475.cc | 23 ++++ 5 files changed, 142 insertions(+), 67 deletions(-) create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/in/char/37475.cc create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/in/wchar_t/37475.cc create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/out/char/37475.cc create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/out/wchar_t/37475.cc diff --git a/libstdc++-v3/config/locale/gnu/codecvt_members.cc b/libstdc++-v3/config/locale/gnu/codecvt_members.cc index 034713d236e..794f25a5f35 100644 --- a/libstdc++-v3/config/locale/gnu/codecvt_members.cc +++ b/libstdc++-v3/config/locale/gnu/codecvt_members.cc @@ -37,8 +37,23 @@ namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION - // Specializations. #ifdef _GLIBCXX_USE_WCHAR_T +namespace +{ + // RAII type for changing and restoring the current thread's locale. + struct Guard + { +#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) + explicit Guard(__c_locale loc) : old(__uselocale(loc)) { } + ~Guard() { __uselocale(old); } +#else + explicit Guard(__c_locale) { } +#endif + __c_locale old; + }; +} + + // Specializations. codecvt_base::result codecvt:: do_out(state_type& __state, const intern_type* __from, @@ -46,22 +61,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION extern_type* __to, extern_type* __to_end, extern_type*& __to_next) const { - result __ret = ok; state_type __tmp_state(__state); - -#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) - __c_locale __old = __uselocale(_M_c_locale_codecvt); -#endif + Guard g(_M_c_locale_codecvt); // wcsnrtombs is *very* fast but stops if encounters NUL characters: // in case we fall back to wcrtomb and then continue, in a loop. // NB: wcsnrtombs is a GNU extension - for (__from_next = __from, __to_next = __to; - __from_next < __from_end && __to_next < __to_end - && __ret == ok;) + __from_next = __from; + __to_next = __to; + while (__from_next < __from_end) { - const intern_type* __from_chunk_end = wmemchr(__from_next, L'\0', - __from_end - __from_next); + if (__to_next >= __to_end) + return partial; + + const intern_type* __from_chunk_end + = wmemchr(__from_next, L'\0', __from_end - __from_next); if (!__from_chunk_end) __from_chunk_end = __from_end; @@ -77,12 +91,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION for (; __from < __from_next; ++__from) __to_next += wcrtomb(__to_next, *__from, &__tmp_state); __state = __tmp_state; - __ret = error; + return error; } else if (__from_next && __from_next < __from_chunk_end) { __to_next += __conv; - __ret = partial; + return partial; } else { @@ -90,13 +104,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __to_next += __conv; } - if (__from_next < __from_end && __ret == ok) + if (__from_next < __from_end) { extern_type __buf[MB_LEN_MAX]; __tmp_state = __state; const size_t __conv2 = wcrtomb(__buf, *__from_next, &__tmp_state); if (__conv2 > static_cast(__to_end - __to_next)) - __ret = partial; + return partial; else { memcpy(__to_next, __buf, __conv2); @@ -107,11 +121,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } -#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) - __uselocale(__old); -#endif - - return __ret; + return ok; } codecvt_base::result @@ -121,24 +131,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION intern_type* __to, intern_type* __to_end, intern_type*& __to_next) const { - result __ret = ok; state_type __tmp_state(__state); - -#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) - __c_locale __old = __uselocale(_M_c_locale_codecvt); -#endif + Guard g(_M_c_locale_codecvt); // mbsnrtowcs is *very* fast but stops if encounters NUL characters: // in case we store a L'\0' and then continue, in a loop. // NB: mbsnrtowcs is a GNU extension - for (__from_next = __from, __to_next = __to; - __from_next < __from_end && __to_next < __to_end - && __ret == ok;) + __from_next = __from; + __to_next = __to; + while (__from_next < __from_end) { - const extern_type* __from_chunk_end; - __from_chunk_end = static_cast(memchr(__from_next, '\0', - __from_end - - __from_next)); + if (__to_next >= __to_end) + return partial; + + const extern_type* __from_chunk_end + = static_cast(memchr(__from_next, '\0', + __from_end - __from_next)); if (!__from_chunk_end) __from_chunk_end = __from_end; @@ -161,13 +169,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } __from_next = __from; __state = __tmp_state; - __ret = error; + return error; } else if (__from_next && __from_next < __from_chunk_end) { // It is unclear what to return in this case (see DR 382). __to_next += __conv; - __ret = partial; + return partial; } else { @@ -175,7 +183,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __to_next += __conv; } - if (__from_next < __from_end && __ret == ok) + if (__from_next < __from_end) { if (__to_next < __to_end) { @@ -185,48 +193,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION *__to_next++ = L'\0'; } else - __ret = partial; + return partial; } } -#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) - __uselocale(__old); -#endif - - return __ret; + return ok; } int codecvt:: do_encoding() const throw() { + Guard g(_M_c_locale_codecvt); // XXX This implementation assumes that the encoding is // stateless and is either single-byte or variable-width. - int __ret = 0; -#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) - __c_locale __old = __uselocale(_M_c_locale_codecvt); -#endif - if (MB_CUR_MAX == 1) - __ret = 1; -#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) - __uselocale(__old); -#endif - return __ret; + return MB_CUR_MAX == 1; } int codecvt:: do_max_length() const throw() { -#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) - __c_locale __old = __uselocale(_M_c_locale_codecvt); -#endif + Guard g(_M_c_locale_codecvt); // XXX Probably wrong for stateful encodings. - int __ret = MB_CUR_MAX; -#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) - __uselocale(__old); -#endif - return __ret; + return MB_CUR_MAX; } int @@ -236,10 +226,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { int __ret = 0; state_type __tmp_state(__state); - -#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) - __c_locale __old = __uselocale(_M_c_locale_codecvt); -#endif + Guard g(_M_c_locale_codecvt); // mbsnrtowcs is *very* fast but stops if encounters NUL characters: // in case we advance past it and then continue, in a loop. @@ -295,10 +282,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } -#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) - __uselocale(__old); -#endif - return __ret; } #endif diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/in/char/37475.cc b/libstdc++-v3/testsuite/22_locale/codecvt/in/char/37475.cc new file mode 100644 index 00000000000..6184c3280cb --- /dev/null +++ b/libstdc++-v3/testsuite/22_locale/codecvt/in/char/37475.cc @@ -0,0 +1,23 @@ +#include +#include + +void +test_pr37475() +{ + typedef std::codecvt test_type; + const test_type& cvt = std::use_facet(std::locale::classic()); + const char from = 'a'; + const char* from_next; + char to = 0; + char* to_next; + std::mbstate_t st = std::mbstate_t(); + std::codecvt_base::result res + = cvt.in(st, &from, &from+1, from_next, &to, &to, to_next); + + VERIFY( res == std::codecvt_base::noconv ); +} + +int main() +{ + test_pr37475(); +} diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/in/wchar_t/37475.cc b/libstdc++-v3/testsuite/22_locale/codecvt/in/wchar_t/37475.cc new file mode 100644 index 00000000000..a0e64847ea9 --- /dev/null +++ b/libstdc++-v3/testsuite/22_locale/codecvt/in/wchar_t/37475.cc @@ -0,0 +1,23 @@ +#include +#include + +void +test_pr37475() +{ + typedef std::codecvt test_type; + const test_type& cvt = std::use_facet(std::locale::classic()); + const char from = 'a'; + const char* from_next; + wchar_t to = 0; + wchar_t* to_next; + std::mbstate_t st = std::mbstate_t(); + std::codecvt_base::result res + = cvt.in(st, &from, &from+1, from_next, &to, &to, to_next); + + VERIFY( res == std::codecvt_base::partial ); +} + +int main() +{ + test_pr37475(); +} diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/out/char/37475.cc b/libstdc++-v3/testsuite/22_locale/codecvt/out/char/37475.cc new file mode 100644 index 00000000000..8736e4b7f3f --- /dev/null +++ b/libstdc++-v3/testsuite/22_locale/codecvt/out/char/37475.cc @@ -0,0 +1,23 @@ +#include +#include + +void +test_pr37475() +{ + typedef std::codecvt test_type; + const test_type& cvt = std::use_facet(std::locale::classic()); + const char from = 'a'; + const char* from_next; + char to; + char* to_next; + std::mbstate_t st = std::mbstate_t(); + std::codecvt_base::result res + = cvt.out(st, &from, &from+1, from_next, &to, &to, to_next); + + assert( res == std::codecvt_base::noconv ); +} + +int main() +{ + test_pr37475(); +} diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/out/wchar_t/37475.cc b/libstdc++-v3/testsuite/22_locale/codecvt/out/wchar_t/37475.cc new file mode 100644 index 00000000000..2cd2edb7404 --- /dev/null +++ b/libstdc++-v3/testsuite/22_locale/codecvt/out/wchar_t/37475.cc @@ -0,0 +1,23 @@ +#include +#include + +void +test_pr37475() +{ + typedef std::codecvt test_type; + const test_type& cvt = std::use_facet(std::locale::classic()); + const wchar_t from = L'a'; + const wchar_t* from_next; + char to; + char* to_next; + std::mbstate_t st = std::mbstate_t(); + std::codecvt_base::result res + = cvt.out(st, &from, &from+1, from_next, &to, &to, to_next); + + assert( res == std::codecvt_base::partial ); +} + +int main() +{ + test_pr37475(); +}