From patchwork Thu Jul 11 23:19:23 2024
X-Patchwork-Submitter: Jonathan Wakely
X-Patchwork-Id: 1959604
From: Jonathan Wakely
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: [PATCH v2] libstdc++: Handle encodings in localized chrono formatting [PR109162]
Date: Fri, 12 Jul 2024 00:19:23 +0100
Message-ID: <20240711232322.1264807-1-jwakely@redhat.com>
In-Reply-To: <20240201160400.2143624-2-jwakely@redhat.com>
References: <20240201160400.2143624-2-jwakely@redhat.com>

I sent v1 of this patch in February, and it added the new symbols to
libstdc++exp.a which meant users needed to use -lstdc++exp to format
chrono types in C++23 mode. That was less than ideal.

This v2 patch adds the new symbols to the main library, which means no
extra step to get the new features, and we can enable them as a DR for
C++20 mode. But that means we need new exports in the shared library,
and so need to be more confident that the feature is stable and ready to
go into the lib.

I'm not 100% confident that we want to add a new, private facet to the
std::locale, but it seems reasonable.
And that's not exposed to users at all, as the two new symbols added to
the library hide the creation and use of that facet.

-- >8 --

This implements the C++23 paper P2419R2 (Clarify handling of encodings
in localized formatting of chrono types). The requirement is that when
the literal encoding is "a Unicode encoding form" and the formatting
locale uses a different encoding, any locale-specific strings such as
"août" for std::chrono::August should be converted to the literal
encoding.

Using the recently-added std::locale::encoding() function we can check
the locale's encoding and then use iconv if a conversion is needed.
Because nl_langinfo_l and iconv_open both allocate memory, a naive
implementation would perform multiple allocations and deallocations for
every snippet of locale-specific text that needs to be converted to
UTF-8. To avoid that, a new internal locale::facet is defined to store
the text_encoding and an iconv_t descriptor, which are then cached in
the formatting locale. This requires access to the internals of a
std::locale object in src/c++20/format.cc, so that new file needs to be
compiled with -fno-access-control, as well as -std=gnu++26 in order to
use std::text_encoding.

Because the new std::text_encoding and std::locale::encoding() symbols
are only in the libstdc++exp.a archive, we need to include
src/c++26/text_encoding.cc in the main library, but not export its
symbols yet. This means they can be used by the two new functions which
are exported from the main library.

The encoding conversions are done for C++20, treating it as a DR that
resolves LWG 3565.

With this change we can increase the value of the __cpp_lib_format macro
for C++23. The value should be 202207 for P2419R2, but we already
implement P2510R3 (Formatting pointers) so can use the value 202304.

libstdc++-v3/ChangeLog:

	PR libstdc++/109162
	* acinclude.m4 (libtool_VERSION): Update to 6:34:0.
	* config/abi/pre/gnu.ver: Disambiguate old patterns. Add new
	GLIBCXX_3.4.34 symbol version and new exports.
	* configure: Regenerate.
	* include/bits/chrono_io.h (_ChronoSpec::_M_locale_specific):
	Add new accessor functions to use a reserved bit in _Spec.
	(__formatter_chrono::_M_parse): Use _M_locale_specific(true)
	when chrono-specs contains locale-dependent conversion
	specifiers.
	(__formatter_chrono::_M_format): Open iconv descriptor if
	conversion to UTF-8 will be needed.
	(__formatter_chrono::_M_write): New function to write a
	localized string with possible character conversion.
	(__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B)
	(__formatter_chrono::_M_p, __formatter_chrono::_M_r)
	(__formatter_chrono::_M_x, __formatter_chrono::_M_X)
	(__formatter_chrono::_M_locale_fmt): Use _M_write.
	* include/bits/version.def (format): Update value.
	* include/bits/version.h: Regenerate.
	* include/std/format (_GLIBCXX_P2518R3): Check feature test
	macro instead of __cplusplus.
	(basic_format_context): Declare __formatter_chrono as friend.
	* src/c++20/Makefile.am: Add new file.
	* src/c++20/Makefile.in: Regenerate.
	* src/c++20/format.cc: New file.
	* testsuite/std/time/format_localized.cc: New test.
	* testsuite/util/testsuite_abi.cc: Add new symbol version.
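
To illustrate the _M_parse change listed in the ChangeLog, here is a
rough standalone sketch of how a chrono-specs string can be scanned for
locale-dependent conversion specifiers. The function name
uses_locale_specific_conv is hypothetical and not part of the patch. The
real parser also validates modifiers and tracks which chrono fields are
needed, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of the patch: returns true if the
// chrono-specs string contains a conversion specifier whose output
// depends on the locale. Simplified: specifiers are not validated.
constexpr bool
uses_locale_specific_conv(std::string_view specs)
{
  for (std::size_t i = 0; i + 1 < specs.size(); ++i)
    {
      if (specs[i] != '%')
        continue;
      char c = specs[++i];
      bool mod = false;
      // An E or O modifier selects the locale's alternative representation.
      if ((c == 'E' || c == 'O') && i + 1 < specs.size())
        {
          mod = true;
          c = specs[++i];
        }
      switch (c)
        {
        case 'a': case 'A':           // weekday names
        case 'b': case 'B': case 'h': // month names
        case 'c': case 'x': case 'X': // locale's date and time formats
        case 'p': case 'r':           // AM/PM and 12-hour time
          return true;
        default:
          // Mirrors the patch: any modified specifier except %Ez and %Oz.
          if (mod && c != 'z')
            return true;
        }
    }
  return false;
}

static_assert(uses_locale_specific_conv("%a"));
static_assert(uses_locale_specific_conv("%Od"));
static_assert(!uses_locale_specific_conv("%Y-%m-%d"));
static_assert(!uses_locale_specific_conv("%Ez"));
```

This flag is separate from the "L" option. In the patch, conversion to
UTF-8 is only attempted when both the "L" option and a locale-dependent
specifier are present, as in "{:L%a}".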
---
 libstdc++-v3/acinclude.m4                     |   2 +-
 libstdc++-v3/config/abi/pre/gnu.ver           |  16 +-
 libstdc++-v3/configure                        |   2 +-
 libstdc++-v3/include/bits/chrono_io.h         |  96 ++++++++--
 libstdc++-v3/include/bits/version.def         |  34 +++-
 libstdc++-v3/include/bits/version.h           |  11 +-
 libstdc++-v3/include/std/format               |  16 +-
 libstdc++-v3/src/c++20/Makefile.am            |   8 +-
 libstdc++-v3/src/c++20/Makefile.in            |  10 +-
 libstdc++-v3/src/c++20/format.cc              | 174 ++++++++++++++++++
 .../testsuite/std/time/format_localized.cc    |  47 +++++
 libstdc++-v3/testsuite/util/testsuite_abi.cc  |   1 +
 12 files changed, 387 insertions(+), 30 deletions(-)
 create mode 100644 libstdc++-v3/src/c++20/format.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/format_localized.cc

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index e04aae25360..e4ed583b3ae 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4230,7 +4230,7 @@ changequote([,])dnl
 fi
 
 # For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:33:0
+libtool_VERSION=6:34:0
 
 # Everything parsed; figure out what files and settings to use.
 case $enable_symvers in
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 31449b5b87b..939540a0a1e 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -109,7 +109,11 @@ GLIBCXX_3.4 {
       std::[j-k]*;
 #     std::length_error::l*;
 #     std::length_error::~l*;
-      std::locale::[A-Za-e]*;
+#     std::locale::[A-Za-d]*;
+      std::locale::all;
+      std::locale::classic*;
+      std::locale::collate;
+      std::locale::ctype;
       std::locale::facet::[A-Za-z]*;
       std::locale::facet::_S_get_c_locale*;
       std::locale::facet::_S_clone_c_locale*;
@@ -168,7 +172,7 @@ GLIBCXX_3.4 {
       std::strstream*;
       std::strstreambuf*;
 #     std::t[a-q]*;
-      std::t[a-g]*;
+      std::terminate*;
       std::th[a-h]*;
       std::th[j-q]*;
       std::th[s-z]*;
@@ -2528,6 +2532,14 @@ GLIBCXX_3.4.33 {
     _ZNKSt12__basic_fileIcE13native_handleEv;
 } GLIBCXX_3.4.32;
 
+# GCC 15.1.0
+GLIBCXX_3.4.34 {
+    # std::__format::__with_encoding_conversion
+    _ZNSt8__format26__with_encoding_conversionERKSt6locale;
+    # std::__format::__locale_encoding_to_utf8
+    _ZNSt8__format25__locale_encoding_to_utf8ERKSt6localeSt17basic_string_viewIcSt11char_traitsIcEEPv;
+} GLIBCXX_3.4.33;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {
 
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 5645e991af7..fe525308ae2 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -51040,7 +51040,7 @@ $as_echo "$as_me: WARNING: === Symbol versioning will be disabled." >&2;}
 fi
 
 # For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:33:0
+libtool_VERSION=6:34:0
 
 # Everything parsed; figure out what files and settings to use.
case $enable_symvers in diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h index 72c66a0fef0..2f3ba89de61 100644 --- a/libstdc++-v3/include/bits/chrono_io.h +++ b/libstdc++-v3/include/bits/chrono_io.h @@ -38,8 +38,10 @@ #include // setw, setfill #include #include // from_chars +#include // __sso_string #include +#include namespace std _GLIBCXX_VISIBILITY(default) { @@ -211,6 +213,20 @@ namespace __format struct _ChronoSpec : _Spec<_CharT> { basic_string_view<_CharT> _M_chrono_specs; + + // Use one of the reserved bits in __format::_Spec. + // This indicates that a locale-dependent conversion specifier such as + // %a is used in the chrono-specs. This is not the same as the + // _Spec::_M_localized member which indicates that "L" was present + // in the format-spec, e.g. "{:L%a}" is localized and locale-specific, + // but "{:L}" is only localized and "{:%a}" is only locale-specific. + constexpr bool + _M_locale_specific() const noexcept + { return this->_M_reserved; } + + constexpr void + _M_locale_specific(bool __b) noexcept + { this->_M_reserved = __b; } }; // Represents the information provided by a chrono type. 
@@ -305,11 +321,12 @@ namespace __format const auto __chrono_specs = __first++; // Skip leading '%' if (*__chrono_specs != '%') __throw_format_error("chrono format error: no '%' at start of " - "chrono-specs"); + "chrono-specs"); _CharT __mod{}; bool __conv = true; int __needed = 0; + bool __locale_specific = false; while (__first != __last) { @@ -322,15 +339,18 @@ namespace __format case 'a': case 'A': __needed = _Weekday; + __locale_specific = true; break; case 'b': case 'h': case 'B': __needed = _Month; + __locale_specific = true; break; case 'c': __needed = _DateTime; __allowed_mods = _Mod_E; + __locale_specific = true; break; case 'C': __needed = _Year; @@ -368,6 +388,8 @@ namespace __format break; case 'p': case 'r': + __locale_specific = true; + [[fallthrough]]; case 'R': case 'T': __needed = _TimeOfDay; @@ -393,10 +415,12 @@ namespace __format break; case 'x': __needed = _Date; + __locale_specific = true; __allowed_mods = _Mod_E; break; case 'X': __needed = _TimeOfDay; + __locale_specific = true; __allowed_mods = _Mod_E; break; case 'y': @@ -436,6 +460,8 @@ namespace __format || (__mod == 'O' && !(__allowed_mods & _Mod_O))) __throw_format_error("chrono format error: invalid " " modifier in chrono-specs"); + if (__mod && __c != 'z') + __locale_specific = true; __mod = _CharT(); if ((__parts & __needed) != __needed) @@ -467,6 +493,7 @@ namespace __format _M_spec = __spec; _M_spec._M_chrono_specs = __string_view(__chrono_specs, __first - __chrono_specs); + _M_spec._M_locale_specific(__locale_specific); return __first; } @@ -486,6 +513,24 @@ namespace __format if (__first == __last) return _M_format_to_ostream(__t, __fc, __is_neg); +#if __glibcxx_format >= 202207L // C++ >= 23 + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3565. 
Handling of encodings in localized formatting + // of chrono types is underspecified + if constexpr (is_same_v<_CharT, char>) + if constexpr (__unicode::__literal_encoding_is_utf8()) + if (_M_spec._M_localized && _M_spec._M_locale_specific()) + { + extern locale __with_encoding_conversion(const locale&); + + // Allocate and cache the necessary state to convert strings + // in the locale's encoding to UTF-8. + locale __loc = __fc.locale(); + if (__loc != locale::classic()) + __fc._M_loc = __with_encoding_conversion(__loc); + } +#endif + _Sink_iter<_CharT> __out; __format::_Str_sink<_CharT> __sink; bool __write_direct = false; @@ -742,6 +787,30 @@ namespace __format static constexpr _CharT _S_space = _S_chars[14]; static constexpr const _CharT* _S_empty_spec = _S_chars + 15; + template + _OutIter + _M_write(_OutIter __out, const locale& __loc, __string_view __s) const + { +#if __glibcxx_format >= 202207L // C++ >= 20 + __sso_string __buf; + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3565. Handling of encodings in localized formatting + // of chrono types is underspecified + if constexpr (is_same_v<_CharT, char>) + if constexpr (__unicode::__literal_encoding_is_utf8()) + if (_M_spec._M_localized && _M_spec._M_locale_specific() + && __loc != locale::classic()) + { + extern string_view + __locale_encoding_to_utf8(const std::locale&, string_view, + void*); + + __s = __locale_encoding_to_utf8(__loc, __s, &__buf); + } +#endif + return __format::__write(std::move(__out), __s); + } + template typename _FormatContext::iterator _M_a_A(const _Tp& __t, typename _FormatContext::iterator __out, @@ -761,7 +830,7 @@ namespace __format else __tp._M_days_abbreviated(__days); __string_view __str(__days[__wd.c_encoding()]); - return __format::__write(std::move(__out), __str); + return _M_write(std::move(__out), __loc, __str); } template @@ -782,7 +851,7 @@ namespace __format else __tp._M_months_abbreviated(__months); __string_view __str(__months[(unsigned)__m - 1]); - return 
__format::__write(std::move(__out), __str); + return _M_write(std::move(__out), __loc, __str); } template @@ -1059,8 +1128,8 @@ namespace __format const auto& __tp = use_facet<__timepunct<_CharT>>(__loc); const _CharT* __ampm[2]; __tp._M_am_pm(__ampm); - return std::format_to(std::move(__out), _S_empty_spec, - __ampm[__hms.hours().count() >= 12]); + return _M_write(std::move(__out), __loc, + __ampm[__hms.hours().count() >= 12]); } template @@ -1095,8 +1164,9 @@ namespace __format basic_string<_CharT> __fmt(_S_empty_spec); __fmt.insert(1u, 1u, _S_colon); __fmt.insert(2u, __ampm_fmt); - return std::vformat_to(std::move(__out), __fmt, - std::make_format_args<_FormatContext>(__t)); + using _FmtStr = _Runtime_format_string<_CharT>; + return _M_write(std::move(__out), __loc, + std::format(__loc, _FmtStr(__fmt), __t)); } template @@ -1279,8 +1349,9 @@ namespace __format basic_string<_CharT> __fmt(_S_empty_spec); __fmt.insert(1u, 1u, _S_colon); __fmt.insert(2u, __rep); - return std::vformat_to(std::move(__out), __fmt, - std::make_format_args<_FormatContext>(__t)); + using _FmtStr = _Runtime_format_string<_CharT>; + return _M_write(std::move(__out), __loc, + std::format(__loc, _FmtStr(__fmt), __t)); } template @@ -1302,8 +1373,9 @@ namespace __format basic_string<_CharT> __fmt(_S_empty_spec); __fmt.insert(1u, 1u, _S_colon); __fmt.insert(2u, __rep); - return std::vformat_to(std::move(__out), __fmt, - std::make_format_args<_FormatContext>(__t)); + using _FmtStr = _Runtime_format_string<_CharT>; + return _M_write(std::move(__out), __loc, + std::format(__loc, _FmtStr(__fmt), __t)); } template @@ -1580,7 +1652,7 @@ namespace __format const auto& __tp = use_facet>(__loc); __tp.put(__os, __os, _S_space, &__tm, __fmt, __mod); if (__os) - __out = __format::__write(std::move(__out), __os.view()); + __out = _M_write(std::move(__out), __loc, __os.view()); return __out; } }; diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def index 
42cdef2f526..013445b75df 100644 --- a/libstdc++-v3/include/bits/version.def +++ b/libstdc++-v3/include/bits/version.def @@ -1161,16 +1161,27 @@ ftms = { }; ftms = { + name = format; + // 202311 P2918R2 Runtime format strings II + // values = { + // v = 202311; + // cxxmin = 26; + // hosted = yes; + // }; + // 202304 P2510R3 Formatting pointers + // 202305 P2757R3 Type checking format args + // 202306 P2637R3 Member visit + values = { + v = 202304; + cxxmin = 23; + hosted = yes; + }; // 201907 Text Formatting, Integration of chrono, printf corner cases. // 202106 std::format improvements. // 202110 Fixing locale handling in chrono formatters, generator-like types. // 202207 Encodings in localized formatting of chrono, basic-format-string. - // 202207 P2286R8 Formatting Ranges - // 202207 P2585R1 Improving default container formatting - // TODO: #define __cpp_lib_format_ranges 202207L - name = format; values = { - v = 202110; + v = 202207; cxxmin = 20; hosted = yes; }; @@ -1374,6 +1385,19 @@ ftms = { }; }; +// ftms = { + // name = format_ranges; + // 202207 P2286R8 Formatting Ranges + // 202207 P2585R1 Improving default container formatting + // LWG3750 Too many papers bump __cpp_lib_format + // TODO: #define __cpp_lib_format_ranges 202207L + // values = { + // v = 202207; + // cxxmin = 23; + // hosted = yes; + // }; +// }; + ftms = { name = freestanding_algorithm; values = { diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h index 1eaf3733bc2..5f5eb872395 100644 --- a/libstdc++-v3/include/bits/version.h +++ b/libstdc++-v3/include/bits/version.h @@ -1304,10 +1304,15 @@ #undef __glibcxx_want_barrier #if !defined(__cpp_lib_format) -# if (__cplusplus >= 202002L) && _GLIBCXX_HOSTED -# define __glibcxx_format 202110L +# if (__cplusplus >= 202100L) && _GLIBCXX_HOSTED +# define __glibcxx_format 202304L # if defined(__glibcxx_want_all) || defined(__glibcxx_want_format) -# define __cpp_lib_format 202110L +# define __cpp_lib_format 202304L 
+# endif +# elif (__cplusplus >= 202002L) && _GLIBCXX_HOSTED +# define __glibcxx_format 202207L +# if defined(__glibcxx_want_all) || defined(__glibcxx_want_format) +# define __cpp_lib_format 202207L # endif # endif #endif /* !defined(__cpp_lib_format) && defined(__glibcxx_want_format) */ diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format index 16cee0d3c74..a4921ce391b 100644 --- a/libstdc++-v3/include/std/format +++ b/libstdc++-v3/include/std/format @@ -2342,10 +2342,10 @@ namespace __format // _GLIBCXX_RESOLVE_LIB_DEFECTS // P2510R3 Formatting pointers -#if __cplusplus > 202302L || ! defined __STRICT_ANSI__ -#define _GLIBCXX_P2518R3 1 +#if __glibcxx_format >= 202304L || ! defined __STRICT_ANSI__ +# define _GLIBCXX_P2518R3 1 #else -#define _GLIBCXX_P2518R3 0 +# define _GLIBCXX_P2518R3 0 #endif #if _GLIBCXX_P2518R3 @@ -3821,6 +3821,9 @@ namespace __format __do_vformat_to(_Out, basic_string_view<_CharT>, const basic_format_args<_Context>&, const locale* = nullptr); + + template struct __formatter_chrono; + } // namespace __format /// @endcond @@ -3831,6 +3834,11 @@ namespace __format * this class template explicitly. For typical uses of `std::format` the * library will use the specializations `std::format_context` (for `char`) * and `std::wformat_context` (for `wchar_t`). + * + * You are not allowed to define partial or explicit specializations of + * this class template. 
+ * + * @since C++20 */ template class basic_format_context @@ -3863,6 +3871,8 @@ namespace __format const basic_format_args<_Context2>&, const locale*); + friend __format::__formatter_chrono<_CharT>; + public: ~basic_format_context() = default; diff --git a/libstdc++-v3/src/c++20/Makefile.am b/libstdc++-v3/src/c++20/Makefile.am index a24505e5141..d0f7859290c 100644 --- a/libstdc++-v3/src/c++20/Makefile.am +++ b/libstdc++-v3/src/c++20/Makefile.am @@ -36,7 +36,7 @@ else inst_sources = endif -sources = tzdb.cc +sources = tzdb.cc format.cc vpath % $(top_srcdir)/src/c++20 @@ -53,6 +53,12 @@ tzdb.o: tzdb.cc tzdata.zi.h $(CXXCOMPILE) -I. -c $< endif +# This needs access to std::text_encoding and to the internals of std::locale. +format.lo: format.cc + $(LTCXXCOMPILE) -std=gnu++26 -fno-access-control -c $< +format.o: format.cc + $(CXXCOMPILE) -std=gnu++26 -fno-access-control -c $< + if GLIBCXX_HOSTED libc__20convenience_la_SOURCES = $(sources) $(inst_sources) else diff --git a/libstdc++-v3/src/c++20/Makefile.in b/libstdc++-v3/src/c++20/Makefile.in index 3ec8c5ce804..d759b8dcc7c 100644 --- a/libstdc++-v3/src/c++20/Makefile.in +++ b/libstdc++-v3/src/c++20/Makefile.in @@ -121,7 +121,7 @@ CONFIG_CLEAN_FILES = CONFIG_CLEAN_VPATH_FILES = LTLIBRARIES = $(noinst_LTLIBRARIES) libc__20convenience_la_LIBADD = -am__objects_1 = tzdb.lo +am__objects_1 = tzdb.lo format.lo @ENABLE_EXTERN_TEMPLATE_TRUE@am__objects_2 = sstream-inst.lo @GLIBCXX_HOSTED_TRUE@am_libc__20convenience_la_OBJECTS = \ @GLIBCXX_HOSTED_TRUE@ $(am__objects_1) $(am__objects_2) @@ -432,7 +432,7 @@ headers = @ENABLE_EXTERN_TEMPLATE_TRUE@inst_sources = \ @ENABLE_EXTERN_TEMPLATE_TRUE@ sstream-inst.cc -sources = tzdb.cc +sources = tzdb.cc format.cc @GLIBCXX_HOSTED_FALSE@libc__20convenience_la_SOURCES = @GLIBCXX_HOSTED_TRUE@libc__20convenience_la_SOURCES = $(sources) $(inst_sources) @@ -755,6 +755,12 @@ vpath % $(top_srcdir)/src/c++20 @USE_STATIC_TZDATA_TRUE@tzdb.o: tzdb.cc tzdata.zi.h @USE_STATIC_TZDATA_TRUE@ $(CXXCOMPILE) 
-I. -c $< +# This needs access to std::text_encoding and to the internals of std::locale. +format.lo: format.cc + $(LTCXXCOMPILE) -std=gnu++26 -fno-access-control -c $< +format.o: format.cc + $(CXXCOMPILE) -std=gnu++26 -fno-access-control -c $< + # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: diff --git a/libstdc++-v3/src/c++20/format.cc b/libstdc++-v3/src/c++20/format.cc new file mode 100644 index 00000000000..507bac79e95 --- /dev/null +++ b/libstdc++-v3/src/c++20/format.cc @@ -0,0 +1,174 @@ +// Definitions for formatting -*- C++ -*- + +// Copyright The GNU Toolchain Authors. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// . 
+ +#define _GLIBCXX_USE_CXX11_ABI 1 +#include "../c++26/text_encoding.cc" + +#if defined _GLIBCXX_USE_NL_LANGINFO_L && defined _GLIBCXX_HAVE_ICONV +# include +# include +# include // make_unique +# include // strlen, strcpy +# include +# include +#endif + +namespace std +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION +namespace __format +{ +// Helpers for P2419R2 +// (Clarify handling of encodings in localized formatting of chrono types) +// Convert a string from the locale's charset to UTF-8. + +namespace +{ +// A non-standard locale::facet that caches the locale's std::text_encoding +// and an iconv descriptor for converting from that encoding to UTF-8. +struct __encoding : locale::facet +{ + static locale::id id; + + explicit + __encoding(const text_encoding& enc, size_t refs = 0) + : facet(refs), _M_enc(enc) + { +#if defined _GLIBCXX_HAVE_ICONV + if (enc != text_encoding::UTF8 && enc != text_encoding::ASCII) + _M_cd = ::iconv_open("UTF-8", enc.name()); +#endif + } + + ~__encoding() + { +#if defined _GLIBCXX_HAVE_ICONV + if (_M_has_desc()) + ::iconv_close(_M_cd); +#endif + } + + bool _M_has_desc() const + { +#if defined _GLIBCXX_HAVE_ICONV + return _M_cd != (::iconv_t)-1; +#else + return false; +#endif + } + + text_encoding _M_enc; +#if defined _GLIBCXX_HAVE_ICONV + ::iconv_t _M_cd = (::iconv_t)-1; +#endif +}; + +locale::id __encoding::id; +} // namespace + +std::locale +__with_encoding_conversion(const std::locale& loc) +{ +#if defined _GLIBCXX_USE_NL_LANGINFO_L && __CHAR_BIT__ == 8 + if (std::__try_use_facet<__encoding>(loc)) + return loc; + + string name = loc.name(); + if (name == "C" || name == "*") + return loc; + + text_encoding locenc = __locale_encoding(name.c_str()); + + if (locenc == text_encoding::UTF8 || locenc == text_encoding::ASCII + || locenc == text_encoding::unknown) + return loc; + + auto impl = std::make_unique(*loc._M_impl, 1); + auto facetp = std::make_unique<__encoding>(locenc); + locale loc2(loc, facetp.get()); // FIXME: PR libstdc++/113704 + 
facetp.release(); + // FIXME: Ideally we wouldn't need to reallocate this string again, + // just don't delete[] it in the locale(locale, Facet*) constructor. + if (const char* name = loc._M_impl->_M_names[0]) + { + loc2._M_impl->_M_names[0] = new char[strlen(name) + 1]; + strcpy(loc2._M_impl->_M_names[0], name); + } + return loc2; +#else + return loc; +#endif +} + +string_view +__locale_encoding_to_utf8(const std::locale& loc, string_view str, + void* poutbuf) +{ +#if defined _GLIBCXX_USE_NL_LANGINFO_L && __CHAR_BIT__ == 8 \ + && _GLIBCXX_HAVE_ICONV + string& outbuf = *static_cast(poutbuf); + // Don't need to use __try_use_facet with its dynamic_cast<__encoding*>, + // since we know there are no types derived from __encoding. If the array + // element is non-null, we have the facet. + auto id = __encoding::id._M_id(); + auto enc_facet = static_cast(loc._M_impl->_M_facets[id]); + if (!enc_facet || !enc_facet->_M_has_desc()) + return str; + + size_t inbytesleft = str.size(); + size_t written = 0; + bool done = false; + + auto overwrite = [&](char* p, size_t n) { + auto inbytes = const_cast(str.data()) + str.size() - inbytesleft; + char* outbytes = p + written; + size_t outbytesleft = n - written; + size_t res = ::iconv(enc_facet->_M_cd, &inbytes, &inbytesleft, + &outbytes, &outbytesleft); + if (res == (size_t)-1) + { + if (errno != E2BIG) + { + done = true; + return 0zu; + } + } + else + done = true; + written = outbytes - p; + return written; + }; + do + outbuf.resize_and_overwrite(outbuf.capacity() + (inbytesleft * 3 / 2), + overwrite); + while (!done); + if (outbuf.size()) + str = outbuf; +#endif // USE_NL_LANGINFO_L && CHAR_BIT == 8 && HAVE_ICONV + + return str; +} +} // namespace __format +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace std diff --git a/libstdc++-v3/testsuite/std/time/format_localized.cc b/libstdc++-v3/testsuite/std/time/format_localized.cc new file mode 100644 index 00000000000..c24b18d088a --- /dev/null +++ 
b/libstdc++-v3/testsuite/std/time/format_localized.cc @@ -0,0 +1,47 @@ +// { dg-do run { target c++23 } } +// { dg-require-namedlocale "ru_UA.koi8u" } +// { dg-require-namedlocale "es_ES.ISO8859-1" } +// { dg-require-namedlocale "fr_FR.ISO8859-1" } +// { dg-require-effective-target cxx11_abi } + +// P2419R2 +// Clarify handling of encodings in localized formatting of chrono types + +// Localized date-time strings such as "février" should be converted to UTF-8 +// if the locale uses a different encoding. + +#include +#include +#include + +void +test_ru() +{ + std::locale loc("ru_UA.koi8u"); + auto s = std::format(loc, "День недели: {:L}", std::chrono::Monday); + VERIFY( s == "День недели: Пн" ); +} + +void +test_es() +{ + std::locale loc(ISO_8859(1,es_ES)); + auto s = std::format(loc, "Día de la semana: {:L%A %a}", std::chrono::Wednesday); + VERIFY( s == "Día de la semana: miércoles mié" ); +} + +void +test_fr() +{ + std::locale loc(ISO_8859(1,fr_FR)); + auto s = std::format(loc, "Six mois après {0:L%b}, c'est {1:L%B}.", + std::chrono::February, std::chrono::August); + VERIFY( s == "Six mois après févr., c'est août." ); +} + +int main() +{ + test_ru(); + test_es(); + test_fr(); +} diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc b/libstdc++-v3/testsuite/util/testsuite_abi.cc index ec7c3df9ecc..ce9cda660fa 100644 --- a/libstdc++-v3/testsuite/util/testsuite_abi.cc +++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc @@ -215,6 +215,7 @@ check_version(symbol& test, bool added) known_versions.push_back("GLIBCXX_3.4.31"); known_versions.push_back("GLIBCXX_3.4.32"); known_versions.push_back("GLIBCXX_3.4.33"); + known_versions.push_back("GLIBCXX_3.4.34"); known_versions.push_back("GLIBCXX_LDBL_3.4.31"); known_versions.push_back("GLIBCXX_IEEE128_3.4.29"); known_versions.push_back("GLIBCXX_IEEE128_3.4.30");
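
For readers following the conversion loop in src/c++20/format.cc, the
grow-and-retry pattern around iconv can be sketched on its own roughly
as below. This is a simplified illustration under stated assumptions,
not the code from the patch: the helper name to_utf8 is hypothetical, it
opens and closes a descriptor on every call instead of caching one in a
facet, it uses plain std::string::resize rather than
resize_and_overwrite, and it does not flush shift state for stateful
encodings. It assumes a POSIX iconv whose second parameter is char**.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <string_view>
#include <iconv.h>

// Hypothetical helper, not part of the patch: convert `in`, encoded in
// `from_charset`, to UTF-8. Returns the input unchanged if no converter
// is available or the input cannot be converted.
inline std::string
to_utf8(const char* from_charset, std::string_view in)
{
  assert(from_charset != nullptr);
  iconv_t cd = ::iconv_open("UTF-8", from_charset);
  if (cd == (iconv_t)-1)
    return std::string(in);

  std::string out;
  // iconv takes a non-const pointer but does not modify the input bytes.
  char* inptr = const_cast<char*>(in.data());
  std::size_t inleft = in.size();
  std::size_t written = 0;
  bool ok = true;

  for (;;)
    {
      // Grow the buffer; bytes converted so far are preserved by resize.
      out.resize(out.size() + inleft * 3 / 2 + 4);
      char* outptr = out.data() + written;
      std::size_t outleft = out.size() - written;
      std::size_t res = ::iconv(cd, &inptr, &inleft, &outptr, &outleft);
      written = static_cast<std::size_t>(outptr - out.data());
      if (res != (std::size_t)-1)
        break;                  // all input consumed
      if (errno != E2BIG)
        {
          ok = false;           // invalid or incomplete input sequence
          break;
        }
      // E2BIG: output buffer was too small, loop again with more room.
    }

  ::iconv_close(cd);
  if (!ok)
    return std::string(in);
  out.resize(written);
  return out;
}
```

With this sketch, the Latin-1 bytes "ao\xFBt" come back as the UTF-8
bytes for "août", and pure ASCII input is returned unchanged. Caching
the descriptor in the locale, as the patch does, avoids repeating the
iconv_open and iconv_close calls for every formatted string.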