From: Carlos O'Donell
Date: Mon, 1 Jun 2020 12:57:15 -0400
Subject: [PATCH v3] mbstowcs: Document, test, and fix null pointer dst semantics (Bug 25219)
To: Florian Weimer
Cc: Florian Weimer, Martin Sebor, Carlos O'Donell via Libc-alpha

On 6/1/20 12:35 PM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> How about this?
>> ~~~
>> This behaviour of accepting a null pointer for @var{wstring} is an @w{XPG4.2}
>> extension that is not specified in @w{ISO C} and is optional in @w{POSIX}.
>> ~~~
>
> Looks good to me.

v3
- Updated XPG4.2 text.
- Fixed space after sizeof (x2).
- Fixed space after period (x1).

OK for master?

8< --- 8< --- 8<

The function mbstowcs, by an XSI extension to POSIX, accepts a null
pointer for the destination wchar_t array. This allows the function to
be used to compute the length of the required wchar_t array, i.e. it
performs the conversion without storing the result and returns the
number of wide characters required.

We remove the __write_only__ markup for the first argument because it
does not hold: the destination may be a null pointer, in which case the
length argument does not apply. Without removing the markup, the new
test case cannot be compiled with -Werror=nonnull.

We add a new test case for mbstowcs which exercises the null pointer
destination behaviour that we have now explicitly documented.

The functions mbsrtowcs and mbsnrtowcs behave similarly, and C11
documents this behaviour for mbsrtowcs, even if the standard does not
explicitly call out this specific use case. We add a note to each of
mbsrtowcs and mbsnrtowcs stating that they support a null pointer for
the destination.

The wcsrtombs function behaves similarly in the opposite direction: a
null destination pointer can be used to compute how many bytes would be
needed to convert the wide character input. We document this case too,
but leave wcsnrtombs as a reference to wcsrtombs, so the reader must
still read wcsrtombs for the details of the semantics.
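As an illustration of the null destination behaviour described above, here is a minimal two-pass sketch: count with a null destination, allocate, then convert. It assumes an implementation that provides the XSI extension (glibc does); mbstowcs_dup is a hypothetical helper name, not a glibc API.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not part of glibc): convert the multibyte
   string STRING, interpreted in the current locale, to a newly
   allocated wide character string.  Returns NULL on an invalid
   multibyte sequence or on allocation failure; the caller frees
   the result.  */
static wchar_t *
mbstowcs_dup (const char *string)
{
  /* First pass: with a null destination nothing is stored and the
     return value is the number of wide characters the conversion
     produces, excluding the terminating L'\0'.  Under the XSI
     extension the size argument is ignored in this case.  */
  size_t len = mbstowcs (NULL, string, 0);
  if (len == (size_t) -1)
    return NULL;

  wchar_t *buf = malloc ((len + 1) * sizeof (wchar_t));
  if (buf == NULL)
    return NULL;

  /* Second pass: room for LEN characters plus the terminator, so
     the result is always NUL-terminated.  */
  size_t converted = mbstowcs (buf, string, len + 1);
  assert (converted == len);
  return buf;
}
```

In the default C locale, mbstowcs (NULL, "12345", 0) returns 5, the same length the new test checks, and mbstowcs_dup ("12345") yields a buffer equal to L"12345".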
---
 manual/charset.texi   | 23 ++++++++++++++++----
 stdlib/stdlib.h       |  2 +-
 wcsmbs/Makefile       |  2 +-
 wcsmbs/tst-mbstowcs.c | 50 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+), 6 deletions(-)
 create mode 100644 wcsmbs/tst-mbstowcs.c

diff --git a/manual/charset.texi b/manual/charset.texi
index 9fd0166115..b638323fc2 100644
--- a/manual/charset.texi
+++ b/manual/charset.texi
@@ -1026,6 +1026,10 @@ stores in the pointer pointed to by @var{src} either a null pointer
 (if the NUL byte in the input string was reached) or the address of the byte
 following the last converted multibyte character.
 
+Like @code{mbstowcs} the @var{dst} parameter may be a null pointer and
+the function can be used to count the number of wide characters that
+would be required.
+
 @pindex wchar.h
 @code{mbsrtowcs} was introduced in @w{Amendment 1} to @w{ISO C90} and is
 declared in @file{wchar.h}.
@@ -1101,10 +1105,11 @@ successfully converted.
 
 Except in the case of an encoding error the return value of the
 @code{wcsrtombs} function is the number of bytes in all the multibyte
-character sequences stored in @var{dst}.  Before returning, the state in
-the object pointed to by @var{ps} (or the internal object in case
-@var{ps} is a null pointer) is updated to reflect the state after the
-last conversion.  The state is the initial shift state in case the
+character sequences which were or would have been (if @var{dst} was
+not a null pointer) stored in @var{dst}.  Before returning, the state in the
+object pointed to by @var{ps} (or the internal object in case @var{ps}
+is a null pointer) is updated to reflect the state after the last
+conversion.  The state is the initial shift state in case the
 terminating NUL wide character was converted.
 
 @pindex wchar.h
@@ -1131,6 +1136,10 @@ string @code{*@var{src}} need not be NUL-terminated.  But if a NUL byte
 is found within the @var{nmc} first bytes of the string, the conversion
 stops there.
 
+Like @code{mbstowcs} the @var{dst} parameter may be a null pointer and
+the function can be used to count the number of wide characters that
+would be required.
+
 This function is a GNU extension.  It is meant to work around the
 problems mentioned above.  Now it is possible to convert a buffer with
 multibyte character text piece by piece without having to care about
@@ -1465,6 +1474,12 @@ mbstowcs_alloc (const char *string)
 @}
 @end smallexample
 
+If @var{wstring} is a null pointer then no output is written and the
+conversion proceeds as above, and the result is returned.  In practice
+such behaviour is useful for calculating the exact number of wide
+characters required to convert @var{string}.  This behaviour of
+accepting a null pointer for @var{wstring} is an @w{XPG4.2} extension
+that is not specified in @w{ISO C} and is optional in @w{POSIX}.
 @end deftypefun
 
 @deftypefun size_t wcstombs (char *@var{string}, const wchar_t *@var{wstring}, size_t @var{size})
diff --git a/stdlib/stdlib.h b/stdlib/stdlib.h
index dd779bd740..f971df4247 100644
--- a/stdlib/stdlib.h
+++ b/stdlib/stdlib.h
@@ -932,7 +932,7 @@ extern int wctomb (char *__s, wchar_t __wchar) __THROW;
 /* Convert a multibyte string to a wide char string.  */
 extern size_t mbstowcs (wchar_t *__restrict __pwcs,
 			const char *__restrict __s, size_t __n) __THROW
-    __attr_access ((__write_only__, 1, 3)) __attr_access ((__read_only__, 2));
+    __attr_access ((__read_only__, 2));
 /* Convert a wide char string to multibyte string.  */
 extern size_t wcstombs (char *__restrict __s,
 			const wchar_t *__restrict __pwcs, size_t __n)
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index f02167fa58..e638e45522 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -52,7 +52,7 @@ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \
 	 tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \
 	 tst-wcstod-round test-char-types tst-fgetwc-after-eof \
 	 tst-wcstod-nan-sign tst-c16-surrogate tst-c32-state \
-	 $(addprefix test-,$(strop-tests))
+	 $(addprefix test-,$(strop-tests)) tst-mbstowcs
 
 include ../Rules
 
diff --git a/wcsmbs/tst-mbstowcs.c b/wcsmbs/tst-mbstowcs.c
new file mode 100644
index 0000000000..b48891553e
--- /dev/null
+++ b/wcsmbs/tst-mbstowcs.c
@@ -0,0 +1,50 @@
+/* Test basic mbstowcs including wstring == NULL (Bug 25219).
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include <stdlib.h>
+#include <string.h>
+#include <support/check.h>
+
+static int
+do_test (void)
+{
+  char string[] = { '1', '2', '3', '4', '5', '\0' };
+  size_t len = strlen (string);
+  wchar_t wstring[] = { L'1', L'2', L'3', L'4', L'5', L'\0' };
+#define NUM_WCHAR 6
+  wchar_t wout[NUM_WCHAR];
+  size_t result;
+
+  /* The input ASCII string in the C/POSIX locale must convert
+     to the matching WSTRING.  */
+  result = mbstowcs (wout, string, NUM_WCHAR);
+  TEST_VERIFY (result == (NUM_WCHAR - 1));
+  TEST_COMPARE_BLOB (wstring, sizeof (wchar_t) * (NUM_WCHAR - 1),
+                     wout, sizeof (wchar_t) * result);
+
+  /* The input ASCII string in the C/POSIX locale must be the
+     same length when using mbstowcs to compute the length of
+     the string required in the conversion.  Using mbstowcs
+     in this way is an XSI extension to POSIX.  */
+  result = mbstowcs (NULL, string, len);
+  TEST_VERIFY (result == len);
+
+  return 0;
+}
+
+#include <support/test-driver.c>