From patchwork Fri May 6 14:04:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Siddhesh Poyarekar X-Patchwork-Id: 1628278 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=ZL65JYjE; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4KxRL10vYNz9sGf for ; Mon, 9 May 2022 13:22:01 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3636A3857811 for ; Mon, 9 May 2022 03:21:58 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3636A3857811 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1652066518; bh=dsDKais8BpFFOmB6vD9lamKaUqIvDEG/jVPBl85+GUU=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=ZL65JYjEUpjctZKvbx17x4XhteTWIqrdH6icUlzMLgDljNX3zNegCXRgkMUWg0sxx bxr5w6ZEGSp45xNy5wFZbKEGzp9Dklp99eSTmx9+w0vJBrs9LX22DD4FU5XD7iiEt8 fk0Xg2sENk0n6kzKREJk6+IKP2QB3ryrRM0BB4ts= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from crane.ash.relay.mailchannels.net (crane.ash.relay.mailchannels.net [23.83.222.43]) by sourceware.org (Postfix) with ESMTPS id 0BB8F3858C2C for ; Mon, 9 May 2022 03:21:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0BB8F3858C2C X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org Received: from relay.mailchannels.net (localhost [127.0.0.1]) by relay.mailchannels.net (Postfix) with ESMTP id 1C2944C1AC2; Mon, 9 May 2022 03:21:38 +0000 (UTC) Received: from pdx1-sub0-mail-a304.dreamhost.com (unknown [127.0.0.6]) (Authenticated sender: dreamhost) by relay.mailchannels.net (Postfix) with ESMTPA id 96C164C1B68; Mon, 9 May 2022 03:21:37 +0000 (UTC) ARC-Seal: i=1; s=arc-2022; d=mailchannels.net; t=1652066497; a=rsa-sha256; cv=none; b=XPXjOvGfxYnpMYJVq9mV5OFHQGV1kproedIYx6Re0GPyhhZy/0HA8p9n904Z65jZGcYHQG qR/C2zTzmtY1ONlx0ZgRET1qWqSH5e73wTLjaJkd6ovaYY+HOBgrurMFhtulZ7n5DN9h3C 3sXO2x53gMG9AYDiki7/8eg2KaEgg7xdOux8+CiY8TIg3QaZ9ezSlpmBGHJJCXu0uPSfj1 IW/cnCYnKXYZdj67Zgh70EuM8uUn9ZTNjZ4CreKVaH6HDdzqy050eZSeT8HuDpDeKJMJ4A y6ilOYMDVi2TLLh5Bou86S4OH+dfvj5/SLe/PXCDAj441KpzehKrmUe29I+CZA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=mailchannels.net; s=arc-2022; t=1652066497; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dsDKais8BpFFOmB6vD9lamKaUqIvDEG/jVPBl85+GUU=; b=LkgYPkcYjOYirmiXDEe87/mlYReVh5UZg3mTrl3To/BGXULGHEHDb6WaxOH/uUVC0WWCav w8r2SZ91LqJM8CWtjd4gYfpR67rNduY/QvGGvsrsI7cdOwaber5EiXlA8aGYSMRUSxCmfO 9hL9rMgI2QMfgtWnxTAbWF4jb0XXtSXH50yBpGaXSOCogl6TEbLXL1WdwDVYlJ8J9JW8No mmUxg7+JRSp/HazI5r3KfYzc+YW3kN9fYG2rnzte89afvLyxvjBFbxxtZtTOdHD/pfXmZ4 iyUFeGq8632QfIwqDXdit4z0f1eDzEQmrcCkPF1t2VzTwoXZF7M/o/+4C6zK/w== ARC-Authentication-Results: i=1; rspamd-847cd55849-6dknf; auth=pass smtp.auth=dreamhost smtp.mailfrom=siddhesh@sourceware.org X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org X-MC-Relay: Neutral X-MailChannels-SenderId: dreamhost|x-authsender|siddhesh@gotplt.org X-MailChannels-Auth-Id: dreamhost X-Eyes-Left: 0009c6c6593bdb22_1652066497929_72683277 X-MC-Loop-Signature: 1652066497929:26905173 X-MC-Ingress-Time: 1652066497928 Received: from pdx1-sub0-mail-a304.dreamhost.com (pop.dreamhost.com [64.90.62.162]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384) by 100.96.96.26 (trex/6.7.1); Mon, 09 May 2022 03:21:37 +0000 Received: from rhbox.intra.reserved-bit.com (unknown [1.186.121.104]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: siddhesh@gotplt.org) by pdx1-sub0-mail-a304.dreamhost.com (Postfix) with ESMTPSA id 4KxRKV2M4Cz1Pg; Sun, 8 May 2022 20:21:34 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v2] wcrtomb: Make behavior POSIX compliant Date: Fri, 6 May 2022 19:34:15 +0530 Message-Id: <20220506140415.1275807-1-siddhesh@sourceware.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220505184348.3357550-3-siddhesh@sourceware.org> References: <20220505184348.3357550-3-siddhesh@sourceware.org> MIME-Version: 1.0 X-Spam-Status: No, score=-3494.0 required=5.0 tests=BAYES_00, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_DMARC_NONE, KAM_DMARC_STATUS, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, RCVD_IN_SBL, SPF_HELO_NONE, SPF_NEUTRAL, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Siddhesh Poyarekar via Libc-alpha From: Siddhesh Poyarekar Reply-To: Siddhesh Poyarekar Cc: fweimer@redhat.com, jakub@redhat.com, schwab@linux-m68k.org, dickey@his.com Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha" The GNU implementation of wcrtomb assumes that there are at least MB_CUR_MAX bytes available in the destination buffer passed to wcrtomb as the first argument. This is not compatible with the POSIX definition, which only requires enough space for the input wide character. This does not break much in practice because when users supply buffers smaller than MB_CUR_MAX (e.g. in ncurses), they compute and dynamically allocate the buffer, which results in enough spare space (thanks to usable_size in malloc and padding in alloca) that no actual buffer overflow occurs. However when the code is built with _FORTIFY_SOURCE, it runs into the hard check against MB_CUR_MAX in __wcrtomb_chk and hence fails. It wasn't evident until now since dynamic allocations would result in wcrtomb not being fortified but since _FORTIFY_SOURCE=3, that limitation is gone, resulting in such code failing. To fix this problem, introduce an internal buffer that is MB_LEN_MAX long and use that to perform the conversion and then copy the resultant bytes into the destination buffer. Also move the fortification check into the main implementation, which checks the result after conversion and aborts if the resultant byte count is greater than the destination buffer size. One complication is that applications that assume the MB_CUR_MAX limitation to be gone may not be able to run safely on older glibcs if they use static destination buffers smaller than MB_CUR_MAX; dynamic allocations will always have enough spare space that no actual overruns will occur. One alternative to fixing this is to bump symbol version to prevent them from running on older glibcs but that seems too strict a constraint. Instead, since these users will only have made this decision on reading the manual, I have put a note in the manual warning them about the pitfalls of having static buffers smaller than MB_CUR_MAX and running them on older glibc. Benchmarking: The wcrtomb microbenchmark shows significant increases in maximum execution time for all locales, ranging from 10x for ar_SA.UTF-8 to 1.5x-2x for nearly everything else. The mean execution time however saw practically no impact, with some results even being quicker, indicating that cache locality has a much bigger role in the overhead. Given that the additional copy uses a temporary buffer inside wcrtomb, it's likely that a hot path will end up putting that buffer (which is responsible for the additional overhead) in a similar place on stack, giving the necessary cache locality to negate the overhead. However in situations where wcrtomb ends up getting called at wildly different spots on the call stack (or is on different call stacks, e.g. with threads or different execution contexts) and is still a hotspot, the performance lag will be visible. Signed-off-by: Siddhesh Poyarekar --- Changes from v1: - Fixed nits suggested by Paul Eggert. debug/tst-fortify.c | 7 ++++++- debug/wcrtomb_chk.c | 8 ++------ include/wchar.h | 4 ++++ manual/charset.texi | 10 +++++----- wcsmbs/wcrtomb.c | 36 ++++++++++++++++++++++++++++-------- 5 files changed, 45 insertions(+), 20 deletions(-) diff --git a/debug/tst-fortify.c b/debug/tst-fortify.c index 03c9867714..8e94643bf2 100644 --- a/debug/tst-fortify.c +++ b/debug/tst-fortify.c @@ -1478,10 +1478,15 @@ do_test (void) character which has a multibyte representation which does not fit. */ CHK_FAIL_START - char smallbuf[2]; + char smallbuf[1]; if (wcrtomb (smallbuf, L'\x100', &s) != 2) FAIL (); CHK_FAIL_END + + /* Same input with a large enough buffer and we're good. */ + char bigenoughbuf[2]; + if (wcrtomb (bigenoughbuf, L'\x100', &s) != 2) + FAIL (); #endif wchar_t wenough[10]; diff --git a/debug/wcrtomb_chk.c b/debug/wcrtomb_chk.c index 8b6d026560..28c3ea0d2d 100644 --- a/debug/wcrtomb_chk.c +++ b/debug/wcrtomb_chk.c @@ -1,4 +1,5 @@ /* Copyright (C) 2005-2022 Free Software Foundation, Inc. + Copyright The GNU Toolchain Authors. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -25,10 +26,5 @@ size_t __wcrtomb_chk (char *s, wchar_t wchar, mbstate_t *ps, size_t buflen) { - /* We do not have to implement the full wctomb semantics since we - know that S cannot be NULL when we come here. */ - if (buflen < MB_CUR_MAX) - __chk_fail (); - - return __wcrtomb (s, wchar, ps); + return __wcrtomb_internal (s, wchar, ps, buflen); } diff --git a/include/wchar.h b/include/wchar.h index 4267985625..db83297bca 100644 --- a/include/wchar.h +++ b/include/wchar.h @@ -172,6 +172,10 @@ libc_hidden_proto (__mbrtowc) libc_hidden_proto (__mbrlen) extern size_t __wcrtomb (char *__restrict __s, wchar_t __wc, __mbstate_t *__restrict __ps) attribute_hidden; +extern size_t __wcrtomb_internal (char *__restrict __s, wchar_t __wc, + __mbstate_t *__restrict __ps, + size_t __s_size) + attribute_hidden; extern size_t __mbsrtowcs (wchar_t *__restrict __dst, const char **__restrict __src, size_t __len, __mbstate_t *__restrict __ps) diff --git a/manual/charset.texi b/manual/charset.texi index a9b5cb4a37..67fe5bf3c7 100644 --- a/manual/charset.texi +++ b/manual/charset.texi @@ -883,11 +883,11 @@ the string @var{s}. This includes all bytes representing shift sequences. One word about the interface of the function: there is no parameter -specifying the length of the array @var{s}. Instead the function -assumes that there are at least @code{MB_CUR_MAX} bytes available since -this is the maximum length of any byte sequence representing a single -character. So the caller has to make sure that there is enough space -available, otherwise buffer overruns can occur. +specifying the length of the array @var{s}, so the caller has to make sure +that there is enough space available, otherwise buffer overruns can occur. +Also, @theglibc{} versions older than 2.36 assume that @var{s} is at least +@var{MB_CUR_MAX} bytes long, so programs that need to run on older +@glibcadj{} versions must comply with this limit. @pindex wchar.h @code{wcrtomb} was introduced in @w{Amendment 1} to @w{ISO C90} and is diff --git a/wcsmbs/wcrtomb.c b/wcsmbs/wcrtomb.c index e17438989f..db12e46297 100644 --- a/wcsmbs/wcrtomb.c +++ b/wcsmbs/wcrtomb.c @@ -1,4 +1,5 @@ /* Copyright (C) 1996-2022 Free Software Foundation, Inc. + Copyright The GNU Toolchain Authors. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -20,6 +21,7 @@ #include #include #include +#include #include #include @@ -34,7 +36,7 @@ static mbstate_t state; size_t -__wcrtomb (char *s, wchar_t wc, mbstate_t *ps) +__wcrtomb_internal (char *s, wchar_t wc, mbstate_t *ps, size_t s_size) { char buf[MB_LEN_MAX]; struct __gconv_step_data data; @@ -52,14 +54,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) /* A first special case is if S is NULL. This means put PS in the initial state. */ if (s == NULL) - { - s = buf; - wc = L'\0'; - } + wc = L'\0'; /* Tell where we want to have the result. */ - data.__outbuf = (unsigned char *) s; - data.__outbufend = (unsigned char *) s + MB_CUR_MAX; + data.__outbuf = (unsigned char *) buf; + data.__outbufend = (unsigned char *) buf + sizeof buf; /* Get the conversion functions. */ fcts = get_gconv_fcts (_NL_CURRENT_DATA (LC_CTYPE)); @@ -101,7 +100,22 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) if (status == __GCONV_OK || status == __GCONV_EMPTY_INPUT || status == __GCONV_FULL_OUTPUT) - result = data.__outbuf - (unsigned char *) s; + { + result = data.__outbuf - (unsigned char *) buf; + + if (s != NULL) + { + if (result > s_size) + __chk_fail (); + + if (__glibc_likely (result < 2)) + *s = *buf; + else if (__glibc_likely (result == 2)) + memcpy (s, buf, result); /* Help the compiler. */ + else + memcpy (s, buf, result); + } + } else { result = (size_t) -1; @@ -110,5 +124,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) return result; } + +size_t +__wcrtomb (char *s, wchar_t wc, mbstate_t *ps) +{ + return __wcrtomb_internal (s, wc, ps, (size_t) -1); +} weak_alias (__wcrtomb, wcrtomb) libc_hidden_weak (wcrtomb)