From patchwork Mon Jul 3 19:19:45 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 1802921
To: libc-alpha@sourceware.org, Joe Simmons-Talbott
Subject: [PATCH v2] vfscanf-internal: Remove potentially unbounded allocas
Date: Mon, 3 Jul 2023 16:19:45 -0300
Message-Id: <20230703191945.1923238-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
List-Id: Libc-alpha mailing list
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella

Some locales define a list of mapping pairs of alternate digits and
separators for input digits (to_inpunct).  This requires scanf to
create a list of all possible inputs for the optional type modifier
'I'.  That list was built with alloca; allocate it with malloc instead
and free it once the number has been read.

Checked on x86_64-linux-gnu.
---
Changes from v1:
 * Add malloc failure check.
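
For readers who want the allocation pattern in isolation, here is a
minimal sketch outside of glibc.  It is not the code in this patch:
build_extended_digits and free_extended_digits are hypothetical helper
names, plain errno stands in for __set_errno, and ordinary strings
stand in for the locale digit tables.  It only illustrates the idea of
NULL-initialized slots, a malloc failure check, and a single cleanup
label that frees every slot.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum { num_digits_len = 10 };

/* Hypothetical helper, not part of glibc: for each digit N build the
   heap string BASE[N] followed by EXTRA[N] and store it in OUT[N].
   Returns 0 on success.  On allocation failure it sets errno to
   ENOMEM, frees whatever was already allocated and returns -1.  */
static int
build_extended_digits (const char *const base[num_digits_len],
                       const char *const extra[num_digits_len],
                       char *out[num_digits_len])
{
  assert (base != NULL && extra != NULL && out != NULL);

  /* Start with every slot NULL so cleanup can call free on all of
     them, no matter how far the loop below got.  */
  for (int n = 0; n < num_digits_len; n++)
    out[n] = NULL;

  for (int n = 0; n < num_digits_len; n++)
    {
      size_t base_len = strlen (base[n]);
      size_t extra_len = strlen (extra[n]);
      char *buf = malloc (base_len + extra_len + 1);
      if (buf == NULL)
        {
          errno = ENOMEM;
          goto fail;
        }
      memcpy (buf, base[n], base_len);
      /* Copy the terminating NUL together with the extra digit.  */
      memcpy (buf + base_len, extra[n], extra_len + 1);
      out[n] = buf;
    }
  return 0;

fail:
  for (int n = 0; n < num_digits_len; n++)
    {
      free (out[n]);		/* free (NULL) is a no-op.  */
      out[n] = NULL;
    }
  return -1;
}

/* Hypothetical counterpart that releases a successfully built list.  */
static void
free_extended_digits (char *out[num_digits_len])
{
  for (int n = 0; n < num_digits_len; n++)
    {
      free (out[n]);
      out[n] = NULL;
    }
}
```

A caller would build the list once, use it while matching input, and
call free_extended_digits on every exit path.  The patch gets the same
effect by routing both the success and the failure path through one
label.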
---
 stdio-common/Makefile               |  3 ++
 stdio-common/tst-scanf-to_inpunct.c | 78 ++++++++++++++++++++++++++++
 stdio-common/vfscanf-internal.c     | 53 ++++++++++++-------
 wcsmbs/Makefile                     |  3 ++
 wcsmbs/tst-wscanf-to_inpunct.c      | 79 +++++++++++++++++++++++++++++
 5 files changed, 199 insertions(+), 17 deletions(-)
 create mode 100644 stdio-common/tst-scanf-to_inpunct.c
 create mode 100644 wcsmbs/tst-wscanf-to_inpunct.c

diff --git a/stdio-common/Makefile b/stdio-common/Makefile
index 8871ec7668..f6d9017ff1 100644
--- a/stdio-common/Makefile
+++ b/stdio-common/Makefile
@@ -231,6 +231,7 @@ tests := \
   tst-scanf-binary-gnu11 \
   tst-scanf-binary-gnu89 \
   tst-scanf-round \
+  tst-scanf-to_inpunct \
   tst-setvbuf1 \
   tst-sprintf \
   tst-sprintf-errno \
@@ -347,6 +348,7 @@ LOCALES := \
   de_DE.ISO-8859-1 \
   de_DE.UTF-8 \
   en_US.ISO-8859-1 \
+  fa_IR.UTF-8 \
   hi_IN.UTF-8 \
   ja_JP.EUC-JP \
   ps_AF.UTF-8 \
@@ -366,6 +368,7 @@ $(objpfx)tst-swprintf.out: $(gen-locales)
 $(objpfx)tst-vfprintf-mbs-prec.out: $(gen-locales)
 $(objpfx)tst-vfprintf-width-i18n.out: $(gen-locales)
 $(objpfx)tst-grouping3.out: $(gen-locales)
+$(objpfx)tst-scanf-to_inpunct.out: $(gen-locales)
 endif
 
 tst-printf-bz18872-ENV = MALLOC_TRACE=$(objpfx)tst-printf-bz18872.mtrace \
diff --git a/stdio-common/tst-scanf-to_inpunct.c b/stdio-common/tst-scanf-to_inpunct.c
new file mode 100644
index 0000000000..32236ac2dc
--- /dev/null
+++ b/stdio-common/tst-scanf-to_inpunct.c
@@ -0,0 +1,78 @@
+/* Test scanf for languages with mapping pairs of alternate digits and
+   separators.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+#include
+
+/* fa_IR defines to_inpunct for numbers.  */
+static const struct
+{
+  int n;
+  const char *str;
+} inputs[] =
+{
+  { 1, "\xdb\xb1" },
+  { 2, "\xdb\xb2" },
+  { 3, "\xdb\xb3" },
+  { 4, "\xdb\xb4" },
+  { 5, "\xdb\xb5" },
+  { 6, "\xdb\xb6" },
+  { 7, "\xdb\xb7" },
+  { 8, "\xdb\xb8" },
+  { 9, "\xdb\xb9" },
+  { 10, "\xdb\xb1\xdb\xb0" },
+  { 11, "\xdb\xb1\xdb\xb1" },
+  { 12, "\xdb\xb1\xdb\xb2" },
+  { 13, "\xdb\xb1\xdb\xb3" },
+  { 14, "\xdb\xb1\xdb\xb4" },
+  { 15, "\xdb\xb1\xdb\xb5" },
+  { 16, "\xdb\xb1\xdb\xb6" },
+  { 17, "\xdb\xb1\xdb\xb7" },
+  { 18, "\xdb\xb1\xdb\xb8" },
+  { 19, "\xdb\xb1\xdb\xb9" },
+  { 20, "\xdb\xb2\xdb\xb0" },
+  { 30, "\xdb\xb3\xdb\xb0" },
+  { 40, "\xdb\xb4\xdb\xb0" },
+  { 50, "\xdb\xb5\xdb\xb0" },
+  { 60, "\xdb\xb6\xdb\xb0" },
+  { 70, "\xdb\xb7\xdb\xb0" },
+  { 80, "\xdb\xb8\xdb\xb0" },
+  { 90, "\xdb\xb9\xdb\xb0" },
+  { 100, "\xdb\xb1\xdb\xb0\xdb\xb0" },
+  { 1000, "\xdb\xb1\xdb\xb0\xdb\xb0\xdb\xb0" },
+};
+
+static int
+do_test (void)
+{
+  xsetlocale (LC_ALL, "fa_IR.UTF-8");
+
+  for (int i = 0; i < array_length (inputs); i++)
+    {
+      int n;
+      sscanf (inputs[i].str, "%Id", &n);
+      TEST_COMPARE (n, inputs[i].n);
+    }
+
+  return 0;
+}
+
+#include
diff --git a/stdio-common/vfscanf-internal.c b/stdio-common/vfscanf-internal.c
index bfb9baa21a..bf15e3aecf 100644
--- a/stdio-common/vfscanf-internal.c
+++ b/stdio-common/vfscanf-internal.c
@@ -1455,13 +1455,14 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
               int from_level;
               int to_level;
               int level;
+              enum { num_digits_len = 10 };
 #ifdef COMPILE_WSCANF
-              const wchar_t *wcdigits[10];
-              const wchar_t *wcdigits_extended[10];
+              const wchar_t *wcdigits[num_digits_len];
 #else
-              const char *mbdigits[10];
-              const char *mbdigits_extended[10];
+              const char *mbdigits[num_digits_len];
 #endif
+              CHAR_T *digits_extended[num_digits_len] = { NULL };
+
               /* "to_inpunct" is a map from ASCII digits to their
                  equivalent in locale. This is defined for locales
                  which use an extra digits set.  */
@@ -1482,18 +1483,24 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
                   /* Adding new level for extra digits set in locale file.  */
                   ++to_level;
 
-                  for (n = 0; n < 10; ++n)
+                  for (n = 0; n < num_digits_len; ++n)
                     {
 #ifdef COMPILE_WSCANF
                       wcdigits[n] = (const wchar_t *)
                         _NL_CURRENT (LC_CTYPE, _NL_CTYPE_INDIGITS0_WC + n);
 
                       wchar_t *wc_extended = (wchar_t *)
-                        alloca ((to_level + 2) * sizeof (wchar_t));
+                        malloc ((to_level + 2) * sizeof (wchar_t));
+                      if (wc_extended == NULL)
+                        {
+                          __set_errno (ENOMEM);
+                          done = EOF;
+                          goto digits_extended_fail;
+                        }
                       __wmemcpy (wc_extended, wcdigits[n], to_level);
                       wc_extended[to_level] = __towctrans (L'0' + n, map);
                       wc_extended[to_level + 1] = '\0';
-                      wcdigits_extended[n] = wc_extended;
+                      digits_extended[n] = wc_extended;
 #else
                       mbdigits[n]
                         = curctype->values[_NL_CTYPE_INDIGITS0_MB + n].string;
@@ -1524,14 +1531,19 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
                       size_t mbdigits_len = last_char - mbdigits[n];
 
                       /* Allocate memory for extended multibyte digit.  */
-                      char *mb_extended;
-                      mb_extended = (char *) alloca (mbdigits_len + mblen + 1);
+                      char *mb_extended = malloc (mbdigits_len + mblen + 1);
+                      if (mb_extended == NULL)
+                        {
+                          __set_errno (ENOMEM);
+                          done = EOF;
+                          goto digits_extended_fail;
+                        }
 
                       /* And get the mbdigits + extra_digit string.  */
                       *(char *) __mempcpy (__mempcpy (mb_extended, mbdigits[n],
                                                       mbdigits_len),
                                            extra_mbdigit, mblen) = '\0';
-                      mbdigits_extended[n] = mb_extended;
+                      digits_extended[n] = mb_extended;
 #endif
                     }
                 }
@@ -1541,7 +1553,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
                 {
                   /* In this round we get the pointer to the digit strings
                      and also perform the first round of comparisons.  */
-                  for (n = 0; n < 10; ++n)
+                  for (n = 0; n < num_digits_len; ++n)
                     {
                       /* Get the string for the digits with value N.  */
 #ifdef COMPILE_WSCANF
@@ -1553,7 +1565,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
                       DIAG_IGNORE_NEEDS_COMMENT (4.7, "-Wmaybe-uninitialized");
 
                       if (__glibc_unlikely (map != NULL))
-                        wcdigits[n] = wcdigits_extended[n];
+                        wcdigits[n] = digits_extended[n];
                       else
                         wcdigits[n] = (const wchar_t *)
                           _NL_CURRENT (LC_CTYPE, _NL_CTYPE_INDIGITS0_WC + n);
@@ -1574,7 +1586,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
                       int avail = width > 0 ? width : INT_MAX;
 
                       if (__glibc_unlikely (map != NULL))
-                        mbdigits[n] = mbdigits_extended[n];
+                        mbdigits[n] = digits_extended[n];
                       else
                         mbdigits[n]
                           = curctype->values[_NL_CTYPE_INDIGITS0_MB + n].string;
@@ -1617,13 +1629,13 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
 #endif
                     }
 
-                  if (n == 10)
+                  if (n == num_digits_len)
                     {
                       /* Have not yet found the digit.  */
                       for (level = from_level + 1; level <= to_level; ++level)
                         {
                           /* Search all ten digits of this level.  */
-                          for (n = 0; n < 10; ++n)
+                          for (n = 0; n < num_digits_len; ++n)
                             {
 #ifdef COMPILE_WSCANF
                               if (c == (wint_t) *wcdigits[n])
@@ -1679,7 +1691,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
                         }
                     }
 
-                  if (n < 10)
+                  if (n < num_digits_len)
                     c = L_('0') + n;
                   else if (flags & GROUP)
                     {
@@ -1708,7 +1720,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
                             {
                               __set_errno (ENOMEM);
                               done = EOF;
-                              goto errout;
+                              break;
                             }
 
                           if (*cmpp != '\0')
@@ -1742,6 +1754,13 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
 
                   c = inchar ();
                 }
+
+digits_extended_fail:
+              for (n = 0; n < num_digits_len; n++)
+                free (digits_extended[n]);
+
+              if (done == EOF)
+                goto errout;
             }
           else
             /* Read the number into workspace.  */
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index 22192985e1..ed2660a524 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -175,6 +175,7 @@ tests := \
   tst-wscanf-binary-c2x \
   tst-wscanf-binary-gnu11 \
   tst-wscanf-binary-gnu89 \
+  tst-wscanf-to_inpunct \
   wcsatcliff \
   wcsmbs-tst1 \
   # tests
@@ -186,6 +187,7 @@ LOCALES := \
   de_DE.ISO-8859-1 \
   de_DE.UTF-8 \
   en_US.ANSI_X3.4-1968 \
+  fa_IR.UTF-8 \
   hr_HR.ISO-8859-2 \
   ja_JP.EUC-JP \
   tr_TR.ISO-8859-9 \
@@ -207,6 +209,7 @@ $(objpfx)tst-c16-surrogate.out: $(gen-locales)
 $(objpfx)tst-c32-state.out: $(gen-locales)
 $(objpfx)test-c8rtomb.out: $(gen-locales)
 $(objpfx)test-mbrtoc8.out: $(gen-locales)
+$(objpfx)tst-wscanf-to_inpunct.out: $(gen-locales)
 endif
 
 $(objpfx)tst-wcstod-round: $(libm)
diff --git a/wcsmbs/tst-wscanf-to_inpunct.c b/wcsmbs/tst-wscanf-to_inpunct.c
new file mode 100644
index 0000000000..72f2a1a422
--- /dev/null
+++ b/wcsmbs/tst-wscanf-to_inpunct.c
@@ -0,0 +1,79 @@
+/* Test scanf for languages with mapping pairs of alternate digits and
+   separators.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+#include
+#include
+
+/* fa_IR defines to_inpunct for numbers.  */
+static const struct input_t
+{
+  int n;
+  const wchar_t str[5];
+} inputs[] =
+{
+  { 1, { 0x000006f1, L'\0' } },
+  { 2, { 0x000006f2, L'\0' } },
+  { 3, { 0x000006f3, L'\0' } },
+  { 4, { 0x000006f4, L'\0' } },
+  { 5, { 0x000006f5, L'\0' } },
+  { 6, { 0x000006f6, L'\0' } },
+  { 7, { 0x000006f7, L'\0' } },
+  { 8, { 0x000006f8, L'\0' } },
+  { 9, { 0x000006f9, L'\0' } },
+  { 10, { 0x000006f1, 0x000006f0, L'\0' } },
+  { 11, { 0x000006f1, 0x000006f1, L'\0' } },
+  { 12, { 0x000006f1, 0x000006f2, L'\0' } },
+  { 13, { 0x000006f1, 0x000006f3, L'\0' } },
+  { 14, { 0x000006f1, 0x000006f4, L'\0' } },
+  { 15, { 0x000006f1, 0x000006f5, L'\0' } },
+  { 16, { 0x000006f1, 0x000006f6, L'\0' } },
+  { 17, { 0x000006f1, 0x000006f7, L'\0' } },
+  { 18, { 0x000006f1, 0x000006f8, L'\0' } },
+  { 19, { 0x000006f1, 0x000006f9, L'\0' } },
+  { 20, { 0x000006f2, 0x000006f0, L'\0' } },
+  { 30, { 0x000006f3, 0x000006f0, L'\0' } },
+  { 40, { 0x000006f4, 0x000006f0, L'\0' } },
+  { 50, { 0x000006f5, 0x000006f0, L'\0' } },
+  { 60, { 0x000006f6, 0x000006f0, L'\0' } },
+  { 70, { 0x000006f7, 0x000006f0, L'\0' } },
+  { 80, { 0x000006f8, 0x000006f0, L'\0' } },
+  { 90, { 0x000006f9, 0x000006f0, L'\0' } },
+  { 100, { 0x000006f1, 0x000006f0, 0x000006f0, L'\0' } },
+  { 1000, { 0x000006f1, 0x000006f0, 0x000006f0, 0x000006f0, L'\0' } },
+};
+
+static int
+do_test (void)
+{
+  xsetlocale (LC_ALL, "fa_IR.UTF-8");
+
+  for (int i = 0; i < array_length (inputs); i++)
+    {
+      int n;
+      swscanf (inputs[i].str, L"%Id", &n);
+      TEST_COMPARE (n, inputs[i].n);
+    }
+
+  return 0;
+}
+
+#include
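
A note on the test tables above: the wide test spells each digit as a
code point in U+06F0..U+06F9, and the byte strings in the multibyte
test are the two-byte UTF-8 encodings of those same code points (for
example U+06F1 is "\xdb\xb1").  The sketch below shows how such an
input string could be derived from an integer.  to_persian_utf8 is a
hypothetical helper written for this note; it is not part of the patch
or of glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write VALUE into BUF as a NUL-terminated
   UTF-8 string of digits taken from U+06F0..U+06F9.  Returns the
   number of bytes written, excluding the NUL, or 0 if SIZE is too
   small to hold the result.  */
static size_t
to_persian_utf8 (unsigned int value, char *buf, size_t size)
{
  assert (buf != NULL);

  /* Collect the decimal digits, least significant first.  */
  char ascii[16];
  size_t ndigits = 0;
  do
    {
      ascii[ndigits++] = (char) ('0' + value % 10);
      value /= 10;
    }
  while (value != 0);

  /* Each digit needs two bytes, plus one byte for the NUL.  */
  if (size < 2 * ndigits + 1)
    return 0;

  size_t pos = 0;
  while (ndigits > 0)
    {
      unsigned int cp = 0x6f0 + (unsigned int) (ascii[--ndigits] - '0');
      /* Two-byte UTF-8 form: 110xxxxx 10xxxxxx.  */
      buf[pos++] = (char) (0xc0 | (cp >> 6));
      buf[pos++] = (char) (0x80 | (cp & 0x3f));
    }
  buf[pos] = '\0';
  return pos;
}
```

Calling it with 1000 and a large enough buffer yields the same eight
bytes as the last entry of the multibyte table.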