From patchwork Tue Jun 18 19:07:42 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 1949443
Date: Tue, 18 Jun 2024 21:07:42 +0200
From: Jakub Jelinek
To: "Joseph S. Myers", Marek Polacek
Cc: gcc-patches@gcc.gnu.org, Jason Merrill
Subject: [PATCH] libcpp: Add support for gnu::offset #embed/__has_embed parameter
List-Id: Gcc-patches mailing list
Reply-To: Jakub Jelinek

Hi!

The following patch adds, on top of the just posted #embed patch, a first
extension, gnu::offset, which allows seeking in the data file (for seekable
files; otherwise the skipped data is read and thrown away).
I think this is useful e.g. when some binary data start with some well-known
header which shouldn't be included in the data etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

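As a rough illustration of how the offset and limit combine (a sketch only, not part of the patch): the helper below, with the hypothetical name embed_length, mirrors the regular-file arithmetic in _cpp_stack_embed. An offset past the end of the resource yields an empty result; otherwise limit is counted from the offset position.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a libcpp function.  Returns how many bytes an
   #embed with the given gnu::offset and limit would produce for a regular
   file of SIZE bytes, following the same three cases as the patch.  */
uint64_t
embed_length (uint64_t size, uint64_t offset, uint64_t limit)
{
  if (size < offset)
    return 0;                   /* Offset is past the end: empty resource.  */
  if (size - offset < limit)
    return size - offset;       /* Fewer than LIMIT bytes remain.  */
  return limit;                 /* LIMIT bytes starting at OFFSET.  */
}

/* Example values only: a 1032-byte file with offset 32 and limit 1000
   yields exactly 1000 bytes; an offset of 2000 yields none.  */
void
embed_length_examples (void)
{
  assert (embed_length (1032, 32, 1000) == 1000);
  assert (embed_length (1032, 2000, 5) == 0);
}
```

With no limit parameter, the limit argument would be the largest representable value, so the result is simply whatever remains after the offset.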
2024-06-18  Jakub Jelinek

libcpp/
        * internal.h (struct cpp_embed_params): Add offset member.
        * directives.cc (_cpp_parse_embed_params): Parse gnu::offset
        parameter.
        * files.cc (struct _cpp_file): Add offset member.
        (_cpp_stack_embed): Handle params->offset.
gcc/
        * doc/cpp.texi (Binary Resource Inclusion): Document gnu::offset
        #embed parameter.
gcc/testsuite/
        * c-c++-common/cpp/embed-15.c: New test.
        * c-c++-common/cpp/embed-16.c: New test.
        * gcc.dg/cpp/embed-5.c: New test.

        Jakub

--- libcpp/internal.h.jj 2024-06-18 08:37:55.759488622 +0200
+++ libcpp/internal.h 2024-06-18 08:38:39.355915868 +0200
@@ -630,7 +630,7 @@ struct cpp_embed_params
 {
   location_t loc;
   bool has_embed;
-  cpp_num_part limit;
+  cpp_num_part limit, offset;
   cpp_embed_params_tokens prefix, suffix, if_empty;
 };
 
--- libcpp/directives.cc.jj 2024-06-18 08:37:55.767488517 +0200
+++ libcpp/directives.cc 2024-06-18 11:32:05.073768407 +0200
@@ -1102,6 +1102,21 @@ _cpp_parse_embed_params (cpp_reader *pfi
                     break;
                   }
             }
+          else if (param_prefix_len == 3 && memcmp (param_prefix, "gnu", 3) == 0)
+            {
+              struct { int len; const char *name; } gnu_params[] = {
+                { 6, "offset" }
+              };
+              for (size_t i = 0;
+                   i < sizeof (gnu_params) / sizeof (gnu_params[0]); ++i)
+                if (param_name_len == gnu_params[i].len
+                    && memcmp (param_name, gnu_params[i].name,
+                               param_name_len) == 0)
+                  {
+                    param_kind = 4 + i;
+                    break;
+                  }
+            }
           if (param_kind != (size_t) -1)
             {
               if ((seen & (1 << param_kind)) == 0)
@@ -1130,11 +1145,21 @@ _cpp_parse_embed_params (cpp_reader *pfi
       if (param_kind != (size_t) -1 && token->type != CPP_OPEN_PAREN)
         cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
                              "expected '('");
-      else if (param_kind == 0)
+      else if (param_kind == 0 || param_kind == 4)
         {
           if (params->has_embed && pfile->op_stack == NULL)
             _cpp_expand_op_stack (pfile);
-          params->limit = _cpp_parse_expr (pfile, "#embed", token);
+          cpp_num_part res = _cpp_parse_expr (pfile, "#embed", token);
+          if (param_kind == 0)
+            params->limit = res;
+          else
+            {
+              if (res > INTTYPE_MAXIMUM (off_t))
+                cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
+                                     "too large 'gnu::offset' argument");
+              else
+                params->offset = res;
+            }
           token = _cpp_get_token_no_padding (pfile);
         }
       else if (token->type == CPP_OPEN_PAREN)
--- libcpp/files.cc.jj 2024-06-18 11:24:24.598781748 +0200
+++ libcpp/files.cc 2024-06-18 11:31:42.238066309 +0200
@@ -90,6 +90,9 @@ struct _cpp_file
   /* Size for #embed, perhaps smaller than st.st_size.  */
   size_t limit;
 
+  /* Offset for #embed.  */
+  off_t offset;
+
   /* File descriptor.  Invalid if -1, otherwise open.  */
   int fd;
 
@@ -1243,8 +1246,11 @@ _cpp_stack_embed (cpp_reader *pfile, con
   _cpp_file *orig_file = file;
   if (file->buffer_valid
       && (!S_ISREG (file->st.st_mode)
-          || (file->limit < file->st.st_size + (size_t) 0
-              && file->limit < params->limit)))
+          || file->offset + (cpp_num_part) 0 > params->offset
+          || (file->limit < file->st.st_size - file->offset + (size_t) 0
+              && (params->offset - file->offset > (cpp_num_part) file->limit
+                  || file->limit - (params->offset
+                                    - file->offset) < params->limit))))
     {
       bool found = false;
       if (S_ISREG (file->st.st_mode))
@@ -1257,8 +1263,13 @@ _cpp_stack_embed (cpp_reader *pfile, con
                && strcmp (file->path, file->next_file->path) == 0)
           {
             file = file->next_file;
-            if (file->limit >= file->st.st_size + (size_t) 0
-                || file->limit >= params->limit)
+            if (file->offset + (cpp_num_part) 0 <= params->offset
+                && (file->limit >= (file->st.st_size - file->offset
+                                    + (size_t) 0)
+                    || (params->offset
+                        - file->offset <= (cpp_num_part) file->limit
+                        && file->limit - (params->offset
+                                          - file->offset) >= params->limit)))
               {
                 found = true;
                 break;
@@ -1314,8 +1325,10 @@ _cpp_stack_embed (cpp_reader *pfile, con
       if (regular)
         {
           cpp_num_part limit;
-          if (file->st.st_size + (cpp_num_part) 0 < params->limit)
-            limit = file->st.st_size;
+          if (file->st.st_size + (cpp_num_part) 0 < params->offset)
+            limit = 0;
+          else if (file->st.st_size - params->offset < params->limit)
+            limit = file->st.st_size - params->offset;
           else
             limit = params->limit;
           if (params->has_embed)
@@ -1326,6 +1339,14 @@ _cpp_stack_embed (cpp_reader *pfile, con
                          "%s is too large", file->path);
               goto fail;
             }
+          if (lseek (file->fd, params->offset, SEEK_CUR)
+              != (off_t) params->offset)
+            {
+              cpp_errno_filename (pfile, CPP_DL_ERROR, file->path,
+                                  params->loc);
+              goto fail;
+            }
+          file->offset = params->offset;
           file->limit = limit;
           size = limit;
         }
@@ -1338,6 +1359,38 @@ _cpp_stack_embed (cpp_reader *pfile, con
       buf = XNEWVEC (uchar, size ? size : 1);
       total = 0;
 
+      if (!regular && params->offset)
+        {
+          uchar *buf2 = buf;
+          ssize_t size2 = size;
+          cpp_num_part total2 = params->offset;
+
+          if (params->offset > 8 * 1024 && size < 8 * 1024)
+            {
+              size2 = 32 * 1024;
+              buf2 = XNEWVEC (uchar, size2);
+            }
+          do
+            {
+              if ((cpp_num_part) size2 > total2)
+                size2 = total2;
+              count = read (file->fd, buf2, size2);
+              if (count < 0)
+                {
+                  cpp_errno_filename (pfile, CPP_DL_ERROR, file->path,
+                                      params->loc);
+                  if (buf2 != buf)
+                    free (buf2);
+                  free (buf);
+                  goto fail;
+                }
+              total2 -= count;
+            }
+          while (total2);
+          if (buf2 != buf)
+            free (buf2);
+        }
+
       while ((count = read (file->fd, buf + total, size - total)) > 0)
         {
           total += count;
@@ -1378,7 +1431,10 @@ _cpp_stack_embed (cpp_reader *pfile, con
           file->limit = total;
         }
       else if (!regular)
-        file->limit = total;
+        {
+          file->offset = params->offset;
+          file->limit = total;
+        }
 
       file->buffer_start = buf;
       file->buffer = buf;
@@ -1387,9 +1443,22 @@ _cpp_stack_embed (cpp_reader *pfile, con
       file->fd = -1;
     }
   else if (params->has_embed)
-    return file->limit && params->limit ? 1 : 2;
+    {
+      if (params->offset - file->offset > file->limit)
+        return 2;
+      size_t limit = file->limit - (params->offset - file->offset);
+      return limit && params->limit ? 1 : 2;
+    }
 
+  const uchar *buffer = file->buffer;
   size_t limit = file->limit;
+  if (params->offset - file->offset > limit)
+    limit = 0;
+  else
+    {
+      buffer += params->offset - file->offset;
+      limit -= params->offset - file->offset;
+    }
   if (params->limit < limit)
     limit = params->limit;
 
@@ -1413,20 +1482,20 @@ _cpp_stack_embed (cpp_reader *pfile, con
   size_t len = 0;
   for (size_t i = 0; i < limit; ++i)
     {
-      if (file->buffer[i] < 10)
+      if (buffer[i] < 10)
         len += 2;
-      else if (file->buffer[i] < 100)
+      else if (buffer[i] < 100)
         len += 3;
 #if UCHAR_MAX == 255
       else
         len += 4;
 #else
-      else if (file->buffer[i] < 1000)
+      else if (buffer[i] < 1000)
         len += 4;
       else
         {
           char buf[64];
-          len += sprintf (buf, "%d", file->buffer[i]) + 1;
+          len += sprintf (buf, "%d", buffer[i]) + 1;
         }
 #endif
       if (len > INTTYPE_MAXIMUM (ssize_t))
@@ -1480,7 +1549,7 @@ _cpp_stack_embed (cpp_reader *pfile, con
       if (i == 0)
         tok->flags |= PREV_WHITE;
       tok->val.str.text = s;
-      tok->val.str.len = sprintf ((char *) s, "%d", file->buffer[i]);
+      tok->val.str.len = sprintf ((char *) s, "%d", buffer[i]);
       s += tok->val.str.len + 1;
       if (tok == &pfile->directive_result)
         tok = toks;
--- gcc/doc/cpp.texi.jj 2024-06-18 16:56:16.000000000 +0200
+++ gcc/doc/cpp.texi 2024-06-18 18:04:17.384265373 +0200
@@ -3966,8 +3966,8 @@ treated the same), followed by parameter
 with currently supported standard parameters @code{limit}, @code{prefix},
 @code{suffix} and @code{if_empty}, or implementation defined parameters
 specified by a unique vendor prefix followed by @code{::} followed by
-name of the parameter.  GCC will use the @code{gnu} prefix but currently
-doesn't support any extensions.
+name of the parameter.  GCC uses the @code{gnu} prefix for vendor
+parameters and currently supports the @code{gnu::offset} parameter.
 
 The @code{limit} parameter argument is a constant expression which
 specifies the maximum number of bytes included by the directive,
@@ -3977,6 +3977,10 @@ that sequence is not empty and @code{if_
 sequence which is used as expansion for @code{#embed} directive if the
 resource is empty.
 
+The @code{gnu::offset} parameter argument is a constant expression
+which specifies how many bytes to skip from the start of the resource.
+@code{limit} is then counted from that position.
+
 The @code{#embed} directive is not supported in the Traditional Mode
 (@pxref{Traditional Mode}).
 
--- gcc/testsuite/c-c++-common/cpp/embed-15.c.jj 2024-06-18 12:15:42.157715550 +0200
+++ gcc/testsuite/c-c++-common/cpp/embed-15.c 2024-06-18 12:49:07.726616236 +0200
@@ -0,0 +1,88 @@
+/* { dg-do run } */
+/* { dg-options "--embed-dir=${srcdir}/c-c++-common/cpp/embed-dir" } */
+/* { dg-additional-options "-std=gnu99" { target c } } */
+
+#if __has_embed (__FILE__ gnu::offset (4 + FOOBAR) limit (3)) != __STDC_EMBED_FOUND__
+#error "__has_embed fail"
+#endif
+
+#embed limit(1) gnu::offset (0) prefix(int a = ) suffix (;)
+#embed limit(1) __gnu__::offset (1 * 1) prefix(int b = ) suffix (;)
+#embed limit(1) gnu::__offset__ (1 + 1) prefix(int c = ) suffix (;)
+#embed __limit__(1) __gnu__::__offset__ (1 + (1 \
+  + 1)) __prefix__(int d = ) __suffix__ (;)
+const unsigned char e[] = {
+  #embed limit(5) gnu::offset (999)
+};
+const unsigned char f[] = {
+  #embed limit(7) gnu::offset (998)
+};
+const unsigned char g[] = {
+  #embed limit(8) gnu::offset (998)
+};
+const unsigned char h[] = {
+  #embed limit(8) gnu::offset (997)
+};
+const unsigned char i[] = {
+  #embed limit(9) gnu::offset (997)
+};
+const unsigned char j[] = {
+  #embed limit(30) gnu::offset (990)
+};
+const unsigned char k[] = {
+  #embed limit(26) gnu::offset (992)
+};
+const unsigned char l[] = {
+  #embed
+};
+const unsigned char m[] = {
+  #embed __limit__ (1000) __gnu__::__offset__ (32)
+};
+#if __has_embed ( limit(5) gnu::offset (999)) != __STDC_EMBED_FOUND__ \
+    || __has_embed ( limit(5) gnu::offset (999)) != __STDC_EMBED_FOUND__ \
+    || __has_embed ( limit(7) gnu::offset (998)) != __STDC_EMBED_FOUND__ \
+    || __has_embed ( limit(8) gnu::offset (998)) != __STDC_EMBED_FOUND__ \
+    || __has_embed ( limit(8) gnu::offset (997)) != __STDC_EMBED_FOUND__ \
+    || __has_embed ( limit(9) gnu::offset (997)) != __STDC_EMBED_FOUND__ \
+    || __has_embed ( limit(30) gnu::offset (990)) != __STDC_EMBED_FOUND__ \
+    || __has_embed ( limit(26) gnu::offset (992)) != __STDC_EMBED_FOUND__ \
+    || __has_embed () != __STDC_EMBED_FOUND__ \
+    || __has_embed ( limit(26) gnu::offset (992)) != __STDC_EMBED_FOUND__
+#error "__has_embed fail"
+#endif
+
+#ifdef __cplusplus
+#define C "C"
+#else
+#define C
+#endif
+extern C void abort (void);
+extern C int memcmp (const void *, const void *, __SIZE_TYPE__);
+
+int
+main ()
+{
+  if (a != 'H' || b != 'e' || c != 'n' || d != 'r')
+    abort ();
+  if (sizeof (e) != 5
+      || sizeof (f) != 7
+      || sizeof (g) != 8
+      || sizeof (h) != 8
+      || sizeof (i) != 9
+      || sizeof (j) != 30
+      || sizeof (k) != 26
+      || sizeof (l) < 1032
+      || sizeof (m) != 1000)
+    abort ();
+  if (memcmp (e, l + 999, 5)
+      || memcmp (f, l + 998, 7)
+      || memcmp (g, l + 998, 8)
+      || memcmp (h, l + 997, 8)
+      || memcmp (i, l + 997, 9)
+      || memcmp (j, l + 990, 30)
+      || memcmp (k, l + 992, 26)
+      || memcmp (m, l + 32, 1000))
+    abort ();
+  if (l[0] != 'H' || l[1] != 'e' || l[2] != 'n' || l[3] != 'r')
+    abort ();
+}
--- gcc/testsuite/c-c++-common/cpp/embed-16.c.jj 2024-06-18 12:44:40.375102653 +0200
+++ gcc/testsuite/c-c++-common/cpp/embed-16.c 2024-06-18 13:02:11.043397739 +0200
@@ -0,0 +1,19 @@
+/* { dg-do preprocess } */
+/* { dg-options "" } */
+
+#embed __FILE__ gnu::offset(1) gnu::offset(1) /* { dg-error "duplicate embed parameter 'gnu::offset'" } */
+#embed __FILE__ gnu::offset prefix() suffix() /* { dg-error "expected '\\\('" } */
+#embed __FILE__ gnu::offset (1 / 0) /* { dg-error "division by zero in #embed" } */
+#embed __FILE__ __gnu__::__offset__ (+ + +) /* { dg-error "operator '\\\+' has no right operand" } */
+#define FOO 1
+#embed __FILE__ gnu::offset(0 + defined(FOO)) /* { dg-error "'defined' in #embed parameter" } */
+#if 1 + __has_embed (__FILE__ gnu::offset(1) __gnu__::__offset__(1)) /* { dg-error "duplicate embed parameter 'gnu::offset'" } */
+#endif
+#if 1 + __has_embed (__FILE__ __gnu__::__offset__ prefix() suffix()) /* { dg-error "expected '\\\('" } */
+#endif
+#if 1 + __has_embed (__FILE__ gnu::offset(1/0)) /* { dg-error "division by zero in #embed" } */
+#endif
+#if 1 + __has_embed (__FILE__ gnu::offset(+ + +)) /* { dg-error "operator '\\\+' has no right operand" } */
+#endif
+#if 1 + __has_embed (__FILE__ gnu::offset(0 + defined(FOO))) /* { dg-error "'defined' in #embed parameter" } */
+#endif
--- gcc/testsuite/gcc.dg/cpp/embed-5.c.jj 2024-06-18 12:48:49.208857727 +0200
+++ gcc/testsuite/gcc.dg/cpp/embed-5.c 2024-06-18 12:49:26.230374939 +0200
@@ -0,0 +1,4 @@
+/* { dg-do run } */
+/* { dg-options "-std=c23 -pedantic-errors --embed-dir=${srcdir}/c-c++-common/cpp/embed-dir" } */
+
+#include "../../c-c++-common/cpp/embed-15.c"
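
For non-seekable input the patch skips the offset by reading and discarding data. A standalone sketch of that idea, using only POSIX read, might look as follows. The name skip_bytes, the 4 KiB scratch buffer, and the choice to stop at end of file and report how much was actually skipped are assumptions of this illustration, not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper, not a libcpp function.  Discards up to N bytes from
   FD by reading them into a scratch buffer.  Returns the number of bytes
   actually skipped (less than N if end of file comes first), or -1 on a
   read error.  */
int64_t
skip_bytes (int fd, uint64_t n)
{
  unsigned char scratch[4096];
  uint64_t left = n;

  while (left > 0)
    {
      size_t want = left < sizeof scratch ? (size_t) left : sizeof scratch;
      ssize_t got = read (fd, scratch, want);
      if (got < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted by a signal: retry.  */
          return -1;
        }
      if (got == 0)
        break;                  /* End of file before N bytes were seen.  */
      left -= (uint64_t) got;
    }
  return (int64_t) (n - left);
}
```

A caller would compare the return value with the requested offset and treat a short skip as an empty resource.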