From patchwork Sun Aug 29 11:34:47 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Basile Starynkevitch
X-Patchwork-Id: 62942
Subject: Re: gengtype improvements for plugins.
 patch 4/N [files_rules]
From: Basile Starynkevitch
Reply-To: basile@starynkevitch.net
To: jeremie.salvucci@free.fr
Cc: gcc-patches@gcc.gnu.org
In-Reply-To: <1283077418.3067.79.camel@glinka>
References: <1283012591.3067.17.camel@glinka> <20100828170603.GA1108@gmx.de>
 <1283016347.3067.43.camel@glinka> <1283062592.3067.50.camel@glinka>
 <1283063995.3067.63.camel@glinka> <1283077287.3067.78.camel@glinka>
 <1283077418.3067.79.camel@glinka>
Date: Sun, 29 Aug 2010 13:34:47 +0200
Message-ID: <1283081687.3067.86.camel@glinka>

See
http://gcc.gnu.org/ml/gcc-patches/2010-08/msg02058.html &
http://gcc.gnu.org/ml/gcc-patches/2010-08/msg02060.html &
http://gcc.gnu.org/ml/gcc-patches/2010-08/msg02063.html
for the previous pieces of the patch.

The fourth piece of our patch greatly improves the core
get_output_file_with_visibility function. A file rule machinery makes
it much more modular and less ad hoc. We now use regular expressions
to match the input file name and to compute the associated output_name
and for_name. We feel this approach is much cleaner, and easier to
understand and to extend.

Here is a comment (contained in the patch) explaining the details.

/** Regexpr machinery to compute the output_name and for_name-s of each
    input_file. We have a sequence of file rules which gives the POSIX
    extended regular expression to match an input file path, and two
    transformed strings for the corresponding output_name and the
    corresponding for_name. The transformed strings contain dollars: $0
    is replaced by the entire match, $1 is replaced by the substring
    matching the first parenthesis in the regexp, etc. And $$ is
    replaced by a single verbatim dollar. The rule order is important.
    The general case is last, and the particular cases should be first.

    An action routine can, when needed, update the out_name & for_name
    and return the appropriate output file. */

Attached are relpatch04to03-filerules.diff, the patch relative to the
previous patches; relpatch04to03-filerules.ChangeLog, its gcc/ChangeLog
entry; and, for convenience, all-patches-r163612-up-to-04.diff.gz, the
cumulated patches against trunk.

Ok for trunk?

--- ../gengtype-gcc-03/gengtype.c 2010-08-29 11:25:37.000000000 +0200
+++ gcc/gengtype.c 2010-08-29 13:20:46.000000000 +0200
@@ -25,6 +25,8 @@
 #include "double-int.h"
 #include "hashtab.h"
 #include "version.h" /* for version_string & pkgversion_string */
+#include "xregex.h"
+#include "obstack.h"
 #include "gengtype.h"
 
 /* Data types, macros, etc. used only in this file.  */
@@ -1725,6 +1727,214 @@ get_file_gtfilename (const input_file *i
   return result;
 }
 
+/***
+   Regexpr machinery to compute the output_name and for_name-s of each
+   input_file.  We have a sequence of file rules which gives the POSIX
+   extended regular expression to match an input file path, and two
+   transformed strings for the corresponding output_name and the
+   corresponding for_name.  The transformed strings contain dollars: $0
+   is replaced by the entire match, $1 is replaced by the substring
+   matching the first parenthesis in the regexp, etc.  And $$ is replaced
+   by a single verbatim dollar.  The rule order is important.  The
+   general case is last, and the particular cases should be first.
+
+   An action routine can, when needed, update the out_name & for_name
+   and return the appropriate output file.
+ */
+
+typedef outf_p (frul_actionrout_t)(input_file*, char**poutname, char**pforname);
+
+struct file_rule_st {
+  const char* frul_srcexpr;     /* source string for regular expression */
+  int frul_rflags;              /* flags for regcomp(3), usually
+                                 * REG_EXTENDED */
+  regex_t* frul_re;             /* compiled regular expression */
+  const char* frul_tr_out;      /* transform string for making the
+                                 * output_name, with $1 ... $9 for
+                                 * subpatterns and $0 for the whole
+                                 * matched filename */
+  const char* frul_tr_for;      /* transform string for for_name */
+  /* the action, if non null, is called once the rule matches, on
+   * the transformed out_name & for_name. It could change them and
+   * give the output file. */
+  frul_actionrout_t* frul_action;
+};
+
+/* Action handling *.h files */
+static outf_p header_frul (input_file*, char**, char**);
+
+/* Action handling *.c files */
+static outf_p implem_frul (input_file*, char**, char**);
+
+
+#define NULL_REGEX (regex_t*)0
+#define NULL_FRULACT (frul_actionrout_t*)0
+
+/* The array of our rules governing file name generation.  Order
+   matters!  Change it with care!  */
+
+struct file_rule_st files_rules[] = {
+
+  /* the c-family/ source directory is special */
+  { "^(([^/]*/)*)c-family/([[:alnum:]_-]*)\\.c$",
+    REG_EXTENDED, NULL_REGEX,
+    "gt-c-family-$3.h", "c-family/$3.c", NULL_FRULACT},
+
+  { "^(([^/]*/)*)c-family/([[:alnum:]_-]*)\\.h$",
+    REG_EXTENDED, NULL_REGEX,
+    "gt-c-family-$3.h", "c-family/$3.h", NULL_FRULACT},
+
+  /* Both c-lang.h & c-tree.h give gt-c-decl.h for c-decl.c ! */
+  { "^(([^/]*/)*)c-lang\\.h$",
+    REG_EXTENDED, NULL_REGEX, "gt-c-decl.h", "c-decl.c", NULL_FRULACT},
+
+  { "^(([^/]*/)*)c-tree\\.h$",
+    REG_EXTENDED, NULL_REGEX, "gt-c-decl.h", "c-decl.c", NULL_FRULACT},
+
+  /* cp/cp-tree.h gives gt-cp-tree.h for cp/tree.c ! */
+  { "^(([^/]*/)*)cp/cp-tree\\.h$",
+    REG_EXTENDED, NULL_REGEX,
+    "gt-cp-tree.h", "cp/tree.c", NULL_FRULACT },
+
+  /* cp/decl.h & cp/decl.c give gt-cp-decl.h for cp/decl.c ! */
+  { "^(([^/]*/)*)cp/decl\\.[ch]$",
+    REG_EXTENDED, NULL_REGEX,
+    "gt-cp-decl.h", "cp/decl.c", NULL_FRULACT },
+
+  /* cp/name-lookup.h gives gt-cp-name-lookup.h for cp/name-lookup.c ! */
+  { "^(([^/]*/)*)cp/name-lookup\\.h$",
+    REG_EXTENDED, NULL_REGEX,
+    "gt-cp-name-lookup.h", "cp/name-lookup.c", NULL_FRULACT },
+
+  /* objc/objc-act.h gives gt-objc-objc-act.h for objc/objc-act.c ! */
+  { "^(([^/]*/)*)objc/objc-act\\.h$",
+    REG_EXTENDED, NULL_REGEX,
+    "gt-objc-objc-act.h", "objc/objc-act.c", NULL_FRULACT },
+
+  /* General cases.  For header & implementation files, we need a
+   * special action to handle the language.  */
+  { "^(([^/]*/)*)([[:alnum:]_-]*)\\.c$",
+    REG_EXTENDED, NULL_REGEX, "gt-$3.h", "$3.c", implem_frul},
+  { "^(([^/]*/)*)([[:alnum:]_-]*)\\.h$",
+    REG_EXTENDED, NULL_REGEX, "gt-$3.h", "$3.h", header_frul},
+  { "^(([^/]*/)*)([[:alnum:]_-]*)\\.in$",
+    REG_EXTENDED, NULL_REGEX, "gt-$3.h", "$3.in", NULL_FRULACT},
+
+  /* In the future, we need to add a case for C++ sources.  */
+
+  /* null for end of rules */
+  {NULL, 0, NULL_REGEX, NULL, NULL, NULL_FRULACT}
+};
+
+
+/* Special file rules action for handling header files.  */
+static outf_p
+header_frul(input_file* inpf, char**poutname, char**pforname)
+{
+  const char *basename = 0;
+  int lang_index = 0;
+  const char* inpname = input_file_name (inpf);
+  dbgprintf ("inpf %p inpname %s outname %s forname %s", (void*) inpf, inpname, *poutname, *pforname);
+  basename = get_file_basename (inpf);
+  lang_index = get_prefix_langdir_index (basename);
+  dbgprintf ("basename %s lang_index %d", basename, lang_index);
+
+  if (lang_index >= 0)
+    return base_files[lang_index];
+  else {
+    /* TODO: free the old outname */
+    *poutname = CONST_CAST (char*, "gtype-desc.c");
+    *pforname = NULL;
+    dbgprintf("special gtype-desc.c for inpname %s", inpname);
+    return NULL;
+  }
+}
+
+/* Special file rules action for handling implementation files,
+ * notably taking care of the language.  */
+
+static outf_p
+implem_frul (input_file* inpf, char**poutname, char**pforname)
+{
+  char *newbasename = NULL;
+  char* newoutname = NULL;
+  const char* inpname = input_file_name (inpf);
+  dbgprintf ("inpf %p inpname %s original outname %s forname %s",
+             (void*) inpf, inpname, *poutname, *pforname);
+  newoutname = CONST_CAST (char*, get_file_gtfilename (inpf));
+  dbgprintf ("newoutname %s", newoutname);
+  newbasename = CONST_CAST (char*, get_file_basename (inpf));
+  dbgprintf ("newbasename %s", newbasename);
+  /* TODO: free the old outname & forname */
+  *poutname = newoutname;
+  *pforname = newbasename;
+  return NULL;
+}
+
+
+/* utility function which returns NULL on regexpr mismatch, or the
+ * malloc-ed substituted string using TRS on matching of the FIL input
+ * file against the REX regexp. */
+static char*
+input_file_substitute (const input_file *fil, const regex_t* rex,
+                       const char* trs, int rflags)
+{
+  regmatch_t pmatch[10];
+  int notmatched = 0;
+  struct obstack str_obstack;
+  char* str = NULL;
+  const char* filnam = input_file_name (fil);
+  memset (&pmatch, 0, sizeof(pmatch));
+  notmatched = regexec (rex, filnam, 10, pmatch, rflags);
+  dbgprintf ("filnam %s", filnam);
+  if (!notmatched)
+    {
+      char* rawstr = NULL;
+      const char* pt = NULL;
+      obstack_init (&str_obstack);
+      for (pt = trs; *pt; pt++) {
+        char c = *pt;
+        if (c == '$') {
+          if (pt[1] == '$')
+            {
+              /* A double dollar $$ is substituted by a single verbatim
+                 dollar, but who really uses dollar signs in file
+                 paths? */
+              obstack_1grow (&str_obstack, '$');
+            }
+          else if (ISDIGIT(pt[1]))
+            {
+              /* Handle $0 $1 .. $9 by appropriate substitution.  */
+              int dolnum = pt[1] - '0';
+              int so = pmatch[dolnum].rm_so;
+              int eo = pmatch[dolnum].rm_eo;
+              dbgprintf ("so=%d eo=%d dolnum=%d", so, eo, dolnum);
+              if (so>=0 && eo>=so)
+                obstack_grow (&str_obstack, filnam + so, eo - so);
+            }
+          else
+            /* This can happen only when files_rules is buggy!  */
+            fatal ("invalid dollar in transform string %s", trs);
+          /* Always skip the character after the dollar.  */
+          pt++;
+        }
+        else
+          obstack_1grow (&str_obstack, c);
+      }
+      /* add the terminating null */
+      obstack_1grow (&str_obstack, (char) 0);
+      rawstr = XOBFINISH (&str_obstack, char *);
+      str = xstrdup (rawstr);
+      obstack_free (&str_obstack, rawstr);
+      dbgprintf ("matched replacement %s", str);
+      rawstr = NULL;
+      return str;
+    }
+  else
+    dbgprintf ("non-matched filename %s", filnam);
+  return NULL;
+}
+
 /* An output file, suitable for definitions, that can see declarations
    made in INPF and is linked into every language that uses INPF.  */
 
@@ -1733,10 +1943,8 @@ outf_p
 get_output_file_with_visibility (input_file *inpf)
 {
   outf_p r;
-  size_t len;
-  const char *basename;
-  const char *for_name;
-  const char *output_name;
+  const char *for_name = NULL;
+  const char *output_name = NULL;
 
   /* This can happen when we need a file with visibility on a
      structure that we've never seen.  We have to just hope that it's
@@ -1763,64 +1971,93 @@ get_output_file_with_visibility (input_f
   if (inpf->inpoutf != NULL)
     return inpf->inpoutf;
 
-  /* Determine the output file name.  */
-  basename = get_file_basename (inpf);
-  len = strlen (basename);
-  if ((len > 2 && memcmp (basename+len-2, ".c", 2) == 0)
-      || (len > 2 && memcmp (basename+len-2, ".y", 2) == 0)
-      || (len > 3 && memcmp (basename+len-3, ".in", 3) == 0))
-    {
-      output_name = get_file_gtfilename (inpf);
-      for_name = basename;
-    }
-  /* Some headers get used by more than one front-end; hence, it
-     would be inappropriate to spew them out to a single gtype-<lang>.h
-     (and gengtype doesn't know how to direct spewage into multiple
-     gtype-<lang>.h headers at this time).  Instead, we pair up these
-     headers with source files (and their special purpose gt-*.h headers).  */
-  else if (strncmp (basename, "c-family", 8) == 0
-           && IS_DIR_SEPARATOR (basename[8])
-           && strcmp (basename + 9, "c-common.h") == 0)
-    output_name = "gt-c-family-c-common.h", for_name = "c-family/c-common.c";
-  else if (strcmp (basename, "c-lang.h") == 0)
-    output_name = "gt-c-decl.h", for_name = "c-decl.c";
-  else if (strcmp (basename, "c-tree.h") == 0)
-    output_name = "gt-c-decl.h", for_name = "c-decl.c";
-  else if (strncmp (basename, "cp", 2) == 0 && IS_DIR_SEPARATOR (basename[2])
-           && strcmp (basename + 3, "cp-tree.h") == 0)
-    output_name = "gt-cp-tree.h", for_name = "cp/tree.c";
-  else if (strncmp (basename, "cp", 2) == 0 && IS_DIR_SEPARATOR (basename[2])
-           && strcmp (basename + 3, "decl.h") == 0)
-    output_name = "gt-cp-decl.h", for_name = "cp/decl.c";
-  else if (strncmp (basename, "cp", 2) == 0 && IS_DIR_SEPARATOR (basename[2])
-           && strcmp (basename + 3, "name-lookup.h") == 0)
-    output_name = "gt-cp-name-lookup.h", for_name = "cp/name-lookup.c";
-  else if (strncmp (basename, "objc", 4) == 0 && IS_DIR_SEPARATOR (basename[4])
-           && strcmp (basename + 5, "objc-act.h") == 0)
-    output_name = "gt-objc-objc-act.h", for_name = "objc/objc-act.c";
-  else
+  /* Use our file_rules machinery! */
     {
-      int lang_index = get_prefix_langdir_index (basename);
-
-      if (lang_index >= 0) {
-        inpf->inpoutf = base_files[lang_index];
-        return base_files[lang_index];
+      int rulix = 0;
+      for (; files_rules[rulix].frul_srcexpr != NULL; rulix++)
+      {
+        char* outs = NULL;
+        char* fors = NULL;
+        dbgprintf("rulix#%d srcexpr %s",
+                  rulix, files_rules[rulix].frul_srcexpr);
+        if (!files_rules[rulix].frul_re)
+          {
+            /* We lazily compile the regexpr only once.  */
+            int err = 0;
+            files_rules[rulix].frul_re = XCNEW(regex_t);
+            err = regcomp (files_rules[rulix].frul_re,
+                           files_rules[rulix].frul_srcexpr,
+                           files_rules[rulix].frul_rflags);
+            if (err) {
+              /* The regular expression compilation fails only when
+                 file_rules is buggy.  We give a possibly truncated
+                 error message in this impossible case.  */
+              char errbuf[80];
+              memset(errbuf, 0, sizeof(errbuf));
+              regerror (err, files_rules[rulix].frul_re,
+                        errbuf, sizeof(errbuf)-1);
+              fatal("file rule regexpr error %s", errbuf);
+            }
+          };
+        outs = input_file_substitute (inpf, files_rules[rulix].frul_re,
+                                      files_rules[rulix].frul_tr_out, 0);
+        if (!outs)
+          continue;
+
+        fors = input_file_substitute (inpf, files_rules[rulix].frul_re,
+                                      files_rules[rulix].frul_tr_for, 0);
+        dbgprintf("rulix#%d outs %s fors %s",
+                  rulix, outs, fors);
+        if (outs && fors) {
+          dbgprintf ("raw outs %s fors %s", outs, fors);
+          output_name = outs;
+          for_name = fors;
+          if (files_rules[rulix].frul_action) {
+            /* Invoke our action routine.  */
+            outf_p of = NULL;
+            dbgprintf("before action rulix %d outs %s fors %s",
+                      rulix, outs, fors);
+            of =
+              (files_rules[rulix].frul_action) (inpf,
+                                                &outs, &fors);
+            output_name = outs;
+            for_name = fors;
+            dbgprintf("after action rulix %d of=%p output_name %s for_name %s",
+                      rulix, (void*)of, output_name, for_name);
+            /* If the action routine returned something, give it back
+               immediately.  */
+            if (of) {
+              inpf->inpoutf = of;
+              return of;
+            }
+          };
+          /* The rule matched, and had no action, or that action did
+             not return any output file but could have changed the
+             output_name or for_name.  We continue out of the loop.  */
+          break;
+        }
       }
-
-      output_name = "gtype-desc.c";
-      for_name = NULL;
     }
 
-  /* Look through to see if we've ever seen this output filename before.  */
+  dbgprintf ("usual case output_name %s for_name %s", output_name, for_name);
+
+  /* Look through to see if we've ever seen this output filename
+     before.  */
   for (r = output_files; r; r = r->next)
     if (strcmp (r->name, output_name) == 0)
+      {
+        dbgprintf("found r @ %p %s", (void*)r, r->name);
+        inpf->inpoutf = r;
       return r;
+      }
 
-  /* If not, create it.  */
+  /* If not, create it, and cache it in the input file.  */
   r = create_file (for_name, output_name);
 
   gcc_assert (r && r->name);
+  dbgprintf("created r %s", r->name);
+  inpf->inpoutf = r;
   return r;
 }
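
For readers who want to experiment with the $N substitution outside of
gengtype, here is a minimal standalone sketch of the idea. It is not
code from the patch. It assumes the plain POSIX <regex.h> API instead
of GCC's xregex.h and obstacks. The name expand_rule is hypothetical,
and it recompiles the regexp on every call, whereas the patch compiles
each rule lazily and caches it in files_rules.

```c
#include <ctype.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the patch): match PATH against the POSIX
   extended regexp PATTERN and expand TRANSFORM, in which $0 .. $9 stand
   for the matched groups and $$ for one verbatim dollar.  Returns a
   malloc-ed string, or NULL on mismatch or on a malformed rule.  */
char *
expand_rule (const char *pattern, const char *transform, const char *path)
{
  regex_t re;
  regmatch_t m[10];
  char *out = NULL;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return NULL;                /* the rule itself is buggy */

  if (regexec (&re, path, 10, m, 0) == 0)
    {
      /* Each transform character expands to at most strlen (path)
         characters, so this bound is always large enough.  */
      size_t cap = strlen (transform) * (strlen (path) + 1) + 1;
      size_t len = 0;
      const char *pt;

      out = malloc (cap);
      for (pt = transform; out && *pt; pt++)
        {
          if (*pt != '$')
            out[len++] = *pt;
          else if (pt[1] == '$')
            {
              out[len++] = '$';
              pt++;             /* skip the second dollar */
            }
          else if (isdigit ((unsigned char) pt[1]))
            {
              regmatch_t g = m[pt[1] - '0'];
              /* Groups that did not take part in the match have
                 rm_so == -1 and expand to nothing.  */
              if (g.rm_so >= 0 && g.rm_eo >= g.rm_so)
                {
                  size_t n = (size_t) (g.rm_eo - g.rm_so);
                  memcpy (out + len, path + g.rm_so, n);
                  len += n;
                }
              pt++;             /* skip the digit */
            }
          else
            {
              free (out);       /* malformed transform string */
              out = NULL;
            }
        }
      if (out)
        out[len] = '\0';
    }

  regfree (&re);
  return out;
}
```

With the general rule from files_rules, a call such as
expand_rule ("^(([^/]*/)*)([[:alnum:]_-]*)\\.c$", "gt-$3.h", "cp/tree.c")
should give gt-tree.h, since $3 is the file name stripped of its
directory and suffix. The caller has to free the result.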