Message ID | 20230908145908.915341-1-arthur.cohen@embecosm.com |
---|---|
State | New |
Series | [v3] libcpp: add function to check XID properties |
Ping?

Best,
Arthur

On 9/8/23 16:59, Arthur Cohen wrote:
> From: Raiki Tamura <tamaron1203@gmail.com>
>
> Fixed to include the enum's name which I had forgotten to commit.
>
> Thanks
>
> ----
>
> This commit adds a new function intended for checking the XID properties
> of a possibly unicode character, as well as the accompanying enum
> describing the possible properties.
>
> libcpp/ChangeLog:
>
> 	* charset.cc (cpp_check_xid_property): New.
> 	* include/cpplib.h
> 	(cpp_check_xid_property): New.
> 	(enum cpp_xid_property): New.
>
> Signed-off-by: Raiki Tamura <tamaron1203@gmail.com>
> ---
>  libcpp/charset.cc       | 36 ++++++++++++++++++++++++++++++++++++
>  libcpp/include/cpplib.h |  7 +++++++
>  2 files changed, 43 insertions(+)
>
> diff --git a/libcpp/charset.cc b/libcpp/charset.cc
> index 7b625c9956a..a92ba75539e 100644
> --- a/libcpp/charset.cc
> +++ b/libcpp/charset.cc
> @@ -1256,6 +1256,42 @@ _cpp_uname2c_uax44_lm2 (const char *name, size_t len, char *canon_name)
>    return result;
>  }
>
> +/* Returns flags representing the XID properties of the given codepoint.  */
> +unsigned int
> +cpp_check_xid_property (cppchar_t c)
> +{
> +  // fast path for ASCII
> +  if (c < 0x80)
> +  {
> +    if (('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))
> +      return CPP_XID_START | CPP_XID_CONTINUE;
> +    if (('0' <= c && c <= '9') || c == '_')
> +      return CPP_XID_CONTINUE;
> +  }
> +
> +  if (c > UCS_LIMIT)
> +    return 0;
> +
> +  int mn, mx, md;
> +  mn = 0;
> +  mx = ARRAY_SIZE (ucnranges) - 1;
> +  while (mx != mn)
> +    {
> +      md = (mn + mx) / 2;
> +      if (c <= ucnranges[md].end)
> +	mx = md;
> +      else
> +	mn = md + 1;
> +    }
> +
> +  unsigned short flags = ucnranges[mn].flags;
> +
> +  if (flags & CXX23)
> +    return CPP_XID_START | CPP_XID_CONTINUE;
> +  if (flags & NXX23)
> +    return CPP_XID_CONTINUE;
> +  return 0;
> +}
>
>  /* Returns 1 if C is valid in an identifier, 2 if C is valid except at
>     the start of an identifier, and 0 if C is not valid in an
> diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> index fcdaf082b09..583e3071e90 100644
> --- a/libcpp/include/cpplib.h
> +++ b/libcpp/include/cpplib.h
> @@ -1606,4 +1606,11 @@ bool cpp_valid_utf8_p (const char *data, size_t num_bytes);
>  bool cpp_is_combining_char (cppchar_t c);
>  bool cpp_is_printable_char (cppchar_t c);
>
> +enum cpp_xid_property {
> +  CPP_XID_START = 1,
> +  CPP_XID_CONTINUE = 2
> +};
> +
> +unsigned int cpp_check_xid_property (cppchar_t c);
> +
>  #endif /* ! LIBCPP_CPPLIB_H */
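The commit message describes `cpp_check_xid_property` as returning flags built from `enum cpp_xid_property`, so a caller tests individual bits rather than comparing the result for equality. The following is a minimal, hedged sketch of how a front end might use those bits to validate an identifier: the first code point needs `CPP_XID_START` and every later one needs `CPP_XID_CONTINUE`. To stay self-contained it swaps the real function for a hypothetical ASCII-only stand-in, `ascii_xid_property`, which copies only the patch's fast path. `xid_identifier_p` is likewise a hypothetical name, and `cppchar_t` is assumed to be an unsigned 32-bit type.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: cppchar_t is an unsigned 32-bit type, as on common hosts.
typedef unsigned int cppchar_t;

// Copied from the patch's cpplib.h hunk.
enum cpp_xid_property {
  CPP_XID_START = 1,
  CPP_XID_CONTINUE = 2
};

// Hypothetical ASCII-only stand-in for cpp_check_xid_property: it mirrors
// the patch's fast path and reports no properties for any other code point.
static unsigned int
ascii_xid_property (cppchar_t c)
{
  if (('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))
    return CPP_XID_START | CPP_XID_CONTINUE;
  if (('0' <= c && c <= '9') || c == '_')
    return CPP_XID_CONTINUE;
  return 0;
}

// Hypothetical caller: accept an identifier whose first code point has
// XID_Start and whose remaining code points all have XID_Continue.
static bool
xid_identifier_p (const cppchar_t *cps, size_t len)
{
  assert (cps != nullptr || len == 0);
  if (len == 0 || !(ascii_xid_property (cps[0]) & CPP_XID_START))
    return false;
  for (size_t i = 1; i < len; i++)
    if (!(ascii_xid_property (cps[i]) & CPP_XID_CONTINUE))
      return false;
  return true;
}
```

With this stand-in, `foo1` is accepted and `1f` is rejected. `_x` is rejected too: the patch classifies `_` as XID_Continue only, so a language whose grammar allows a leading underscore would have to special-case it in the caller.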
On Fri, 8 Sep 2023, Arthur Cohen wrote:

> +  if (c < 0x80)
> +  {
> +    if (('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))
> +      return CPP_XID_START | CPP_XID_CONTINUE;
> +    if (('0' <= c && c <= '9') || c == '_')
> +      return CPP_XID_CONTINUE;

This may be an artifact of how the patch was mailed, but indentation seems
off here (should be six spaces for both return statements).

> +      md = (mn + mx) / 2;
> +      if (c <= ucnranges[md].end)
> +	mx = md;
> +      else
> +	mn = md + 1;

And likewise here (should be a tab for both assignments).

OK with those indentation fixes.
Hi Joseph,

Thanks for the review!

On 12/18/23 20:00, Joseph Myers wrote:
> On Fri, 8 Sep 2023, Arthur Cohen wrote:
>
>> +  if (c < 0x80)
>> +  {
>> +    if (('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))
>> +      return CPP_XID_START | CPP_XID_CONTINUE;
>> +    if (('0' <= c && c <= '9') || c == '_')
>> +      return CPP_XID_CONTINUE;
>
> This may be an artifact of how the patch was mailed, but indentation seems
> off here (should be six spaces for both return statements)
>
>> +      md = (mn + mx) / 2;
>> +      if (c <= ucnranges[md].end)
>> +	mx = md;
>> +      else
>> +	mn = md + 1;
>
> And likewise here (should be a tab for both assignments).
>
> OK with those indentation fixes.
>

Thanks for noticing - this was indeed misindented. I'll make the suggested
fixes and will push the patch.

Best,
Arthur
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 7b625c9956a..a92ba75539e 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1256,6 +1256,42 @@ _cpp_uname2c_uax44_lm2 (const char *name, size_t len, char *canon_name)
   return result;
 }
 
+/* Returns flags representing the XID properties of the given codepoint.  */
+unsigned int
+cpp_check_xid_property (cppchar_t c)
+{
+  // fast path for ASCII
+  if (c < 0x80)
+  {
+    if (('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))
+      return CPP_XID_START | CPP_XID_CONTINUE;
+    if (('0' <= c && c <= '9') || c == '_')
+      return CPP_XID_CONTINUE;
+  }
+
+  if (c > UCS_LIMIT)
+    return 0;
+
+  int mn, mx, md;
+  mn = 0;
+  mx = ARRAY_SIZE (ucnranges) - 1;
+  while (mx != mn)
+    {
+      md = (mn + mx) / 2;
+      if (c <= ucnranges[md].end)
+	mx = md;
+      else
+	mn = md + 1;
+    }
+
+  unsigned short flags = ucnranges[mn].flags;
+
+  if (flags & CXX23)
+    return CPP_XID_START | CPP_XID_CONTINUE;
+  if (flags & NXX23)
+    return CPP_XID_CONTINUE;
+  return 0;
+}
 
 /* Returns 1 if C is valid in an identifier, 2 if C is valid except at
    the start of an identifier, and 0 if C is not valid in an
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index fcdaf082b09..583e3071e90 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -1606,4 +1606,11 @@ bool cpp_valid_utf8_p (const char *data, size_t num_bytes);
 bool cpp_is_combining_char (cppchar_t c);
 bool cpp_is_printable_char (cppchar_t c);
 
+enum cpp_xid_property {
+  CPP_XID_START = 1,
+  CPP_XID_CONTINUE = 2
+};
+
+unsigned int cpp_check_xid_property (cppchar_t c);
+
 #endif /* ! LIBCPP_CPPLIB_H */
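After the ASCII fast path, `cpp_check_xid_property` runs a lower-bound binary search. The loop relies on `ucnranges` being sorted by each entry's `end`, and it narrows `[mn, mx]` until that interval holds only the first entry whose `end` is at least `c`. The sketch below reproduces that loop against a small hypothetical table, `toy_ranges`, whose flags `TOY_START` and `TOY_CONTINUE_ONLY` stand in for `CXX23` and `NXX23`. Only the table's ASCII entries are meaningful; it is not real Unicode data. The sketch also assumes the table's last entry ends at 0x10FFFF, which it takes to be the value of `UCS_LIMIT`, so every code point that passes the range check lands on a valid entry.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: cppchar_t is an unsigned 32-bit type, as on common hosts.
typedef unsigned int cppchar_t;

// Copied from the patch's cpplib.h hunk.
enum cpp_xid_property {
  CPP_XID_START = 1,
  CPP_XID_CONTINUE = 2
};

// Hypothetical flags standing in for CXX23 and NXX23.
enum { TOY_START = 1, TOY_CONTINUE_ONLY = 2 };

// Hypothetical entry: covers code points from the previous entry's END + 1
// up to and including END.
struct toy_range
{
  unsigned short flags;
  cppchar_t end;
};

// Toy table, sorted by END; only the ASCII part is meaningful.
static const toy_range toy_ranges[] = {
  { 0, 0x2F },                  // U+0000..U+002F: no XID property
  { TOY_CONTINUE_ONLY, 0x39 },  // '0'..'9'
  { 0, 0x40 },
  { TOY_START, 0x5A },          // 'A'..'Z'
  { 0, 0x5E },
  { TOY_CONTINUE_ONLY, 0x5F },  // '_'
  { 0, 0x60 },
  { TOY_START, 0x7A },          // 'a'..'z'
  { 0, 0x10FFFF },              // everything else (not real Unicode data)
};

static const size_t toy_count = sizeof toy_ranges / sizeof toy_ranges[0];

// Return the index of the first entry whose END is >= C, narrowing
// [mn, mx] the same way as the loop in cpp_check_xid_property.
static size_t
find_range (const toy_range *table, size_t n, cppchar_t c)
{
  assert (n > 0 && c <= table[n - 1].end);
  size_t mn = 0, mx = n - 1;
  while (mx != mn)
    {
      size_t md = (mn + mx) / 2;
      if (c <= table[md].end)
        mx = md;      // the answer is md or an entry to its left
      else
        mn = md + 1;  // the answer lies strictly to the right of md
    }
  return mn;
}

// Map the entry's flags to XID properties the way the patch does.
static unsigned int
toy_xid_property (cppchar_t c)
{
  if (c > 0x10FFFF)
    return 0;
  unsigned short flags
    = toy_ranges[find_range (toy_ranges, toy_count, c)].flags;
  if (flags & TOY_START)
    return CPP_XID_START | CPP_XID_CONTINUE;
  if (flags & TOY_CONTINUE_ONLY)
    return CPP_XID_CONTINUE;
  return 0;
}
```

Moving `mx` down on `c <= end`, rather than stopping early on a match, keeps the loop simple and guarantees termination. While `mn != mx`, `md` is always strictly less than `mx`, so each iteration shrinks the interval.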