Message ID | Zur7vC30g1Ehu2za@tucnak |
---|---|
State | New |
Series | libcpp: Add -Wtrailing-blanks warning |
On 9/18/24 10:11 AM, Jakub Jelinek wrote:
> Hi!
>
> Trailing blanks is something even git diff diagnoses; while it is a coding
> style issue, if it is so common that git diff diagnoses it, I think it could
> be useful to various projects to check that at compile time.
>
> Dunno if it should be included in -Wextra, currently it isn't, and due to
> tons of trailing whitespace in our sources, haven't enabled it for when
> building gcc itself either.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Note, git diff also diagnoses indentation with tab following space, wonder
> if we couldn't have trivial warning options where one would simply ask for
> checking of indentation with no tabs, just spaces vs. indentation with
> tabs followed by spaces (but never tab width or more spaces in the
> indentation). I think that would be easy to do also on the libcpp side.
> Checking how much something should be exactly indented requires syntax
> analysis (at least some limited one) and can consider columns of first token
> on line, but what the exact indentation blanks were is something only libcpp
> knows.
>
> 2024-09-18 Jakub Jelinek <jakub@redhat.com>
>
> libcpp/
> * include/cpplib.h (struct cpp_options): Add cpp_warn_trailing_blanks
> member.
> (enum cpp_warning_reason): Add CPP_W_TRAILING_BLANKS.
> * internal.h (struct _cpp_line_note): Document 'B' line note.
> * lex.cc (_cpp_clean_line): Add 'B' line note for trailing blanks
> except for trailing whitespace after backslash. Formatting fix.
> (_cpp_process_line_notes): Emit -Wtrailing-blanks diagnostics.
> Formatting fixes.
> (lex_raw_string): Clear type on 'B' notes.
> gcc/
> * doc/invoke.texi (Wtrailing-blanks): Document.
> gcc/c-family/
> * c.opt (Wtrailing-blanks): New option.
> gcc/testsuite/
> * c-c++-common/cpp/Wtrailing-blanks.c: New test.

I'd tend to think we want this and would want to (at the appropriate
time) turn it on for our builds. Better catch this nit early rather
than at commit/push time IMHO.

As for the actual review, I'm not going to be much help here. I don't
know anything about this code.

Jeff
Jeff Law <jeffreyalaw@gmail.com> writes:
> On 9/18/24 10:11 AM, Jakub Jelinek wrote:
>> Hi!
>>
>> Trailing blanks is something even git diff diagnoses; while it is a coding
>> style issue, if it is so common that git diff diagnoses it, I think it could
>> be useful to various projects to check that at compile time.

Nice! Thanks for doing this.

>> Dunno if it should be included in -Wextra, currently it isn't, and due to
>> tons of trailing whitespace in our sources, haven't enabled it for when
>> building gcc itself either.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>
>> Note, git diff also diagnoses indentation with tab following space, wonder
>> if we couldn't have trivial warning options where one would simply ask for
>> checking of indentation with no tabs, just spaces vs. indentation with
>> tabs followed by spaces (but never tab width or more spaces in the
>> indentation). I think that would be easy to do also on the libcpp side.
>> Checking how much something should be exactly indented requires syntax
>> analysis (at least some limited one) and can consider columns of first token
>> on line, but what the exact indentation blanks were is something only libcpp
>> knows.
>>
>> 2024-09-18 Jakub Jelinek <jakub@redhat.com>
>>
>> libcpp/
>> * include/cpplib.h (struct cpp_options): Add cpp_warn_trailing_blanks
>> member.
>> (enum cpp_warning_reason): Add CPP_W_TRAILING_BLANKS.
>> * internal.h (struct _cpp_line_note): Document 'B' line note.
>> * lex.cc (_cpp_clean_line): Add 'B' line note for trailing blanks
>> except for trailing whitespace after backslash. Formatting fix.
>> (_cpp_process_line_notes): Emit -Wtrailing-blanks diagnostics.
>> Formatting fixes.
>> (lex_raw_string): Clear type on 'B' notes.
>> gcc/
>> * doc/invoke.texi (Wtrailing-blanks): Document.
>> gcc/c-family/
>> * c.opt (Wtrailing-blanks): New option.
>> gcc/testsuite/
>> * c-c++-common/cpp/Wtrailing-blanks.c: New test.
>
> I'd tend to think we want this and would want to (at the appropriate
> time) turn it on for our builds. Better catch this nit early rather
> than at commit/push time IMHO.

+1 I'd much rather learn about this kind of error before the code reaches
a review tool :)

From a quick check, it doesn't look like Clang has this, so there is no
existing name to follow.

Richard
On Wed, Sep 18, 2024 at 06:17:58PM +0100, Richard Sandiford wrote:
> +1 I'd much rather learn about this kind of error before the code reaches
> a review tool :)
>
> From a quick check, it doesn't look like Clang has this, so there is no
> existing name to follow.

I was considering also -Wtrailing-whitespace, but
1) git diff really warns just about trailing spaces/tabs, not form feeds or
   vertical tabs
2) gcc source contains tons of spots with form feed in it (though,
   I think pretty much always as the sole character on a line).

And not really sure how people use vertical tabs in the source if at all.
Perhaps form feed could be not warned if at end of line if it isn't the sole
character on a line...

Jakub
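The policy question above (which characters count, and what to do about form feeds) can be made concrete with a small standalone sketch. This is not code from the patch: `trailing_blank_count` and its `extended` flag are hypothetical names, and the form feed rule shown (never flag a form feed that is the only character on its line, since that is how the GCC sources are said to use it) is only one assumed reading of the suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch.  Return how many
   characters at the end of LINE (LEN bytes, line terminator already
   stripped) would be flagged.  With EXTENDED false only ' ' and '\t'
   count; with EXTENDED true '\f' and '\v' count as well, except that a
   form feed alone on its line is treated as a page break and ignored
   (an assumed policy).  */
size_t
trailing_blank_count (const char *line, size_t len, bool extended)
{
  assert (line != NULL || len == 0);

  /* A lone form feed is never reported.  */
  if (extended && len == 1 && line[0] == '\f')
    return 0;

  size_t n = 0;
  while (n < len)
    {
      char c = line[len - 1 - n];
      bool blank = c == ' ' || c == '\t';
      bool other = extended && (c == '\f' || c == '\v');
      if (!blank && !other)
        break;
      n++;
    }
  return n;
}
```

With `extended` false the helper looks only at spaces and tabs, the same set git diff is described as checking; with it true, a line such as `x\f` is flagged while a bare `\f` page-break line is not.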
On Wed, Sep 18, 2024 at 1:33 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Sep 18, 2024 at 06:17:58PM +0100, Richard Sandiford wrote:
> > +1 I'd much rather learn about this kind of error before the code reaches
> > a review tool :)
> >
> > From a quick check, it doesn't look like Clang has this, so there is no
> > existing name to follow.
>
> I was considering also -Wtrailing-whitespace, but
> 1) git diff really warns just about trailing spaces/tabs, not form feeds or
> vertical tabs
> 2) gcc source contains tons of spots with form feed in it (though,
> I think pretty much always as the sole character on a line).
> And not really sure how people use vertical tabs in the source if at all.
> Perhaps form feed could be not warned if at end of line if it isn't the sole
> character on a line...
>
> Jakub
>

On the topic of warnings for minor whitespace nits that git will
complain about, what do people think about -Wnewline-eof? Clang has
it, and Apple's fork of gcc had it, but they never contributed it back
upstream...
On Wed, Sep 18, 2024 at 7:33 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Sep 18, 2024 at 06:17:58PM +0100, Richard Sandiford wrote:
> > +1 I'd much rather learn about this kind of error before the code reaches
> > a review tool :)
> >
> > From a quick check, it doesn't look like Clang has this, so there is no
> > existing name to follow.
>
> I was considering also -Wtrailing-whitespace, but
> 1) git diff really warns just about trailing spaces/tabs, not form feeds or
> vertical tabs
> 2) gcc source contains tons of spots with form feed in it (though,
> I think pretty much always as the sole character on a line).
> And not really sure how people use vertical tabs in the source if at all.
> Perhaps form feed could be not warned if at end of line if it isn't the sole
> character on a line...

Generally I like diagnosing this early. For the above I'd say
-Wtrailing-whitespace= with a set of things to diagnose (and a sane
default - just spaces and tabs - for -Wtrailing-whitespace) would be
nice. As for naming possibly follow the is{space,blank,cntrl}
character classifications? If those are a good fit, that is.

Richard.

> Jakub
>
On Thu, Sep 19, 2024 at 08:17:24AM +0200, Richard Biener wrote:
> On Wed, Sep 18, 2024 at 7:33 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Wed, Sep 18, 2024 at 06:17:58PM +0100, Richard Sandiford wrote:
> > > +1 I'd much rather learn about this kind of error before the code reaches
> > > a review tool :)
> > >
> > > From a quick check, it doesn't look like Clang has this, so there is no
> > > existing name to follow.
> >
> > I was considering also -Wtrailing-whitespace, but
> > 1) git diff really warns just about trailing spaces/tabs, not form feeds or
> > vertical tabs
> > 2) gcc source contains tons of spots with form feed in it (though,
> > I think pretty much always as the sole character on a line).
> > And not really sure how people use vertical tabs in the source if at all.
> > Perhaps form feed could be not warned if at end of line if it isn't the sole
> > character on a line...
>
> Generally I like diagnosing this early. For the above I'd say
> -Wtrailing-whitespace=
> with a set of things to diagnose (and a sane default - just spaces and
> tabs - for
> -Wtrailing-whitespace) would be nice. As for naming possibly follow the
> is{space,blank,cntrl} character classifications? If those are a good
> fit, that is.

I think the character classifications risk problems.
space is ' ' '\t' '\n' '\r' '\f' '\v' in the C locale,
blank is ' ' '\t'
cntrl is a lot of chars but not ' '
if we extend by the safe-ctype
vspace '\r' '\n'
nvspace ' ' '\t' '\f' '\v' '\0'
Obviously, we shouldn't look at '\r' and '\n', those aren't trailing
characters, those are line separators.

Would we need to consider all UTF-8 (or EBCDIC-UTF) control characters as
cntrl?

0000..0009 ; Control # Cc [10] <control-0000>..<control-0009>
000B..000C ; Control # Cc [2] <control-000B>..<control-000C>
000E..001F ; Control # Cc [18] <control-000E>..<control-001F>
007F..009F ; Control # Cc [33] <control-007F>..<control-009F>
00AD ; Control # Cf SOFT HYPHEN
061C ; Control # Cf ARABIC LETTER MARK
180E ; Control # Cf MONGOLIAN VOWEL SEPARATOR
200B ; Control # Cf ZERO WIDTH SPACE
200E..200F ; Control # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
2028 ; Control # Zl LINE SEPARATOR
2029 ; Control # Zp PARAGRAPH SEPARATOR
202A..202E ; Control # Cf [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
2060..2064 ; Control # Cf [5] WORD JOINER..INVISIBLE PLUS
2065 ; Control # Cn <reserved-2065>
2066..206F ; Control # Cf [10] LEFT-TO-RIGHT ISOLATE..NOMINAL DIGIT SHAPES
FEFF ; Control # Cf ZERO WIDTH NO-BREAK SPACE
FFF0..FFF8 ; Control # Cn [9] <reserved-FFF0>..<reserved-FFF8>
FFF9..FFFB ; Control # Cf [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR
13430..1343F ; Control # Cf [16] EGYPTIAN HIEROGLYPH VERTICAL JOINER..EGYPTIAN HIEROGLYPH END WALLED ENCLOSURE
1BCA0..1BCA3 ; Control # Cf [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP
1D173..1D17A ; Control # Cf [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
E0000 ; Control # Cn <reserved-E0000>
E0001 ; Control # Cf LANGUAGE TAG
E0002..E001F ; Control # Cn [30] <reserved-E0002>..<reserved-E001F>
E0080..E00FF ; Control # Cn [128] <reserved-E0080>..<reserved-E00FF>
E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>

Wonder why anybody would be interested to find just trailing spaces and not
trailing tabs or vice versa, so if we have categories, blank would be one,
then perhaps nvspace as something not including '\0', so just ' ' '\t' '\f'
'\v' and if really needed, control characters with added ' ', but how to
call that and would it really need to parse UTF-8/EBCDIC and look at
pregenerated tables?

Jakub
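To see how the candidate categories overlap on ASCII, here is a rough sketch using the standard `<ctype.h>` predicates, which behave as listed above in the "C" locale. `char_classes` and the `CLS_*` bits are made up for illustration, and `is_nvspace` is a standalone stand-in that simply mirrors the nvspace set quoted above; it is not libiberty's safe-ctype macro.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Illustrative bit flags, one per category discussed in the thread.  */
enum { CLS_BLANK = 1, CLS_SPACE = 2, CLS_CNTRL = 4, CLS_NVSPACE = 8 };

/* Stand-in for the nvspace set as listed in the thread:
   ' ' '\t' '\f' '\v' '\0'.  */
static bool
is_nvspace (int c)
{
  return c == ' ' || c == '\t' || c == '\f' || c == '\v' || c == '\0';
}

/* Return the set of categories that the ASCII character C belongs to.
   Assumes the program runs in the default "C" locale.  */
unsigned
char_classes (int c)
{
  assert (c >= 0 && c <= 127);

  unsigned mask = 0;
  if (isblank (c))
    mask |= CLS_BLANK;
  if (isspace (c))
    mask |= CLS_SPACE;
  if (iscntrl (c))
    mask |= CLS_CNTRL;
  if (is_nvspace (c))
    mask |= CLS_NVSPACE;
  return mask;
}
```

Under these assumptions a space is blank, space and nvspace but not cntrl, while a form feed is space, cntrl and nvspace but not blank, which is the mismatch the message points out.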
On Thu, Sep 19, 2024 at 09:07:06AM +0200, Jakub Jelinek wrote:
> space is ' ' '\t' '\n' '\r' '\f' '\v' in the C locale,
> blank is ' ' '\t'
> cntrl is a lot of chars but not ' '
> if we extend by the safe-ctype
> vspace '\r' '\n'
> nvspace ' ' '\t' '\f' '\v' '\0'
> Obviously, we shouldn't look at '\r' and '\n', those aren't trailing
> characters, those are line separators.
>
> Would we need to consider all UTF-8 (or EBCDIC-UTF) control characters as
> cntrl?
> 0000..0009 ; Control # Cc [10] <control-0000>..<control-0009>
> 000B..000C ; Control # Cc [2] <control-000B>..<control-000C>
> 000E..001F ; Control # Cc [18] <control-000E>..<control-001F>
> 007F..009F ; Control # Cc [33] <control-007F>..<control-009F>
> 00AD ; Control # Cf SOFT HYPHEN
> 061C ; Control # Cf ARABIC LETTER MARK
> 180E ; Control # Cf MONGOLIAN VOWEL SEPARATOR
> 200B ; Control # Cf ZERO WIDTH SPACE
> 200E..200F ; Control # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
> 2028 ; Control # Zl LINE SEPARATOR
> 2029 ; Control # Zp PARAGRAPH SEPARATOR
> 202A..202E ; Control # Cf [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
> 2060..2064 ; Control # Cf [5] WORD JOINER..INVISIBLE PLUS
> 2065 ; Control # Cn <reserved-2065>
> 2066..206F ; Control # Cf [10] LEFT-TO-RIGHT ISOLATE..NOMINAL DIGIT SHAPES
> FEFF ; Control # Cf ZERO WIDTH NO-BREAK SPACE
> FFF0..FFF8 ; Control # Cn [9] <reserved-FFF0>..<reserved-FFF8>
> FFF9..FFFB ; Control # Cf [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR
> 13430..1343F ; Control # Cf [16] EGYPTIAN HIEROGLYPH VERTICAL JOINER..EGYPTIAN HIEROGLYPH END WALLED ENCLOSURE
> 1BCA0..1BCA3 ; Control # Cf [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP
> 1D173..1D17A ; Control # Cf [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
> E0000 ; Control # Cn <reserved-E0000>
> E0001 ; Control # Cf LANGUAGE TAG
> E0002..E001F ; Control # Cn [30] <reserved-E0002>..<reserved-E001F>
> E0080..E00FF ; Control # Cn [128] <reserved-E0080>..<reserved-E00FF>
> E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
>
> Wonder why anybody would be interested to find just trailing spaces and not
> trailing tabs or vice versa, so if we have categories, blank would be one,
> then perhaps nvspace as something not including '\0', so just ' ' '\t' '\f'
> '\v' and if really needed, control characters with added ' ', but how to
> call that and would it really need to parse UTF-8/EBCDIC and look at
> pregenerated tables?

And there are also:

0009..000D ; White_Space # Cc [5] <control-0009>..<control-000D>
0020 ; White_Space # Zs SPACE
0085 ; White_Space # Cc <control-0085>
00A0 ; White_Space # Zs NO-BREAK SPACE
1680 ; White_Space # Zs OGHAM SPACE MARK
2000..200A ; White_Space # Zs [11] EN QUAD..HAIR SPACE
2028 ; White_Space # Zl LINE SEPARATOR
2029 ; White_Space # Zp PARAGRAPH SEPARATOR
202F ; White_Space # Zs NARROW NO-BREAK SPACE
205F ; White_Space # Zs MEDIUM MATHEMATICAL SPACE
3000 ; White_Space # Zs IDEOGRAPHIC SPACE

Jakub
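Covering the White_Space list would mean decoding the last character of each line, which is the "would it really need to parse UTF-8" cost raised above. A minimal sketch, assuming UTF-8 input and hypothetical helper names (`ends_with_white_space`, `last_code_point`, `is_white_space`); the range table is copied from the list above, and overlong encodings are not rejected:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Ranges copied from the White_Space list quoted in the thread.  */
static const struct { uint32_t lo, hi; } white_space[] = {
  { 0x0009, 0x000D }, { 0x0020, 0x0020 }, { 0x0085, 0x0085 },
  { 0x00A0, 0x00A0 }, { 0x1680, 0x1680 }, { 0x2000, 0x200A },
  { 0x2028, 0x2029 }, { 0x202F, 0x202F }, { 0x205F, 0x205F },
  { 0x3000, 0x3000 }
};

static bool
is_white_space (uint32_t cp)
{
  for (size_t i = 0; i < sizeof white_space / sizeof white_space[0]; i++)
    if (cp >= white_space[i].lo && cp <= white_space[i].hi)
      return true;
  return false;
}

/* Decode the last UTF-8 sequence of LINE into *CP.  Return false if
   the line is empty or the tail is malformed.  */
static bool
last_code_point (const unsigned char *line, size_t len, uint32_t *cp)
{
  size_t start = len, cont = 0;

  /* Step back over at most three continuation bytes (10xxxxxx).  */
  while (start > 0 && cont < 3 && (line[start - 1] & 0xC0) == 0x80)
    {
      start--;
      cont++;
    }
  if (start == 0)
    return false;

  unsigned char lead = line[start - 1];
  uint32_t v;
  size_t need;
  if (lead < 0x80)
    v = lead, need = 0;
  else if ((lead & 0xE0) == 0xC0)
    v = lead & 0x1F, need = 1;
  else if ((lead & 0xF0) == 0xE0)
    v = lead & 0x0F, need = 2;
  else if ((lead & 0xF8) == 0xF0)
    v = lead & 0x07, need = 3;
  else
    return false;
  if (need != cont)
    return false;

  for (size_t i = 0; i < cont; i++)
    v = (v << 6) | (line[start + i] & 0x3F);
  *cp = v;
  return true;
}

/* True if LINE (LEN bytes, line terminator stripped) ends in a
   character from the White_Space table above.  */
bool
ends_with_white_space (const char *line, size_t len)
{
  assert (line != NULL || len == 0);
  uint32_t cp;
  return (last_code_point ((const unsigned char *) line, len, &cp)
          && is_white_space (cp));
}
```

In this sketch a trailing NO-BREAK SPACE (bytes `C2 A0`) is reported, while a trailing ZERO WIDTH SPACE (U+200B) is not, because it appears only in the Control list and not in the White_Space list.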
--- libcpp/include/cpplib.h.jj 2024-09-13 16:09:32.690455174 +0200 +++ libcpp/include/cpplib.h 2024-09-18 13:01:26.289338279 +0200 @@ -594,6 +594,9 @@ struct cpp_options /* True if -finput-charset= option has been used explicitly. */ bool cpp_input_charset_explicit; + /* True if -Wtrailing-blanks. */ + bool cpp_warn_trailing_blanks; + /* Dependency generation. */ struct { @@ -709,7 +712,8 @@ enum cpp_warning_reason { CPP_W_EXPANSION_TO_DEFINED, CPP_W_BIDIRECTIONAL, CPP_W_INVALID_UTF8, - CPP_W_UNICODE + CPP_W_UNICODE, + CPP_W_TRAILING_BLANKS }; /* Callback for header lookup for HEADER, which is the name of a --- libcpp/internal.h.jj 2024-09-18 09:45:36.832570227 +0200 +++ libcpp/internal.h 2024-09-18 12:36:04.386099371 +0200 @@ -318,8 +318,8 @@ struct _cpp_line_note /* Type of note. The 9 'from' trigraph characters represent those trigraphs, '\\' an escaped newline, ' ' an escaped newline with - intervening space, 0 represents a note that has already been handled, - and anything else is invalid. */ + intervening space, 'B' trailing blanks, 0 represents a note that + has already been handled, and anything else is invalid. */ unsigned int type; }; --- libcpp/lex.cc.jj 2024-09-13 16:09:32.720454758 +0200 +++ libcpp/lex.cc 2024-09-18 14:00:46.344062046 +0200 @@ -928,7 +928,7 @@ _cpp_clean_line (cpp_reader *pfile) if (p == buffer->next_line || p[-1] != '\\') break; - add_line_note (buffer, p - 1, p != d ? ' ': '\\'); + add_line_note (buffer, p - 1, p != d ? ' ' : '\\'); d = p - 2; buffer->next_line = p - 1; } @@ -943,6 +943,11 @@ _cpp_clean_line (cpp_reader *pfile) } } } + done: + if (d > buffer->next_line + && ISBLANK (d[-1]) + && CPP_OPTION (pfile, cpp_warn_trailing_blanks)) + add_line_note (buffer, d - 1, 'B'); } else { @@ -955,7 +960,6 @@ _cpp_clean_line (cpp_reader *pfile) s++; } - done: *d = '\n'; /* A sentinel note that should never be processed. 
*/ add_line_note (buffer, d + 1, '\n'); @@ -1013,13 +1017,23 @@ _cpp_process_line_notes (cpp_reader *pfi if (note->type == '\\' || note->type == ' ') { - if (note->type == ' ' && !in_comment) - cpp_error_with_line (pfile, CPP_DL_WARNING, pfile->line_table->highest_line, col, - "backslash and newline separated by space"); + if (note->type == ' ') + { + if (!in_comment) + cpp_error_with_line (pfile, CPP_DL_WARNING, + pfile->line_table->highest_line, col, + "backslash and newline separated by " + "space"); + else if (CPP_OPTION (pfile, cpp_warn_trailing_blanks)) + cpp_warning_with_line (pfile, CPP_W_TRAILING_BLANKS, + pfile->line_table->highest_line, col, + "trailing blanks"); + } if (buffer->next_line > buffer->rlimit) { - cpp_error_with_line (pfile, CPP_DL_PEDWARN, pfile->line_table->highest_line, col, + cpp_error_with_line (pfile, CPP_DL_PEDWARN, + pfile->line_table->highest_line, col, "backslash-newline at end of file"); /* Prevent "no newline at end of file" warning. */ buffer->next_line = buffer->rlimit; @@ -1040,15 +1054,16 @@ _cpp_process_line_notes (cpp_reader *pfi note->type, (int) _cpp_trigraph_map[note->type]); else - { - cpp_warning_with_line - (pfile, CPP_W_TRIGRAPHS, - pfile->line_table->highest_line, col, - "trigraph ??%c ignored, use -trigraphs to enable", - note->type); - } + cpp_warning_with_line (pfile, CPP_W_TRIGRAPHS, + pfile->line_table->highest_line, col, + "trigraph ??%c ignored, use -trigraphs " + "to enable", note->type); } } + else if (note->type == 'B') + cpp_warning_with_line (pfile, CPP_W_TRAILING_BLANKS, + pfile->line_table->highest_line, col, + "trailing blanks"); else if (note->type == 0) /* Already processed in lex_raw_string. */; else @@ -2539,6 +2554,12 @@ lex_raw_string (cpp_reader *pfile, cpp_t note->type = 0; note++; break; + + case 'B': + /* Don't warn about trailing blanks in raw string literals. 
*/ + note->type = 0; + note++; + break; default: gcc_checking_assert (_cpp_trigraph_map[note->type]); --- gcc/doc/invoke.texi.jj 2024-09-12 18:15:20.458626277 +0200 +++ gcc/doc/invoke.texi 2024-09-18 14:46:39.584782466 +0200 @@ -8996,6 +8996,13 @@ will always be false. This warning is enabled by @option{-Wall}. +@opindex Wtrailing-blanks +@opindex Wno-trailing-blanks +@item -Wtrailing-blanks +Warn about trailing blanks (spaces or horizontal tabs) at the end of lines, +including inside of comments, but excluding trailing blanks in raw string +literals. This is a coding style warning. + @opindex Wtrampolines @opindex Wno-trampolines @item -Wtrampolines --- gcc/c-family/c.opt.jj 2024-09-12 18:15:20.415626861 +0200 +++ gcc/c-family/c.opt 2024-09-18 13:01:56.404927852 +0200 @@ -1450,6 +1450,10 @@ Wtraditional-conversion C ObjC Var(warn_traditional_conversion) Warning Warn of prototypes causing type conversions different from what would happen in the absence of prototype. +Wtrailing-blanks +C ObjC C++ ObjC++ CPP(cpp_warn_trailing_blanks) CppReason(CPP_W_TRAILING_BLANKS) Var(warn_trailing_blanks) Init(0) Warning +Warn about trailing blanks on lines except when in raw string literals. + Wtrigraphs C ObjC C++ ObjC++ CPP(warn_trigraphs) CppReason(CPP_W_TRIGRAPHS) Var(cpp_warn_trigraphs) Init(2) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall) Warn if trigraphs are encountered that might affect the meaning of the program. 
--- gcc/testsuite/c-c++-common/cpp/Wtrailing-blanks.c.jj 2024-09-18 14:44:22.712636656 +0200 +++ gcc/testsuite/c-c++-common/cpp/Wtrailing-blanks.c 2024-09-18 13:21:20.898071467 +0200 @@ -0,0 +1,30 @@ +/* { dg-do compile { target { c || c++11 } } } */ +/* { dg-options "-Wtrailing-blanks" } */ + +int i; +/* { dg-warning "trailing blanks" "" { target *-*-* } .-1 } */ +int j; +/* { dg-warning "trailing blanks" "" { target *-*-* } .-1 } */ +int \ + k \ + ; +/* { dg-warning "backslash and newline separated by space" "" { target *-*-* } .-3 } */ +/* { dg-warning "backslash and newline separated by space" "" { target *-*-* } .-3 } */ +/* { dg-warning "trailing blanks" "" { target *-*-* } .-3 } */ + + + +/* { dg-warning "trailing blanks" "" { target *-*-* } .-1 } */ +const char *p = R"*|*( + + +. +)*|*"; +/* { dg-warning "trailing blanks" "" { target *-*-* } .-1 } */ +// This is a comment with trailing blank +/* { dg-warning "trailing blanks" "" { target *-*-* } .-1 } */ +/* This is a comment with trailing blanks +*/ +/* { dg-warning "trailing blanks" "" { target *-*-* } .-2 } */ +/* { dg-warning "trailing blanks" "" { target *-*-* } .+1 } */ + \ No newline at end of file