Message ID: 20220702113331.2003820-1-afaria@redhat.com
Series: Introduce an extensible static analyzer
On 7/2/22 13:33, Alberto Faria wrote:
> The current primary motivation for this work is enforcing rules around
> block layer coroutines, which is why most of the series focuses on that.
> However, the static analyzer is intended to be sufficiently generic to
> satisfy other present and future QEMU static analysis needs.
>
> This is very early work-in-progress, and a lot is missing. One notable
> omission is build system integration, including keeping track of which
> translation units have been modified and need re-analyzing.
>
> Performance is bad, but there is a lot of potential for optimization,
> such as avoiding redundant AST traversals. Switching to C libclang is
> also a possibility, although Python makes it easy to quickly prototype
> new checks, which should encourage adoption and contributions.
>
> The script takes a path to the build directory, and any number of paths
> to directories or files to analyze. Example run on a 12-thread laptop:

Thanks, this is very useful! Unfortunately there are quite a lot of fixes
to go in (including your generated_co_wrapper cleanup series and mine [1])
before this can be enabled, but I think this is the way to go to 1) ease
maintainability of coroutine code and 2) move towards a model where there
are no mixed coroutine/non-coroutine functions.

I'll review it when I'm back, since a phone screen is not the best
environment to do that. :)

Paolo

[1] https://patchew.org/QEMU/20220509103019.215041-1-pbonzini@redhat.com/
On Sat, Jul 02, 2022 at 12:33:23PM +0100, Alberto Faria wrote:
> This series introduces a static analyzer for QEMU. It consists of a
> single static-analyzer.py script that relies on libclang's Python
> bindings, and provides a common framework on which arbitrary static
> analysis checks can be developed and run against QEMU's code base.
>
> Summary of the series:
>
>   - Patch 1 adds the base static analyzer, along with a simple check
>     that finds static functions whose return value is never used, and
>     patch 2 fixes some occurrences of this.
>
>   - Patch 3 adds a check to ensure that non-coroutine_fn functions don't
>     perform direct calls to coroutine_fn functions, and patch 4 fixes
>     some violations of this rule.
>
>   - Patch 5 adds a check to enforce coroutine_fn restrictions on
>     function pointers, namely around assignment and indirect calls, and
>     patch 6 fixes some problems it detects. (Implementing this check
>     properly is complicated, since AFAICT annotation attributes cannot
>     be applied directly to types. This part still needs a lot of work.)
>
>   - Patch 7 introduces a no_coroutine_fn marker for functions that
>     should not be called from coroutines, makes generated_co_wrapper
>     evaluate to no_coroutine_fn, and adds a check enforcing this rule.
>     Patch 8 fixes some violations that it finds.
>
> The current primary motivation for this work is enforcing rules around
> block layer coroutines, which is why most of the series focuses on that.
> However, the static analyzer is intended to be sufficiently generic to
> satisfy other present and future QEMU static analysis needs.
>
> This is very early work-in-progress, and a lot is missing. One notable
> omission is build system integration, including keeping track of which
> translation units have been modified and need re-analyzing.
>
> Performance is bad, but there is a lot of potential for optimization,
> such as avoiding redundant AST traversals. Switching to C libclang is
> also a possibility, although Python makes it easy to quickly prototype
> new checks, which should encourage adoption and contributions.

Have you done any measurements to see how much of the overhead is from
the checks you implemented, vs how much is inherently forced on us by
libclang? I.e., what does it look like if you just load the libclang
framework and run it across all source files, without doing any checks
in Python?

If I run 'clang-tidy' across the entire source tree, it takes 3 minutes
on my machine, but there's overhead of repeatedly starting the process
in there.

I think anything that actually fully parses the source files is going to
have a decent-sized unavoidable overhead when run across the whole
source tree. Still, having a properly parsed abstract syntax tree is an
inherently useful thing for certain types of static analysis check. Some
things get unreliable real fast if you try to analyse using regex
matches and similar approaches that are the common alternative. So
libclang is understandably appealing in this respect.

> The script takes a path to the build directory, and any number of paths
> to directories or files to analyze. Example run on a 12-thread laptop:
>
>   $ time ./static-analyzer.py build block
>   block/commit.c:525:15: non-coroutine_fn function calls coroutine_fn
>   block/nbd.c:206:5: non-coroutine_fn function calls coroutine_fn
>   [...]
>   block/ssh.c:1167:13: non-coroutine_fn function calls coroutine_fn
>   block/nfs.c:229:27: non-coroutine_fn function calls coroutine_fn
>   Analyzed 79 translation units.
>
>   real    0m45.277s
>   user    7m55.496s
>   sys     0m1.445s

Ouch, that is incredibly slow :-(

With regards,
Daniel
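For readers unfamiliar with the checks being discussed: the patch 3 rule
(non-coroutine_fn functions must not directly call coroutine_fn functions)
reduces to a walk over call edges. The sketch below illustrates the rule
with a toy function model rather than real libclang cursors; all names and
the data structure are hypothetical, since the actual check inspects
annotate attributes on parsed AST nodes.

```python
from dataclasses import dataclass, field

@dataclass
class Function:
    """Toy stand-in for a parsed function definition (hypothetical model;
    the real check walks clang.cindex cursors for annotate attributes)."""
    name: str
    is_coroutine_fn: bool = False
    calls: list = field(default_factory=list)  # names of directly called functions

def check_direct_calls(functions):
    """Report direct calls from non-coroutine_fn functions to coroutine_fn ones."""
    by_name = {f.name: f for f in functions}
    diagnostics = []
    for f in functions:
        if f.is_coroutine_fn:
            continue  # calls from coroutine context are allowed
        for callee_name in f.calls:
            callee = by_name.get(callee_name)
            if callee is not None and callee.is_coroutine_fn:
                diagnostics.append(
                    f"{f.name}: non-coroutine_fn function calls "
                    f"coroutine_fn {callee.name}")
    return diagnostics

funcs = [
    Function("bdrv_co_read", is_coroutine_fn=True),
    Function("commit_start", calls=["bdrv_co_read"]),                     # violation
    Function("commit_run", is_coroutine_fn=True, calls=["bdrv_co_read"]), # fine
]
print(check_direct_calls(funcs))
# -> ['commit_start: non-coroutine_fn function calls coroutine_fn bdrv_co_read']
```

The real analyzer reports file:line:column positions instead of function
names, as in the transcript above.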
On Mon, Jul 4, 2022 at 5:28 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> Have you done any measurements to see how much of the overhead is from
> the checks you implemented, vs how much is inherently forced on us by
> libclang? I.e., what does it look like if you just load the libclang
> framework and run it across all source files, without doing any checks
> in Python?

Running the script with all checks disabled, i.e., doing nothing after
TranslationUnit.from_source():

  $ time ./static-analyzer.py build
  [...]
  Analyzed 8775 translation units in 274.0 seconds.

  real    4m34.537s
  user    49m32.555s
  sys     1m18.731s

  $ time ./static-analyzer.py build block util
  Analyzed 162 translation units in 4.2 seconds.

  real    0m4.804s
  user    0m40.389s
  sys     0m1.690s

This is still with 12 threads on a 12-hardware-thread laptop, but
scalability is near perfect. (The time reported by the script doesn't
include loading and inspection of the compilation database.)

So, not great. What's more, TranslationUnit.from_source() delegates all
work to clang_parseTranslationUnit(), so I suspect C libclang wouldn't
do much better.

And with all checks enabled:

  $ time ./static-analyzer.py build block util
  [...]
  Analyzed 162 translation units in 86.4 seconds.

  real    1m26.999s
  user    14m51.163s
  sys     0m2.205s

Yikes. Also not great at all, although the current implementation does
many inefficient things, like redundant AST traversals. Cutting through
some of clang.cindex's abstractions should also help, e.g., using
libclang's visitor API properly instead of calling clang_visitChildren()
for every get_children().

Perhaps we should set a target for how slow we can tolerate this thing
to be, as a percentage of total build time, and determine if the
libclang approach is viable. I'm thinking maybe 10%?

> If I run 'clang-tidy' across the entire source tree, it takes 3 minutes
> on my machine, but there's overhead of repeatedly starting the process
> in there.

Is that parallelized in some way? It seems strange that clang-tidy would
be so much faster than libclang.

> I think anything that actually fully parses the source files is going to
> have a decent-sized unavoidable overhead when run across the whole
> source tree.
>
> Still, having a properly parsed abstract syntax tree is an inherently
> useful thing for certain types of static analysis check. Some things
> get unreliable real fast if you try to analyse using regex matches and
> similar approaches that are the common alternative. So libclang is
> understandably appealing in this respect.
>
> > The script takes a path to the build directory, and any number of paths
> > to directories or files to analyze. Example run on a 12-thread laptop:
> >
> >   $ time ./static-analyzer.py build block
> >   block/commit.c:525:15: non-coroutine_fn function calls coroutine_fn
> >   block/nbd.c:206:5: non-coroutine_fn function calls coroutine_fn
> >   [...]
> >   block/ssh.c:1167:13: non-coroutine_fn function calls coroutine_fn
> >   block/nfs.c:229:27: non-coroutine_fn function calls coroutine_fn
> >   Analyzed 79 translation units.
> >
> >   real    0m45.277s
> >   user    7m55.496s
> >   sys     0m1.445s
>
> Ouch, that is incredibly slow :-(

Yup :-(

Alberto
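The near-perfect scaling reported above comes from the fact that each
translation unit can be parsed and checked independently. A rough stdlib
sketch of that fan-out pattern follows; the worker body is a placeholder
(the real script parses each TU via clang.cindex and runs the enabled
checks), and a real implementation would likely use processes rather than
threads, since parsing is CPU-bound.

```python
import concurrent.futures
import os

def analyze_translation_unit(path):
    # Placeholder worker: the real script would build the TU with
    # TranslationUnit.from_source(path, <args from the compilation database>)
    # and then run every enabled check over the resulting AST.
    return (path, [])  # (file, list of diagnostics)

def analyze_all(paths, jobs=None):
    # No -j argument means use all cores, matching the script's behavior.
    jobs = jobs or os.cpu_count()
    # Threads shown here for brevity; processes avoid GIL contention for
    # CPU-bound work at the cost of pickling results between workers.
    with concurrent.futures.ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(analyze_translation_unit, paths))

results = analyze_all(["block/commit.c", "block/nbd.c"], jobs=2)
print(f"Analyzed {len(results)} translation units.")
```

Because each worker re-parses its TU from scratch, total CPU time scales
with the number of TUs, which is why wall-clock time improves with cores
while `user` time stays large in the transcripts above.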
On Mon, Jul 04, 2022 at 08:30:08PM +0100, Alberto Faria wrote:
> On Mon, Jul 4, 2022 at 5:28 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > Have you done any measurements to see how much of the overhead is from
> > the checks you implemented, vs how much is inherently forced on us by
> > libclang? I.e., what does it look like if you just load the libclang
> > framework and run it across all source files, without doing any checks
> > in Python?
>
> Running the script with all checks disabled, i.e., doing nothing after
> TranslationUnit.from_source():
>
>   $ time ./static-analyzer.py build
>   [...]
>   Analyzed 8775 translation units in 274.0 seconds.
>
>   real    4m34.537s
>   user    49m32.555s
>   sys     1m18.731s
>
>   $ time ./static-analyzer.py build block util
>   Analyzed 162 translation units in 4.2 seconds.
>
>   real    0m4.804s
>   user    0m40.389s
>   sys     0m1.690s
>
> This is still with 12 threads on a 12-hardware-thread laptop, but
> scalability is near perfect. (The time reported by the script doesn't
> include loading and inspection of the compilation database.)
>
> So, not great. What's more, TranslationUnit.from_source() delegates all
> work to clang_parseTranslationUnit(), so I suspect C libclang wouldn't
> do much better.
>
> And with all checks enabled:
>
>   $ time ./static-analyzer.py build block util
>   [...]
>   Analyzed 162 translation units in 86.4 seconds.
>
>   real    1m26.999s
>   user    14m51.163s
>   sys     0m2.205s
>
> Yikes. Also not great at all, although the current implementation does
> many inefficient things, like redundant AST traversals. Cutting through
> some of clang.cindex's abstractions should also help, e.g., using
> libclang's visitor API properly instead of calling clang_visitChildren()
> for every get_children().
>
> Perhaps we should set a target for how slow we can tolerate this thing
> to be, as a percentage of total build time, and determine if the
> libclang approach is viable. I'm thinking maybe 10%?
>
> > If I run 'clang-tidy' across the entire source tree, it takes 3 minutes
> > on my machine, but there's overhead of repeatedly starting the process
> > in there.
>
> Is that parallelized in some way? It seems strange that clang-tidy
> would be so much faster than libclang.

No, that was me doing a dumb

  for i in `git ls-tree --name-only -r HEAD:`
  do
    clang-tidy $i 1>/dev/null 2>&1
  done

so in fact it was parsing all source files, not just .c files (and
likely whining about non-C files). This was on my laptop with 6 cores /
2 SMT.

With regards,
Daniel
On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> for i in `git ls-tree --name-only -r HEAD:`
> do
>   clang-tidy $i 1>/dev/null 2>&1
> done

All of those invocations are probably failing quickly due to missing
includes and other problems, since the location of the compilation
database and some other arguments haven't been specified.

Accounting for those problems (and enabling just one random C++ check):

  $ time clang-tidy -p build \
        --extra-arg-before=-Wno-unknown-warning-option \
        --extra-arg='-isystem [...]' \
        --checks='-*,clang-analyzer-cplusplus.Move' \
        $( find block -name '*.c' )
  [...]

  real    3m0.260s
  user    2m58.041s
  sys     0m1.467s

Single-threaded static-analyzer.py without any checks:

  $ time ./static-analyzer.py build block -j 1
  Analyzed 79 translation units in 16.0 seconds.

  real    0m16.665s
  user    0m15.967s
  sys     0m0.604s

And with just the 'return-value-never-used' check enabled for a somewhat
fairer comparison:

  $ time ./static-analyzer.py build block -j 1 \
        -c return-value-never-used
  Analyzed 79 translation units in 61.5 seconds.

  real    1m2.080s
  user    1m1.372s
  sys     0m0.513s

Which is good news!

Alberto
On Tue, Jul 05, 2022 at 12:28:55PM +0100, Alberto Faria wrote:
> On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > for i in `git ls-tree --name-only -r HEAD:`
> > do
> >   clang-tidy $i 1>/dev/null 2>&1
> > done
>
> All of those invocations are probably failing quickly due to missing
> includes and other problems, since the location of the compilation
> database and some other arguments haven't been specified.

Oops, yes, I was waaaay too minimalist in testing that.

> Accounting for those problems (and enabling just one random C++ check):
>
>   $ time clang-tidy -p build \
>         --extra-arg-before=-Wno-unknown-warning-option \
>         --extra-arg='-isystem [...]' \
>         --checks='-*,clang-analyzer-cplusplus.Move' \
>         $( find block -name '*.c' )
>   [...]
>
>   real    3m0.260s
>   user    2m58.041s
>   sys     0m1.467s

That's only analysing the block tree, but if we consider a static
analysis framework desirable for the whole of qemu.git, let's see the
numbers for everything.

What follows was done on my P1 Gen2 ThinkPad with 6 cores / 12 threads,
where I use 'make -j 12' normally.

First, as a benchmark, a full compile of the whole of QEMU (with GCC) on
Fedora 36 x86_64:

  => 14 minutes

I find this waaaay too slow though, so I typically configure QEMU with
--target-list=x86_64-softmmu, since that suffices 90% of the time:

  => 2 minutes

A 'make check' on this x86_64-only build takes another 2 minutes.

Now, a static analysis baseline across the whole tree with the default
checks enabled:

  $ clang-tidy --quiet -p build $(git ls-tree -r --name-only HEAD: | grep '\.c$')

  => 45 minutes

Wow, I wasn't expecting it to be that slow!

Let's restrict it to just the block/ dir:

  $ clang-tidy --quiet -p build $(find block -name '*.c')

  => 4 minutes

And further restrict it to just one simple check:

  $ clang-tidy --quiet --checks='-*,clang-analyzer-cplusplus.Move' -p build $(find block -name '*.c')

  => 2 minutes 30

So, extrapolated, just that single check would probably work out at 30
minutes for the whole tree. Overall this isn't cheap, and it is in the
same order of magnitude as a full compile. I guess this shouldn't be
that surprising, really.

> Single-threaded static-analyzer.py without any checks:
>
>   $ time ./static-analyzer.py build block -j 1
>   Analyzed 79 translation units in 16.0 seconds.
>
>   real    0m16.665s
>   user    0m15.967s
>   sys     0m0.604s
>
> And with just the 'return-value-never-used' check enabled for a
> somewhat fairer comparison:
>
>   $ time ./static-analyzer.py build block -j 1 \
>         -c return-value-never-used
>   Analyzed 79 translation units in 61.5 seconds.
>
>   real    1m2.080s
>   user    1m1.372s
>   sys     0m0.513s
>
> Which is good news!

On my machine, a whole-tree analysis allowing parallel execution (IIUC,
no -j arg means use all cores):

  ./static-analyzer.py build $(git ls-tree -r --name-only HEAD: | grep '\.c$')

  => 13 minutes

Or just the block layer:

  ./static-analyzer.py build $(find block -name '*.c')

  => 45 seconds

So your script is faster than clang-tidy, which suggests Python probably
isn't the major dominating factor in speed, at least at this point in
time.

Still, a full-tree analysis time of 13 minutes, compared to my normal
'make' time of 2 minutes, is an order of magnitude.

One thing that I noticed when doing this is that we can only really do
static analysis of files that we can actually compile on the host. IOW,
if on a Linux host, we don't get static analysis of code that is
targeted at FreeBSD / Windows platforms. Obvious, really, since libclang
has to do a full parse and will lack header files for those platforms.
That's just the tradeoff you have to accept when using a compiler for
static analysis, vs a tool that does "dumb" string-based regex matching
to detect mistakes. Both kinds of tools likely have their place for
different tasks.

Overall I think a libclang-based analysis tool will be useful, but I
can't see us enabling it as a standard part of 'make check' given the
time penalty. It feels like something that'll have to be opt-in to a
large degree for regular contributors. In terms of gating CI, though, it
is less of an issue, since we massively parallelize jobs. As long as we
have a dedicated build job just for running this static analysis check
in isolation, and NOT as 'make check' in all existing jobs, it can
happen in parallel with all the other build jobs, and we won't notice
the speed.

In summary, I think this approach is viable despite the speed penalty,
provided we don't wire it into 'make check' by default.

With regards,
Daniel
On Sat, Jul 02, 2022 at 12:33:23PM +0100, Alberto Faria wrote:
> This series introduces a static analyzer for QEMU. It consists of a
> single static-analyzer.py script that relies on libclang's Python
> bindings, and provides a common framework on which arbitrary static
> analysis checks can be developed and run against QEMU's code base.
>
> Summary of the series:
>
>   - Patch 1 adds the base static analyzer, along with a simple check
>     that finds static functions whose return value is never used, and
>     patch 2 fixes some occurrences of this.
>
>   - Patch 3 adds a check to ensure that non-coroutine_fn functions don't
>     perform direct calls to coroutine_fn functions, and patch 4 fixes
>     some violations of this rule.
>
>   - Patch 5 adds a check to enforce coroutine_fn restrictions on
>     function pointers, namely around assignment and indirect calls, and
>     patch 6 fixes some problems it detects. (Implementing this check
>     properly is complicated, since AFAICT annotation attributes cannot
>     be applied directly to types. This part still needs a lot of work.)
>
>   - Patch 7 introduces a no_coroutine_fn marker for functions that
>     should not be called from coroutines, makes generated_co_wrapper
>     evaluate to no_coroutine_fn, and adds a check enforcing this rule.
>     Patch 8 fixes some violations that it finds.

FWIW, after applying this series, 'make check' throws lots of failures
and hangs for me in the block I/O tests, so something appears not quite
correct here. I didn't bother to investigate/debug since you marked this
as just an RFC.

With regards,
Daniel
On Tue, Jul 5, 2022 at 5:12 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Tue, Jul 05, 2022 at 12:28:55PM +0100, Alberto Faria wrote:
> > On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > for i in `git ls-tree --name-only -r HEAD:`
> > > do
> > >   clang-tidy $i 1>/dev/null 2>&1
> > > done
> >
> > All of those invocations are probably failing quickly due to missing
> > includes and other problems, since the location of the compilation
> > database and some other arguments haven't been specified.
>
> Oops, yes, I was waaaay too minimalist in testing that.
>
> > Accounting for those problems (and enabling just one random C++ check):
> >
> >   $ time clang-tidy -p build \
> >         --extra-arg-before=-Wno-unknown-warning-option \
> >         --extra-arg='-isystem [...]' \
> >         --checks='-*,clang-analyzer-cplusplus.Move' \
> >         $( find block -name '*.c' )
> >   [...]
> >
> >   real    3m0.260s
> >   user    2m58.041s
> >   sys     0m1.467s
>
> That's only analysing the block tree, but if we consider a static
> analysis framework desirable for the whole of qemu.git, let's see the
> numbers for everything.
>
> What follows was done on my P1 Gen2 ThinkPad with 6 cores / 12 threads,
> where I use 'make -j 12' normally.
>
> First, as a benchmark, a full compile of the whole of QEMU (with GCC)
> on Fedora 36 x86_64:
>
>   => 14 minutes
>
> I find this waaaay too slow though, so I typically configure QEMU with
> --target-list=x86_64-softmmu, since that suffices 90% of the time:
>
>   => 2 minutes
>
> A 'make check' on this x86_64-only build takes another 2 minutes.
>
> Now, a static analysis baseline across the whole tree with the default
> checks enabled:
>
>   $ clang-tidy --quiet -p build $(git ls-tree -r --name-only HEAD: | grep '\.c$')
>
>   => 45 minutes
>
> Wow, I wasn't expecting it to be that slow!
>
> Let's restrict it to just the block/ dir:
>
>   $ clang-tidy --quiet -p build $(find block -name '*.c')
>
>   => 4 minutes
>
> And further restrict it to just one simple check:
>
>   $ clang-tidy --quiet --checks='-*,clang-analyzer-cplusplus.Move' -p build $(find block -name '*.c')
>
>   => 2 minutes 30
>
> So, extrapolated, just that single check would probably work out at 30
> minutes for the whole tree. Overall this isn't cheap, and it is in the
> same order of magnitude as a full compile. I guess this shouldn't be
> that surprising, really.
>
> > Single-threaded static-analyzer.py without any checks:
> >
> >   $ time ./static-analyzer.py build block -j 1
> >   Analyzed 79 translation units in 16.0 seconds.
> >
> >   real    0m16.665s
> >   user    0m15.967s
> >   sys     0m0.604s
> >
> > And with just the 'return-value-never-used' check enabled for a
> > somewhat fairer comparison:
> >
> >   $ time ./static-analyzer.py build block -j 1 \
> >         -c return-value-never-used
> >   Analyzed 79 translation units in 61.5 seconds.
> >
> >   real    1m2.080s
> >   user    1m1.372s
> >   sys     0m0.513s
> >
> > Which is good news!

(Well, good news for the Python libclang approach vs others like
clang-tidy plugins; bad news in absolute terms.)

> On my machine, a whole-tree analysis allowing parallel execution (IIUC,
> no -j arg means use all cores):
>
>   ./static-analyzer.py build $(git ls-tree -r --name-only HEAD: | grep '\.c$')
>
>   => 13 minutes
>
> Or just the block layer:
>
>   ./static-analyzer.py build $(find block -name '*.c')
>
>   => 45 seconds
>
> So your script is faster than clang-tidy, which suggests Python probably
> isn't the major dominating factor in speed, at least at this point in
> time.
>
> Still, a full-tree analysis time of 13 minutes, compared to my normal
> 'make' time of 2 minutes, is an order of magnitude.

There goes my 10% overhead target...

> One thing that I noticed when doing this is that we can only really do
> static analysis of files that we can actually compile on the host. IOW,
> if on a Linux host, we don't get static analysis of code that is
> targeted at FreeBSD / Windows platforms. Obvious, really, since libclang
> has to do a full parse and will lack header files for those platforms.
> That's just the tradeoff you have to accept when using a compiler for
> static analysis, vs a tool that does "dumb" string-based regex matching
> to detect mistakes. Both kinds of tools likely have their place for
> different tasks.

Right, I don't think there's anything reasonable we can do about this
limitation.

> Overall I think a libclang-based analysis tool will be useful, but I
> can't see us enabling it as a standard part of 'make check' given the
> time penalty. It feels like something that'll have to be opt-in to a
> large degree for regular contributors. In terms of gating CI, though,
> it is less of an issue, since we massively parallelize jobs. As long as
> we have a dedicated build job just for running this static analysis
> check in isolation, and NOT as 'make check' in all existing jobs, it
> can happen in parallel with all the other build jobs, and we won't
> notice the speed.
>
> In summary, I think this approach is viable despite the speed penalty,
> provided we don't wire it into 'make check' by default.

Agreed. Thanks for gathering these numbers.

Making the script use build dependency information, to avoid
re-analyzing translation units that weren't modified since the last
analysis, should make it fast enough to be usable iteratively during
development. Header precompilation could also be worth looking into.
Doing that + running a full analysis in CI should be good enough.

Alberto
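The incremental re-analysis idea mentioned above can be prototyped with
nothing more than file modification times. This is a sketch under
assumptions: the cache file name is hypothetical, and a real
implementation would also have to track every header a translation unit
includes (which is exactly why build-system dependency information is
needed).

```python
import json
import os

CACHE_FILE = ".static-analyzer-cache.json"  # hypothetical cache location

def load_cache():
    """Load the {path: mtime} map from the previous run, if any."""
    try:
        with open(CACHE_FILE) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

def files_needing_reanalysis(paths, cache):
    # A TU needs re-analysis if it is new or its mtime changed. A real
    # version must also consider the headers it includes, since a header
    # edit can invalidate TUs whose .c files are untouched.
    stale = []
    for path in paths:
        mtime = os.path.getmtime(path)
        if cache.get(path) != mtime:
            stale.append(path)
    return stale

def save_cache(paths):
    """Record current mtimes after a successful analysis."""
    with open(CACHE_FILE, "w") as f:
        json.dump({p: os.path.getmtime(p) for p in paths}, f)
```

During iterative development, only the handful of stale TUs would be
re-parsed, while CI would still run the full analysis from a cold cache.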
On Tue, Jul 5, 2022 at 5:13 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> FWIW, after applying this series, 'make check' throws lots of failures
> and hangs for me in the block I/O tests, so something appears not quite
> correct here. I didn't bother to investigate/debug since you marked this
> as just an RFC.

Thanks, it appears some coroutine_fn functions are being called from
non-coroutine context, so some call conversions from bdrv_... to
bdrv_co_... introduce problems. These changes are only intended as
examples of using the tool for the time being.

Alberto
On Wed, Jul 06, 2022 at 10:54:51AM +0100, Alberto Faria wrote:
> On Tue, Jul 5, 2022 at 5:12 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > Overall I think a libclang-based analysis tool will be useful, but I
> > can't see us enabling it as a standard part of 'make check' given the
> > time penalty. It feels like something that'll have to be opt-in to a
> > large degree for regular contributors. In terms of gating CI, though,
> > it is less of an issue, since we massively parallelize jobs. As long as
> > we have a dedicated build job just for running this static analysis
> > check in isolation, and NOT as 'make check' in all existing jobs, it
> > can happen in parallel with all the other build jobs, and we won't
> > notice the speed.
> >
> > In summary, I think this approach is viable despite the speed penalty,
> > provided we don't wire it into 'make check' by default.
>
> Agreed. Thanks for gathering these numbers.
>
> Making the script use build dependency information, to avoid
> re-analyzing translation units that weren't modified since the last
> analysis, should make it fast enough to be usable iteratively during
> development. Header precompilation could also be worth looking into.
> Doing that + running a full analysis in CI should be good enough.

For clang-tidy, I've been trying it out integrated into Emacs via eglot
and clangd. This means I get clang-tidy errors reported interactively as
I write code, so I wouldn't need to run a full-tree analysis.
Unfortunately, unless I'm missing something, there's no way to extend
clangd to plug in extra checks. So we would need to re-implement
something equivalent to clangd for our custom checks, and then integrate
that into eglot (or the equivalent for other editors).

With regards,
Daniel
On Wed, Jul 6, 2022 at 11:15 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> For clang-tidy, I've been trying it out integrated into Emacs via eglot
> and clangd. This means I get clang-tidy errors reported interactively as
> I write code, so I wouldn't need to run a full-tree analysis.
> Unfortunately, unless I'm missing something, there's no way to extend
> clangd to plug in extra checks. So we would need to re-implement
> something equivalent to clangd for our custom checks, and then integrate
> that into eglot (or the equivalent for other editors).

That would be very handy. Still, running the script on the whole tree
would be necessary to ensure that changes to headers don't break
translation units that are not open in the editor.

Alberto