Message ID: 20220702113331.2003820-1-afaria@redhat.com
Series: Introduce an extensible static analyzer
On 7/2/22 13:33, Alberto Faria wrote:
> The current primary motivation for this work is enforcing rules around
> block layer coroutines, which is why most of the series focuses on that.
> However, the static analyzer is intended to be sufficiently generic to
> satisfy other present and future QEMU static analysis needs.
>
> This is very early work-in-progress, and a lot is missing. One notable
> omission is build system integration, including keeping track of which
> translation units have been modified and need re-analyzing.
>
> Performance is bad, but there is a lot of potential for optimization,
> such as avoiding redundant AST traversals. Switching to C libclang is
> also a possibility, although Python makes it easy to quickly prototype
> new checks, which should encourage adoption and contributions.
>
> The script takes a path to the build directory, and any number of paths
> to directories or files to analyze. Example run on a 12-thread laptop:

Thanks, this is very useful! Unfortunately there are quite a lot of fixes
to go in (including your generated_co_wrapper cleanup series and mine [1])
before this can be enabled, but I think this is the way to go to 1) ease
maintainability of coroutine code and 2) move towards a model where there
are no mixed coroutine/non-coroutine functions.

I'll review it when I'm back, since a phone screen is not the best
environment to do that. :)

Paolo

[1] https://patchew.org/QEMU/20220509103019.215041-1-pbonzini@redhat.com/
On Sat, Jul 02, 2022 at 12:33:23PM +0100, Alberto Faria wrote:
> This series introduces a static analyzer for QEMU. It consists of a
> single static-analyzer.py script that relies on libclang's Python
> bindings, and provides a common framework on which arbitrary static
> analysis checks can be developed and run against QEMU's code base.
>
> Summary of the series:
>
>   - Patch 1 adds the base static analyzer, along with a simple check
>     that finds static functions whose return value is never used, and
>     patch 2 fixes some occurrences of this.
>
>   - Patch 3 adds a check to ensure that non-coroutine_fn functions don't
>     perform direct calls to coroutine_fn functions, and patch 4 fixes
>     some violations of this rule.
>
>   - Patch 5 adds a check to enforce coroutine_fn restrictions on
>     function pointers, namely around assignment and indirect calls, and
>     patch 6 fixes some problems it detects. (Implementing this check
>     properly is complicated, since AFAICT annotation attributes cannot
>     be applied directly to types. This part still needs a lot of work.)
>
>   - Patch 7 introduces a no_coroutine_fn marker for functions that
>     should not be called from coroutines, makes generated_co_wrapper
>     evaluate to no_coroutine_fn, and adds a check enforcing this rule.
>     Patch 8 fixes some violations that it finds.
>
> The current primary motivation for this work is enforcing rules around
> block layer coroutines, which is why most of the series focuses on that.
> However, the static analyzer is intended to be sufficiently generic to
> satisfy other present and future QEMU static analysis needs.
>
> This is very early work-in-progress, and a lot is missing. One notable
> omission is build system integration, including keeping track of which
> translation units have been modified and need re-analyzing.
>
> Performance is bad, but there is a lot of potential for optimization,
> such as avoiding redundant AST traversals. Switching to C libclang is
> also a possibility, although Python makes it easy to quickly prototype
> new checks, which should encourage adoption and contributions.

Have you done any measurements to see how much of the overhead is from
the checks you implemented, vs how much is inherently forced on us by
libclang? I.e., what does it look like if you just load the libclang
framework and run it across all source files, without doing any checks
in Python?

If I run 'clang-tidy' across the entire source tree, it takes 3 minutes
on my machine, but there's overhead of repeatedly starting the process
in there.

I think anything that actually fully parses the source files is going to
have a decent-sized unavoidable overhead when run across the whole
source tree. Still, having a properly parsed abstract syntax tree is an
inherently useful thing for certain types of static analysis check. Some
things get unreliable real fast if you try to analyse using regex
matches and similar approaches that are the common alternative. So
libclang is understandably appealing in this respect.

> The script takes a path to the build directory, and any number of paths
> to directories or files to analyze. Example run on a 12-thread laptop:
>
>   $ time ./static-analyzer.py build block
>   block/commit.c:525:15: non-coroutine_fn function calls coroutine_fn
>   block/nbd.c:206:5: non-coroutine_fn function calls coroutine_fn
>   [...]
>   block/ssh.c:1167:13: non-coroutine_fn function calls coroutine_fn
>   block/nfs.c:229:27: non-coroutine_fn function calls coroutine_fn
>   Analyzed 79 translation units.
>
>   real    0m45.277s
>   user    7m55.496s
>   sys     0m1.445s

Ouch, that is incredibly slow :-(

With regards,
Daniel
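For readers unfamiliar with the checks being discussed: the patch 3 rule
(non-coroutine_fn functions must not directly call coroutine_fn functions)
reduces to a walk over call edges. The sketch below illustrates the rule
with a toy function model rather than real libclang cursors; all names and
the data structure are hypothetical, since the actual check inspects
annotate attributes on parsed AST nodes.

```python
from dataclasses import dataclass, field

@dataclass
class Function:
    """Toy stand-in for a parsed function definition (hypothetical model;
    the real check walks clang.cindex cursors for annotate attributes)."""
    name: str
    is_coroutine_fn: bool = False
    calls: list = field(default_factory=list)  # names of directly called functions

def check_direct_calls(functions):
    """Report direct calls from non-coroutine_fn functions to coroutine_fn ones."""
    by_name = {f.name: f for f in functions}
    diagnostics = []
    for f in functions:
        if f.is_coroutine_fn:
            continue  # calls from coroutine context are allowed
        for callee_name in f.calls:
            callee = by_name.get(callee_name)
            if callee is not None and callee.is_coroutine_fn:
                diagnostics.append(
                    f"{f.name}: non-coroutine_fn function calls "
                    f"coroutine_fn {callee.name}")
    return diagnostics

funcs = [
    Function("bdrv_co_read", is_coroutine_fn=True),
    Function("commit_start", calls=["bdrv_co_read"]),                     # violation
    Function("commit_run", is_coroutine_fn=True, calls=["bdrv_co_read"]), # fine
]
print(check_direct_calls(funcs))
# -> ['commit_start: non-coroutine_fn function calls coroutine_fn bdrv_co_read']
```

The real analyzer reports file:line:column positions instead of function
names, as in the transcript above.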
On Mon, Jul 4, 2022 at 5:28 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> Have you done any measurements to see how much of the overhead is from
> the checks you implemented, vs how much is inherently forced on us by
> libclang? I.e., what does it look like if you just load the libclang
> framework and run it across all source files, without doing any checks
> in Python?

Running the script with all checks disabled, i.e., doing nothing after
TranslationUnit.from_source():

  $ time ./static-analyzer.py build
  [...]
  Analyzed 8775 translation units in 274.0 seconds.

  real    4m34.537s
  user    49m32.555s
  sys     1m18.731s

  $ time ./static-analyzer.py build block util
  Analyzed 162 translation units in 4.2 seconds.

  real    0m4.804s
  user    0m40.389s
  sys     0m1.690s

This is still with 12 threads on a 12-hardware-thread laptop, but
scalability is near perfect. (The time reported by the script doesn't
include loading and inspection of the compilation database.)

So, not great. What's more, TranslationUnit.from_source() delegates all
work to clang_parseTranslationUnit(), so I suspect C libclang wouldn't
do much better.

And with all checks enabled:

  $ time ./static-analyzer.py build block util
  [...]
  Analyzed 162 translation units in 86.4 seconds.

  real    1m26.999s
  user    14m51.163s
  sys     0m2.205s

Yikes. Also not great at all, although the current implementation does
many inefficient things, like redundant AST traversals. Cutting through
some of clang.cindex's abstractions should also help, e.g., using
libclang's visitor API properly instead of calling clang_visitChildren()
for every get_children().

Perhaps we should set a target for how slow we can tolerate this thing
to be, as a percentage of total build time, and determine if the
libclang approach is viable. I'm thinking maybe 10%?

> If I run 'clang-tidy' across the entire source tree, it takes 3 minutes
> on my machine, but there's overhead of repeatedly starting the process
> in there.

Is that parallelized in some way? It seems strange that clang-tidy would
be so much faster than libclang.

> I think anything that actually fully parses the source files is going to
> have a decent-sized unavoidable overhead when run across the whole
> source tree.
>
> Still, having a properly parsed abstract syntax tree is an inherently
> useful thing for certain types of static analysis check. Some things
> get unreliable real fast if you try to analyse using regex matches and
> similar approaches that are the common alternative. So libclang is
> understandably appealing in this respect.
>
> > The script takes a path to the build directory, and any number of paths
> > to directories or files to analyze. Example run on a 12-thread laptop:
> >
> >   $ time ./static-analyzer.py build block
> >   block/commit.c:525:15: non-coroutine_fn function calls coroutine_fn
> >   block/nbd.c:206:5: non-coroutine_fn function calls coroutine_fn
> >   [...]
> >   block/ssh.c:1167:13: non-coroutine_fn function calls coroutine_fn
> >   block/nfs.c:229:27: non-coroutine_fn function calls coroutine_fn
> >   Analyzed 79 translation units.
> >
> >   real    0m45.277s
> >   user    7m55.496s
> >   sys     0m1.445s
>
> Ouch, that is incredibly slow :-(

Yup :-(

Alberto
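The near-perfect scaling reported above comes from the fact that each
translation unit can be parsed and checked independently. A rough stdlib
sketch of that fan-out pattern follows; the worker body is a placeholder
(the real script parses each TU via clang.cindex and runs the enabled
checks), and a real implementation would likely use processes rather than
threads, since parsing is CPU-bound.

```python
import concurrent.futures
import os

def analyze_translation_unit(path):
    # Placeholder worker: the real script would build the TU with
    # TranslationUnit.from_source(path, <args from the compilation database>)
    # and then run every enabled check over the resulting AST.
    return (path, [])  # (file, list of diagnostics)

def analyze_all(paths, jobs=None):
    # No -j argument means use all cores, matching the script's behavior.
    jobs = jobs or os.cpu_count()
    # Threads shown here for brevity; processes avoid GIL contention for
    # CPU-bound work at the cost of pickling results between workers.
    with concurrent.futures.ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(analyze_translation_unit, paths))

results = analyze_all(["block/commit.c", "block/nbd.c"], jobs=2)
print(f"Analyzed {len(results)} translation units.")
```

Because each worker re-parses its TU from scratch, total CPU time scales
with the number of TUs, which is why wall-clock time improves with cores
while `user` time stays large in the transcripts above.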
On Mon, Jul 04, 2022 at 08:30:08PM +0100, Alberto Faria wrote:
> On Mon, Jul 4, 2022 at 5:28 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > Have you done any measurements to see how much of the overhead is from
> > the checks you implemented, vs how much is inherently forced on us by
> > libclang? I.e., what does it look like if you just load the libclang
> > framework and run it across all source files, without doing any checks
> > in Python?
>
> Running the script with all checks disabled, i.e., doing nothing after
> TranslationUnit.from_source():
>
>   $ time ./static-analyzer.py build
>   [...]
>   Analyzed 8775 translation units in 274.0 seconds.
>
>   real    4m34.537s
>   user    49m32.555s
>   sys     1m18.731s
>
>   $ time ./static-analyzer.py build block util
>   Analyzed 162 translation units in 4.2 seconds.
>
>   real    0m4.804s
>   user    0m40.389s
>   sys     0m1.690s
>
> This is still with 12 threads on a 12-hardware-thread laptop, but
> scalability is near perfect. (The time reported by the script doesn't
> include loading and inspection of the compilation database.)
>
> So, not great. What's more, TranslationUnit.from_source() delegates all
> work to clang_parseTranslationUnit(), so I suspect C libclang wouldn't
> do much better.
>
> And with all checks enabled:
>
>   $ time ./static-analyzer.py build block util
>   [...]
>   Analyzed 162 translation units in 86.4 seconds.
>
>   real    1m26.999s
>   user    14m51.163s
>   sys     0m2.205s
>
> Yikes. Also not great at all, although the current implementation does
> many inefficient things, like redundant AST traversals. Cutting through
> some of clang.cindex's abstractions should also help, e.g., using
> libclang's visitor API properly instead of calling clang_visitChildren()
> for every get_children().
>
> Perhaps we should set a target for how slow we can tolerate this thing
> to be, as a percentage of total build time, and determine if the
> libclang approach is viable. I'm thinking maybe 10%?
>
> > If I run 'clang-tidy' across the entire source tree, it takes 3 minutes
> > on my machine, but there's overhead of repeatedly starting the process
> > in there.
>
> Is that parallelized in some way? It seems strange that clang-tidy
> would be so much faster than libclang.

No, that was me doing a dumb

  for i in `git ls-tree --name-only -r HEAD:`
  do
    clang-tidy $i 1>/dev/null 2>&1
  done

so in fact it was parsing all source files, not just .c files (and
likely whining about non-C files). This was on my laptop with 6 cores /
2 SMT.

With regards,
Daniel
On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> for i in `git ls-tree --name-only -r HEAD:`
> do
>   clang-tidy $i 1>/dev/null 2>&1
> done

All of those invocations are probably failing quickly due to missing
includes and other problems, since the location of the compilation
database and some other arguments haven't been specified.

Accounting for those problems (and enabling just one random C++ check):

  $ time clang-tidy -p build \
        --extra-arg-before=-Wno-unknown-warning-option \
        --extra-arg='-isystem [...]' \
        --checks='-*,clang-analyzer-cplusplus.Move' \
        $( find block -name '*.c' )
  [...]

  real    3m0.260s
  user    2m58.041s
  sys     0m1.467s

Single-threaded static-analyzer.py without any checks:

  $ time ./static-analyzer.py build block -j 1
  Analyzed 79 translation units in 16.0 seconds.

  real    0m16.665s
  user    0m15.967s
  sys     0m0.604s

And with just the 'return-value-never-used' check enabled for a somewhat
fairer comparison:

  $ time ./static-analyzer.py build block -j 1 \
        -c return-value-never-used
  Analyzed 79 translation units in 61.5 seconds.

  real    1m2.080s
  user    1m1.372s
  sys     0m0.513s

Which is good news!

Alberto
On Tue, Jul 05, 2022 at 12:28:55PM +0100, Alberto Faria wrote:
> On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > for i in `git ls-tree --name-only -r HEAD:`
> > do
> >   clang-tidy $i 1>/dev/null 2>&1
> > done
>
> All of those invocations are probably failing quickly due to missing
> includes and other problems, since the location of the compilation
> database and some other arguments haven't been specified.

Oops, yes, I was waaaay too minimalist in testing that.

> Accounting for those problems (and enabling just one random C++ check):
>
>   $ time clang-tidy -p build \
>         --extra-arg-before=-Wno-unknown-warning-option \
>         --extra-arg='-isystem [...]' \
>         --checks='-*,clang-analyzer-cplusplus.Move' \
>         $( find block -name '*.c' )
>   [...]
>
>   real    3m0.260s
>   user    2m58.041s
>   sys     0m1.467s

That's only analysing the block tree, but if we consider a static
analysis framework desirable for the whole of qemu.git, let's see the
numbers for everything.

What follows was done on my P1 Gen2 ThinkPad with 6 cores / 12 threads,
where I use 'make -j 12' normally.

First, as a benchmark, a full compile of the whole of QEMU (with GCC) on
Fedora 36 x86_64:

  => 14 minutes

I find this waaaay too slow though, so I typically configure QEMU with
--target-list=x86_64-softmmu, since that suffices 90% of the time:

  => 2 minutes

A 'make check' on this x86_64-only build takes another 2 minutes.

Now, a static analysis baseline across the whole tree with the default
checks enabled:

  $ clang-tidy --quiet -p build $(git ls-tree -r --name-only HEAD: | grep '\.c$')

  => 45 minutes

Wow, I wasn't expecting it to be that slow!

Let's restrict it to just the block/ dir:

  $ clang-tidy --quiet -p build $(find block -name '*.c')

  => 4 minutes

And further restrict it to just one simple check:

  $ clang-tidy --quiet --checks='-*,clang-analyzer-cplusplus.Move' -p build $(find block -name '*.c')

  => 2 minutes 30

So, extrapolated, just that single check would probably work out at 30
minutes for the whole tree. Overall this isn't cheap, and it is in the
same order of magnitude as a full compile. I guess this shouldn't be
that surprising, really.

> Single-threaded static-analyzer.py without any checks:
>
>   $ time ./static-analyzer.py build block -j 1
>   Analyzed 79 translation units in 16.0 seconds.
>
>   real    0m16.665s
>   user    0m15.967s
>   sys     0m0.604s
>
> And with just the 'return-value-never-used' check enabled for a
> somewhat fairer comparison:
>
>   $ time ./static-analyzer.py build block -j 1 \
>         -c return-value-never-used
>   Analyzed 79 translation units in 61.5 seconds.
>
>   real    1m2.080s
>   user    1m1.372s
>   sys     0m0.513s
>
> Which is good news!

On my machine, a whole-tree analysis allowing parallel execution (IIUC,
no -j arg means use all cores):

  ./static-analyzer.py build $(git ls-tree -r --name-only HEAD: | grep '\.c$')

  => 13 minutes

Or just the block layer:

  ./static-analyzer.py build $(find block -name '*.c')

  => 45 seconds

So your script is faster than clang-tidy, which suggests Python probably
isn't the major dominating factor in speed, at least at this point in
time.

Still, a full-tree analysis time of 13 minutes, compared to my normal
'make' time of 2 minutes, is an order of magnitude.

One thing that I noticed when doing this is that we can only really do
static analysis of files that we can actually compile on the host. IOW,
if on a Linux host, we don't get static analysis of code that is
targeted at FreeBSD / Windows platforms. Obvious, really, since libclang
has to do a full parse and will lack header files for those platforms.
That's just the tradeoff you have to accept when using a compiler for
static analysis, vs a tool that does "dumb" string-based regex matching
to detect mistakes. Both kinds of tools likely have their place for
different tasks.

Overall I think a libclang-based analysis tool will be useful, but I
can't see us enabling it as a standard part of 'make check' given the
time penalty. It feels like something that'll have to be opt-in to a
large degree for regular contributors. In terms of gating CI, though, it
is less of an issue, since we massively parallelize jobs. As long as we
have a dedicated build job just for running this static analysis check
in isolation, and NOT as 'make check' in all existing jobs, it can
happen in parallel with all the other build jobs, and we won't notice
the speed.

In summary, I think this approach is viable despite the speed penalty,
provided we don't wire it into 'make check' by default.

With regards,
Daniel
On Sat, Jul 02, 2022 at 12:33:23PM +0100, Alberto Faria wrote:
> This series introduces a static analyzer for QEMU. It consists of a
> single static-analyzer.py script that relies on libclang's Python
> bindings, and provides a common framework on which arbitrary static
> analysis checks can be developed and run against QEMU's code base.
>
> Summary of the series:
>
>   - Patch 1 adds the base static analyzer, along with a simple check
>     that finds static functions whose return value is never used, and
>     patch 2 fixes some occurrences of this.
>
>   - Patch 3 adds a check to ensure that non-coroutine_fn functions don't
>     perform direct calls to coroutine_fn functions, and patch 4 fixes
>     some violations of this rule.
>
>   - Patch 5 adds a check to enforce coroutine_fn restrictions on
>     function pointers, namely around assignment and indirect calls, and
>     patch 6 fixes some problems it detects. (Implementing this check
>     properly is complicated, since AFAICT annotation attributes cannot
>     be applied directly to types. This part still needs a lot of work.)
>
>   - Patch 7 introduces a no_coroutine_fn marker for functions that
>     should not be called from coroutines, makes generated_co_wrapper
>     evaluate to no_coroutine_fn, and adds a check enforcing this rule.
>     Patch 8 fixes some violations that it finds.

FWIW, after applying this series, 'make check' throws lots of failures
and hangs for me in the block I/O tests, so something appears not quite
correct here. I didn't bother to investigate/debug since you marked this
as just an RFC.

With regards,
Daniel
On Tue, Jul 5, 2022 at 5:12 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Tue, Jul 05, 2022 at 12:28:55PM +0100, Alberto Faria wrote:
> > On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > for i in `git ls-tree --name-only -r HEAD:`
> > > do
> > >   clang-tidy $i 1>/dev/null 2>&1
> > > done
> >
> > All of those invocations are probably failing quickly due to missing
> > includes and other problems, since the location of the compilation
> > database and some other arguments haven't been specified.
>
> Oops, yes, I was waaaay too minimalist in testing that.
>
> > Accounting for those problems (and enabling just one random C++ check):
> >
> >   $ time clang-tidy -p build \
> >         --extra-arg-before=-Wno-unknown-warning-option \
> >         --extra-arg='-isystem [...]' \
> >         --checks='-*,clang-analyzer-cplusplus.Move' \
> >         $( find block -name '*.c' )
> >   [...]
> >
> >   real    3m0.260s
> >   user    2m58.041s
> >   sys     0m1.467s
>
> That's only analysing the block tree, but if we consider a static
> analysis framework desirable for the whole of qemu.git, let's see the
> numbers for everything.
>
> What follows was done on my P1 Gen2 ThinkPad with 6 cores / 12 threads,
> where I use 'make -j 12' normally.
>
> First, as a benchmark, a full compile of the whole of QEMU (with GCC)
> on Fedora 36 x86_64:
>
>   => 14 minutes
>
> I find this waaaay too slow though, so I typically configure QEMU with
> --target-list=x86_64-softmmu, since that suffices 90% of the time:
>
>   => 2 minutes
>
> A 'make check' on this x86_64-only build takes another 2 minutes.
>
> Now, a static analysis baseline across the whole tree with the default
> checks enabled:
>
>   $ clang-tidy --quiet -p build $(git ls-tree -r --name-only HEAD: | grep '\.c$')
>
>   => 45 minutes
>
> Wow, I wasn't expecting it to be that slow!
>
> Let's restrict it to just the block/ dir:
>
>   $ clang-tidy --quiet -p build $(find block -name '*.c')
>
>   => 4 minutes
>
> And further restrict it to just one simple check:
>
>   $ clang-tidy --quiet --checks='-*,clang-analyzer-cplusplus.Move' -p build $(find block -name '*.c')
>
>   => 2 minutes 30
>
> So, extrapolated, just that single check would probably work out at 30
> minutes for the whole tree. Overall this isn't cheap, and it is in the
> same order of magnitude as a full compile. I guess this shouldn't be
> that surprising, really.
>
> > Single-threaded static-analyzer.py without any checks:
> >
> >   $ time ./static-analyzer.py build block -j 1
> >   Analyzed 79 translation units in 16.0 seconds.
> >
> >   real    0m16.665s
> >   user    0m15.967s
> >   sys     0m0.604s
> >
> > And with just the 'return-value-never-used' check enabled for a
> > somewhat fairer comparison:
> >
> >   $ time ./static-analyzer.py build block -j 1 \
> >         -c return-value-never-used
> >   Analyzed 79 translation units in 61.5 seconds.
> >
> >   real    1m2.080s
> >   user    1m1.372s
> >   sys     0m0.513s
> >
> > Which is good news!

(Well, good news for the Python libclang approach vs others like
clang-tidy plugins; bad news in absolute terms.)

> On my machine, a whole-tree analysis allowing parallel execution (IIUC,
> no -j arg means use all cores):
>
>   ./static-analyzer.py build $(git ls-tree -r --name-only HEAD: | grep '\.c$')
>
>   => 13 minutes
>
> Or just the block layer:
>
>   ./static-analyzer.py build $(find block -name '*.c')
>
>   => 45 seconds
>
> So your script is faster than clang-tidy, which suggests Python probably
> isn't the major dominating factor in speed, at least at this point in
> time.
>
> Still, a full-tree analysis time of 13 minutes, compared to my normal
> 'make' time of 2 minutes, is an order of magnitude.

There goes my 10% overhead target...

> One thing that I noticed when doing this is that we can only really do
> static analysis of files that we can actually compile on the host. IOW,
> if on a Linux host, we don't get static analysis of code that is
> targeted at FreeBSD / Windows platforms. Obvious, really, since libclang
> has to do a full parse and will lack header files for those platforms.
> That's just the tradeoff you have to accept when using a compiler for
> static analysis, vs a tool that does "dumb" string-based regex matching
> to detect mistakes. Both kinds of tools likely have their place for
> different tasks.

Right, I don't think there's anything reasonable we can do about this
limitation.

> Overall I think a libclang-based analysis tool will be useful, but I
> can't see us enabling it as a standard part of 'make check' given the
> time penalty. It feels like something that'll have to be opt-in to a
> large degree for regular contributors. In terms of gating CI, though,
> it is less of an issue, since we massively parallelize jobs. As long as
> we have a dedicated build job just for running this static analysis
> check in isolation, and NOT as 'make check' in all existing jobs, it
> can happen in parallel with all the other build jobs, and we won't
> notice the speed.
>
> In summary, I think this approach is viable despite the speed penalty,
> provided we don't wire it into 'make check' by default.

Agreed. Thanks for gathering these numbers.

Making the script use build dependency information, to avoid
re-analyzing translation units that weren't modified since the last
analysis, should make it fast enough to be usable iteratively during
development. Header precompilation could also be worth looking into.
Doing that + running a full analysis in CI should be good enough.

Alberto
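The incremental re-analysis idea mentioned above can be prototyped with
nothing more than file modification times. This is a sketch under
assumptions: the cache file name is hypothetical, and a real
implementation would also have to track every header a translation unit
includes (which is exactly why build-system dependency information is
needed).

```python
import json
import os

CACHE_FILE = ".static-analyzer-cache.json"  # hypothetical cache location

def load_cache():
    """Load the {path: mtime} map from the previous run, if any."""
    try:
        with open(CACHE_FILE) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

def files_needing_reanalysis(paths, cache):
    # A TU needs re-analysis if it is new or its mtime changed. A real
    # version must also consider the headers it includes, since a header
    # edit can invalidate TUs whose .c files are untouched.
    stale = []
    for path in paths:
        mtime = os.path.getmtime(path)
        if cache.get(path) != mtime:
            stale.append(path)
    return stale

def save_cache(paths):
    """Record current mtimes after a successful analysis."""
    with open(CACHE_FILE, "w") as f:
        json.dump({p: os.path.getmtime(p) for p in paths}, f)
```

During iterative development, only the handful of stale TUs would be
re-parsed, while CI would still run the full analysis from a cold cache.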
On Tue, Jul 5, 2022 at 5:13 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> FWIW, after applying this series, 'make check' throws lots of failures
> and hangs for me in the block I/O tests, so something appears not quite
> correct here. I didn't bother to investigate/debug since you marked this
> as just an RFC.

Thanks, it appears some coroutine_fn functions are being called from
non-coroutine context, so some call conversions from bdrv_... to
bdrv_co_... introduce problems. These changes are only intended as
examples of using the tool for the time being.

Alberto
On Wed, Jul 06, 2022 at 10:54:51AM +0100, Alberto Faria wrote:
> On Tue, Jul 5, 2022 at 5:12 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > Overall I think a libclang-based analysis tool will be useful, but I
> > can't see us enabling it as a standard part of 'make check' given the
> > time penalty. It feels like something that'll have to be opt-in to a
> > large degree for regular contributors. In terms of gating CI, though,
> > it is less of an issue, since we massively parallelize jobs. As long as
> > we have a dedicated build job just for running this static analysis
> > check in isolation, and NOT as 'make check' in all existing jobs, it
> > can happen in parallel with all the other build jobs, and we won't
> > notice the speed.
> >
> > In summary, I think this approach is viable despite the speed penalty,
> > provided we don't wire it into 'make check' by default.
>
> Agreed. Thanks for gathering these numbers.
>
> Making the script use build dependency information, to avoid
> re-analyzing translation units that weren't modified since the last
> analysis, should make it fast enough to be usable iteratively during
> development. Header precompilation could also be worth looking into.
> Doing that + running a full analysis in CI should be good enough.

For clang-tidy, I've been trying it out integrated into Emacs via eglot
and clangd. This means I get clang-tidy errors reported interactively as
I write code, so I wouldn't need to run a full-tree analysis.
Unfortunately, unless I'm missing something, there's no way to extend
clangd to plug in extra checks. So we would need to re-implement
something equivalent to clangd for our custom checks, and then integrate
that into eglot (or the equivalent for other editors).

With regards,
Daniel
On Wed, Jul 6, 2022 at 11:15 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> For clang-tidy, I've been trying it out integrated into Emacs via eglot
> and clangd. This means I get clang-tidy errors reported interactively as
> I write code, so I wouldn't need to run a full-tree analysis.
> Unfortunately, unless I'm missing something, there's no way to extend
> clangd to plug in extra checks. So we would need to re-implement
> something equivalent to clangd for our custom checks, and then integrate
> that into eglot (or the equivalent for other editors).

That would be very handy. Still, running the script on the whole tree
would be necessary to ensure that changes to headers don't break
translation units that are not open in the editor.

Alberto