Message ID | 20230503203947.3417-3-farosas@suse.de |
---|---|
State | New |
Series | docs: Speedup docs build |
On Wed, 3 May 2023 at 21:39, Fabiano Rosas <farosas@suse.de> wrote:
>
> For the documentation builds (man pages & manual), we let Sphinx
> decide when to rebuild and use a depfile to know when to trigger the
> make target.
>
> We currently use a trick of having the man pages custom_target take as
> input the html pages custom_target object, which causes both targets
> to be executed if one of the dependencies has changed. However, having
> this at the custom_target level means that the two builds are
> effectively serialized.
>
> We can eliminate the dependency between the targets by adding a second
> depfile for the man pages build, allowing them to be parallelized by
> ninja while keeping sphinx in charge of deciding when to rebuild.
>
> Since they can now run in parallel, separate the Sphinx cache
> directory of the two builds. We need this not only for data
> consistency but also because Sphinx writes builder-dependent
> environment information to the cache directory (see notes under
> smartquotes_excludes in sphinx docs [1]).

The sphinx-build manpage disagrees about that last part.
https://www.sphinx-doc.org/en/master/man/sphinx-build.html
says about -d:
"with this option you can select a different cache directory
(the doctrees can be shared between all builders)"

If we don't share the cache directory, presumably Sphinx
now ends up parsing all the input files twice, once per
builder, rather than being able to share them?

thanks
-- PMM
Peter Maydell <peter.maydell@linaro.org> writes:

> On Wed, 3 May 2023 at 21:39, Fabiano Rosas <farosas@suse.de> wrote:
>>
>> For the documentation builds (man pages & manual), we let Sphinx
>> decide when to rebuild and use a depfile to know when to trigger the
>> make target.
>>
>> We currently use a trick of having the man pages custom_target take as
>> input the html pages custom_target object, which causes both targets
>> to be executed if one of the dependencies has changed. However, having
>> this at the custom_target level means that the two builds are
>> effectively serialized.
>>
>> We can eliminate the dependency between the targets by adding a second
>> depfile for the man pages build, allowing them to be parallelized by
>> ninja while keeping sphinx in charge of deciding when to rebuild.
>>
>> Since they can now run in parallel, separate the Sphinx cache
>> directory of the two builds. We need this not only for data
>> consistency but also because Sphinx writes builder-dependent
>> environment information to the cache directory (see notes under
>> smartquotes_excludes in sphinx docs [1]).
>
> The sphinx-build manpage disagrees about that last part.
> https://www.sphinx-doc.org/en/master/man/sphinx-build.html
> says about -d:
> "with this option you can select a different cache directory
> (the doctrees can be shared between all builders)"
>

The issue I had is that sphinx by default uses smart quotes for html
builders, but not for man builders. But whichever builder runs first
gets to set the smartquotes option and that sticks for the next
builder. That causes our man pages to come up with fancy curly quotes
instead of ' which is probably not an issue, but I didn't want to
produce different output from what we already have today.

I ended up conflating the cache directory (-d) with the environment
(-E), so it is possible that we can reuse the cache but not the
environment (where I assume the smartquotes option is stored). Well, I
better go read the sphinx code and figure that out.

> If we don't share the cache directory, presumably Sphinx
> now ends up parsing all the input files twice, once per
> builder, rather than being able to share them?
>

Yes, but having it run in parallel from the ninja level is still
faster. Of course, if we could reuse the cache, this could potentially
be even faster. I'll try to determine if it is really safe to do so.
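
For reference, the builder-dependent behaviour discussed above can be made explicit in a Sphinx `conf.py`. The fragment below is only a sketch and is not part of the patch: it assumes the documented default value of `smartquotes_excludes`, and `uses_smartquotes` is a hypothetical helper (not a Sphinx API) that mirrors the exclusion rule. Whether pinning these options would make a shared doctree directory safe is not established in this thread.

```python
# Hypothetical conf.py fragment, for illustration only (not from the patch).
# Smart quotes are enabled by default in Sphinx; the excludes below are
# assumed to match the documented default, which skips the man and text
# builders.
smartquotes = True
smartquotes_excludes = {
    "languages": ["ja"],
    "builders": ["man", "text"],
}


def uses_smartquotes(builder: str, language: str = "en") -> bool:
    """Illustrative helper (not a Sphinx API): apply the exclusion rule."""
    if not smartquotes:
        return False
    if builder in smartquotes_excludes.get("builders", []):
        return False
    if language in smartquotes_excludes.get("languages", []):
        return False
    return True
```

Under these assumptions the html builder would get curly quotes and the man builder would not, which is the difference in output described in the message above.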
On Thu, 4 May 2023 at 13:06, Fabiano Rosas <farosas@suse.de> wrote:
>
> Peter Maydell <peter.maydell@linaro.org> writes:
>
> > On Wed, 3 May 2023 at 21:39, Fabiano Rosas <farosas@suse.de> wrote:
> >> Since they can now run in parallel, separate the Sphinx cache
> >> directory of the two builds. We need this not only for data
> >> consistency but also because Sphinx writes builder-dependent
> >> environment information to the cache directory (see notes under
> >> smartquotes_excludes in sphinx docs [1]).
> >
> > The sphinx-build manpage disagrees about that last part.
> > https://www.sphinx-doc.org/en/master/man/sphinx-build.html
> > says about -d:
> > "with this option you can select a different cache directory
> > (the doctrees can be shared between all builders)"
> >
>
> The issue I had is that sphinx by default uses smart quotes for html
> builders, but not for man builders. But whichever builder runs first
> gets to set the smartquotes option and that sticks for the next
> builder. That causes our man pages to come up with fancy curly quotes
> instead of ' which is probably not an issue, but I didn't want to
> produce different output from what we already have today.
>
> I ended up conflating the cache directory (-d) with the environment
> (-E), so it is possible that we can reuse the cache but not the
> environment (where I assume the smartquotes option is stored). Well, I
> better go read the sphinx code and figure that out.
>
> > If we don't share the cache directory, presumably Sphinx
> > now ends up parsing all the input files twice, once per
> > builder, rather than being able to share them?
> >
>
> Yes, but having it run in parallel from the ninja level is still
> faster. Of course, if we could reuse the cache, this could potentially
> be even faster. I'll try to determine if it is really safe to do so.

Yeah, I wouldn't be surprised if we need the caches separate
for concurrency reasons, so this may just be a "commit message
might need tweaking" nit.

-- PMM
On 5/3/23 22:39, Fabiano Rosas wrote:
> For the documentation builds (man pages & manual), we let Sphinx
> decide when to rebuild and use a depfile to know when to trigger the
> make target.
>
> We currently use a trick of having the man pages custom_target take as
> input the html pages custom_target object, which causes both targets
> to be executed if one of the dependencies has changed. However, having
> this at the custom_target level means that the two builds are
> effectively serialized.
>
> We can eliminate the dependency between the targets by adding a second
> depfile for the man pages build, allowing them to be parallelized by
> ninja while keeping sphinx in charge of deciding when to rebuild.
>
> Since they can now run in parallel, separate the Sphinx cache
> directory of the two builds. We need this not only for data
> consistency but also because Sphinx writes builder-dependent
> environment information to the cache directory (see notes under
> smartquotes_excludes in sphinx docs [1]).
>
> Note that after this patch the commands `make man` and `make html`
> only build the specified target. To keep the old behavior of building
> both targets, use `make man html` or `make sphinxdocs`.
>
> 1- https://www.sphinx-doc.org/en/master/usage/configuration.html

Unfortunately this breaks CentOS 8, which has an older version of ninja:

ninja: error: build.ninja:16369: multiple outputs aren't (yet?)
supported by depslog; bring this up on the mailing list if it affects you

This was fixed in ninja 1.10.0.

Paolo
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 5/3/23 22:39, Fabiano Rosas wrote:
>> For the documentation builds (man pages & manual), we let Sphinx
>> decide when to rebuild and use a depfile to know when to trigger the
>> make target.
>>
>> We currently use a trick of having the man pages custom_target take as
>> input the html pages custom_target object, which causes both targets
>> to be executed if one of the dependencies has changed. However, having
>> this at the custom_target level means that the two builds are
>> effectively serialized.
>>
>> We can eliminate the dependency between the targets by adding a second
>> depfile for the man pages build, allowing them to be parallelized by
>> ninja while keeping sphinx in charge of deciding when to rebuild.
>>
>> Since they can now run in parallel, separate the Sphinx cache
>> directory of the two builds. We need this not only for data
>> consistency but also because Sphinx writes builder-dependent
>> environment information to the cache directory (see notes under
>> smartquotes_excludes in sphinx docs [1]).
>>
>> Note that after this patch the commands `make man` and `make html`
>> only build the specified target. To keep the old behavior of building
>> both targets, use `make man html` or `make sphinxdocs`.
>>
>> 1- https://www.sphinx-doc.org/en/master/usage/configuration.html

Sorry it took me a while to get back to this, I've been caught in
downstream work.

>
> Unfortunately this breaks CentOS 8, which has an older version of ninja:
>
> ninja: error: build.ninja:16369: multiple outputs aren't (yet?)
> supported by depslog; bring this up on the mailing list if it affects you
>
> This was fixed in ninja 1.10.0.
>

It looks like it would be easier to just wait until all our supported
build platforms reach this version.

Is this CentOS 8 or CentOS Stream 8? I believe CentOS Stream 8 would
drop from our support matrix at the end of this year. And CentOS 8
should already have been dropped, no, since Stream 9 was released in
2021? Unless we do not count Stream as a new version over plain CentOS.

For the dates and versions, I'm looking at:

https://en.wikipedia.org/wiki/CentOS
https://repology.org/project/ninja/versions
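
As an aside on the version question, here is a minimal sketch of how a build script could test whether the installed ninja is new enough for a depfile on a multi-output target. It assumes only what the thread states, namely that the limitation was fixed in ninja 1.10.0. The function names are hypothetical and nothing like this is part of the patch.

```python
import re

# Version named in the thread as containing the fix.
MIN_NINJA = (1, 10, 0)


def parse_version(text: str) -> tuple:
    """Extract the leading numeric components, e.g. '1.8.2' -> (1, 8, 2)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def supports_multi_output_depfile(version_text: str) -> bool:
    """Hypothetical check: True if this ninja version is >= MIN_NINJA."""
    version = parse_version(version_text)
    # Pad short versions such as '1.10' so the tuple comparison is fair.
    version = version + (0,) * (3 - len(version))
    return version >= MIN_NINJA
```

The input would typically be the output of `ninja --version`. Components are compared as integers on purpose, because a plain string comparison would rank "1.8.2" above "1.10.0".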
diff --git a/docs/meson.build b/docs/meson.build
index 6d0986579e..858e737431 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -42,7 +42,9 @@ if sphinx_build.found()
 endif
 
 if build_docs
-  SPHINX_ARGS += ['-Dversion=' + meson.project_version(), '-Drelease=' + get_option('pkgversion')]
+  SPHINX_ARGS += ['-Dversion=' + meson.project_version(),
+                  '-Drelease=' + get_option('pkgversion'),
+                  '-Ddepfile=@DEPFILE@', '-Ddepfile_stamp=@OUTPUT0@']
 
   man_pages = {
         'qemu-ga.8': (have_ga ? 'man8' : ''),
@@ -61,41 +63,43 @@ if build_docs
   }
 
   sphinxdocs = []
-  sphinxmans = []
 
   private_dir = meson.current_build_dir() / 'manual.p'
   output_dir = meson.current_build_dir() / 'manual'
   input_dir = meson.current_source_dir()
 
-  this_manual = custom_target('QEMU manual',
+  manual = custom_target('QEMU manual',
                 build_by_default: build_docs,
-                output: 'docs.stamp',
+                output: 'manual.stamp',
                 input: files('conf.py'),
-                depfile: 'docs.d',
-                command: [SPHINX_ARGS, '-Ddepfile=@DEPFILE@',
-                          '-Ddepfile_stamp=@OUTPUT0@',
-                          '-b', 'html', '-d', private_dir,
+                depfile: 'manual.dep',
+                command: [SPHINX_ARGS, '-b', 'html', '-d', private_dir,
                           input_dir, output_dir])
-  sphinxdocs += this_manual
+  sphinxdocs += manual
   install_subdir(output_dir, install_dir: qemu_docdir, strip_directory: true)
 
-  these_man_pages = []
-  install_dirs = []
+  man_private_dir = meson.current_build_dir() / 'man.p'
+  # man.stamp is not installed
+  these_man_pages = ['man.stamp']
+  install_dirs = [false]
+
   foreach page, section : man_pages
     these_man_pages += page
     install_dirs += section == '' ? false : get_option('mandir') / section
   endforeach
 
-  sphinxmans += custom_target('QEMU man pages',
+
+  man_pages = custom_target('QEMU man pages',
                          build_by_default: build_docs,
                          output: these_man_pages,
-                         input: this_manual,
+                         depfile: 'man.dep',
                          install: build_docs,
                          install_dir: install_dirs,
-                         command: [SPHINX_ARGS, '-b', 'man', '-d', private_dir,
+                         command: [SPHINX_ARGS, '-b', 'man', '-d', man_private_dir,
                                    input_dir, meson.current_build_dir()])
+  sphinxdocs += man_pages
 
   alias_target('sphinxdocs', sphinxdocs)
-  alias_target('html', sphinxdocs)
-  alias_target('man', sphinxmans)
+  alias_target('html', manual)
+  alias_target('man', man_pages)
 endif
For the documentation builds (man pages & manual), we let Sphinx
decide when to rebuild and use a depfile to know when to trigger the
make target.

We currently use a trick of having the man pages custom_target take as
input the html pages custom_target object, which causes both targets
to be executed if one of the dependencies has changed. However, having
this at the custom_target level means that the two builds are
effectively serialized.

We can eliminate the dependency between the targets by adding a second
depfile for the man pages build, allowing them to be parallelized by
ninja while keeping sphinx in charge of deciding when to rebuild.

Since they can now run in parallel, separate the Sphinx cache
directory of the two builds. We need this not only for data
consistency but also because Sphinx writes builder-dependent
environment information to the cache directory (see notes under
smartquotes_excludes in sphinx docs [1]).

Note that after this patch the commands `make man` and `make html`
only build the specified target. To keep the old behavior of building
both targets, use `make man html` or `make sphinxdocs`.

1- https://www.sphinx-doc.org/en/master/usage/configuration.html

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 docs/meson.build | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)
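
To illustrate the depfile mechanism the commit message relies on, the sketch below formats a Make-style dependency rule of the kind ninja reads: a stamp file followed by the source files it depends on. The code that actually consumes the `depfile` and `depfile_stamp` options is not shown in this thread, so `format_depfile` and its escaping are assumptions made for illustration, not the project's implementation.

```python
def format_depfile(stamp: str, deps: list) -> str:
    """Return a Make-style rule: '<stamp>: <dep> <dep> ...' plus a newline.

    Hypothetical helper: 'stamp' plays the role of the -Ddepfile_stamp
    value and 'deps' the source files Sphinx read during the build.
    """

    def escape(path: str) -> str:
        # Spaces separate entries in a depfile, so escape them in paths.
        return path.replace(" ", "\\ ")

    # De-duplicate and sort so the output is stable between runs.
    unique_deps = sorted(set(deps))
    return "{}: {}\n".format(
        escape(stamp), " ".join(escape(dep) for dep in unique_deps)
    )
```

With one such file per builder (`manual.dep` and `man.dep` in the diff), ninja can decide independently when each target is out of date, which is what lets the two Sphinx runs proceed in parallel.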