Message ID | 20220526002045.1872855-2-sam@gentoo.org |
---|---|
State | New |
Headers | show |
Series | [1/2] nss: add assert to DB_LOOKUP_FCT (BZ #28752) | expand |
> On 26 May 2022, at 01:20, Sam James <sam@gentoo.org> wrote: > > Skip the chroot test if the database isn't loaded > correctly (because the chroot test uses some > existing DB state). It looks like there's a test failure with the trybot. I'll look into it.
On 25/05/2022 21:20, Sam James via Libc-alpha wrote: > Skip the chroot test if the database isn't loaded > correctly (because the chroot test uses some > existing DB state). > > The __stat64_time64 -> fstatat call can fail if > running under an (aggressive) seccomp filter, > like Firefox seems to use. > > This manifested in a crash when using glib built > with FAM support with such a Firefox build. > > Suggested-by: DJ Delorie <dj@redhat.com> > Signed-off-by: Sam James <sam@gentoo.org> > --- > nss/nss_database.c | 33 ++++++++++++++++++--------------- > 1 file changed, 18 insertions(+), 15 deletions(-) > > diff --git a/nss/nss_database.c b/nss/nss_database.c > index d56c5b798d..6c9b440a98 100644 > --- a/nss/nss_database.c > +++ b/nss/nss_database.c > @@ -420,21 +420,24 @@ nss_database_check_reload_and_get (struct nss_database_state *local, > return true; > } > > - /* Before we reload, verify that "/" hasn't changed. We assume that > - errors here are very unlikely, but the chance that we're entering > - a container is also very unlikely, so we err on the side of both > - very unlikely things not happening at the same time. */ > - if (__stat64_time64 ("/", &str) != 0 > - || (local->root_ino != 0 > - && (str.st_ino != local->root_ino > - || str.st_dev != local->root_dev))) > - { > - /* Change detected; disable reloading and return current state. */ > - atomic_store_release (&local->data.reload_disabled, 1); > - *result = local->data.services[database_index]; > - __libc_lock_unlock (local->lock); > - return true; > - } > + if (local->data.services[database_index] != NULL) { > + /* Before we reload, verify that "/" hasn't changed. We assume that > + errors here are very unlikely, but the chance that we're entering > + a container is also very unlikely, so we err on the side of both > + very unlikely things not happening at the same time. */ > + if (__stat64_time64 ("/", &str) != 0 > + || (local->root_ino != 0 > + && (str.st_ino != local->root_ino > + || str.st_dev != local->root_dev))) > + { > + /* Change detected; disable reloading and return current state. */ > + atomic_store_release (&local->data.reload_disabled, 1); > + *result = local->data.services[database_index]; > + __libc_lock_unlock (local->lock); > + return true; > + } > + } > + > local->root_ino = str.st_ino; > local->root_dev = str.st_dev; > __libc_lock_unlock (local->lock); Besides the buildbot issue [1] you already noted, please use the expected GNU indentation. We have now a clang-format script (.clang-format) to help the format. [1] https://www.delorie.com/trybots/32bit/9696/nss-tst-reload1.out
Sam James <sam@gentoo.org> writes: >> On 26 May 2022, at 01:20, Sam James <sam@gentoo.org> wrote: >> >> Skip the chroot test if the database isn't loaded >> correctly (because the chroot test uses some >> existing DB state). > > It looks like there's a test failure with the trybot. I'll look into it. I think we need to move the local->* settings inside the new block as otherwise the str.* we're referencing are undefined. diff --git a/nss/nss_database.c b/nss/nss_database.c index d56c5b798d..72883d9b42 100644 --- a/nss/nss_database.c +++ b/nss/nss_database.c @@ -420,23 +420,26 @@ nss_database_check_reload_and_get (struct nss_database_state *local, return true; } - /* Before we reload, verify that "/" hasn't changed. We assume that - errors here are very unlikely, but the chance that we're entering - a container is also very unlikely, so we err on the side of both - very unlikely things not happening at the same time. */ - if (__stat64_time64 ("/", &str) != 0 - || (local->root_ino != 0 - && (str.st_ino != local->root_ino - || str.st_dev != local->root_dev))) - { - /* Change detected; disable reloading and return current state. */ - atomic_store_release (&local->data.reload_disabled, 1); - *result = local->data.services[database_index]; - __libc_lock_unlock (local->lock); - return true; - } - local->root_ino = str.st_ino; - local->root_dev = str.st_dev; + if (local->data.services[database_index] != NULL) { + /* Before we reload, verify that "/" hasn't changed. We assume that + errors here are very unlikely, but the chance that we're entering + a container is also very unlikely, so we err on the side of both + very unlikely things not happening at the same time. */ + if (__stat64_time64 ("/", &str) != 0 + || (local->root_ino != 0 + && (str.st_ino != local->root_ino + || str.st_dev != local->root_dev))) + { + /* Change detected; disable reloading and return current state. */ + atomic_store_release (&local->data.reload_disabled, 1); + *result = local->data.services[database_index]; + __libc_lock_unlock (local->lock); + return true; + } + local->root_ino = str.st_ino; + local->root_dev = str.st_dev; + } + __libc_lock_unlock (local->lock); /* Avoid overwriting the global configuration until we have loaded
> On 4 Jun 2022, at 04:26, DJ Delorie via Libc-alpha <libc-alpha@sourceware.org> wrote: > > Sam James <sam@gentoo.org> writes: >>> On 26 May 2022, at 01:20, Sam James <sam@gentoo.org> wrote: >>> >>> Skip the chroot test if the database isn't loaded >>> correctly (because the chroot test uses some >>> existing DB state). >> >> It looks like there's a test failure with the trybot. I'll look into it. > > I think we need to move the local->* settings inside the new block as > otherwise the str.* we're referencing are undefined. Duh. Thanks.
> On 4 Jun 2022, at 04:26, DJ Delorie <dj@redhat.com> wrote: > > Sam James <sam@gentoo.org> writes: >>> On 26 May 2022, at 01:20, Sam James <sam@gentoo.org> wrote: >>> >>> Skip the chroot test if the database isn't loaded >>> correctly (because the chroot test uses some >>> existing DB state). >> >> It looks like there's a test failure with the trybot. I'll look into it. > > I think we need to move the local->* settings inside the new block as > otherwise the str.* we're referencing are undefined. > Huh, it still fails, but makes a bit more sense. It's upset because now we apparently we return a result despite not being started up: ``` FAIL: nss/tst-reload2 original exit status 1 error: tst-reload2.c:132: not true: pw->pw_uid != 2468 tst-reload2.c:138: numeric comparison failure left: 5 (0x5); from: pw->pw_uid right: 1234 (0x4d2); from: 1234 error: tst-reload2.c:140: not true: gr != NULL error: tst-reload2.c:148: not true: he != NULL error: 4 test failures ``` If local->data.services[database_index] is null, we go ahead and reload anyway even if we're in a chroot, because we assume that it's unlikely to have both cases. But it's null .. even if we are just entering a chroot, but then we skip the test? Are we back to always wanting to do the chroot test, because otherwise we're going to accidentally end up loading from the chroot again? i.e. if it's null, we never change local->root_{into,dev}, so they continue to point into the chroot. But I can't always do a statx call because then we're back at square 1. And now I've talked myself into a corner and I'm confused :)
Sam James <sam@gentoo.org> writes: > Huh, it still fails, but makes a bit more sense. It's upset because > now we apparently we return a result despite not being started > up: So back to basics... This routine is called when we have a real getXbyY() call and need a config. Our code happens after we've checked for nsswitch.conf changing. action: FALSE = error returned to user action: TRUE = old configuration re-used action: CONT = continue to load new configuration services[db] inode.changed action NULL NO CONT - we need some config NULL YES CONT - we need some config non-null NO CONT - we need to reload non-null YES TRUE - use old config So I think the patch is correct. I'm seeing tst-reload2 fail also (sigh, only tested tst-reload1) It looks like we need to initialize local->root_* even if [services] is NULL. I.e. we need to run that stat always, despite deferring its check. This is a little messier but I think the logic is all in the right order. We might need to move the local-> if{} to before the NULL check, although we must load the config at least once so it should be OK. diff --git a/nss/nss_database.c b/nss/nss_database.c index d56c5b798d..f2ed2f2c25 100644 --- a/nss/nss_database.c +++ b/nss/nss_database.c @@ -420,23 +420,32 @@ nss_database_check_reload_and_get (struct nss_database_state *local, return true; } - /* Before we reload, verify that "/" hasn't changed. We assume that - errors here are very unlikely, but the chance that we're entering - a container is also very unlikely, so we err on the side of both - very unlikely things not happening at the same time. */ - if (__stat64_time64 ("/", &str) != 0 - || (local->root_ino != 0 - && (str.st_ino != local->root_ino - || str.st_dev != local->root_dev))) + int stat_rv = __stat64_time64 ("/", &str); + + if (local->data.services[database_index] != NULL) { - /* Change detected; disable reloading and return current state. */ - atomic_store_release (&local->data.reload_disabled, 1); - *result = local->data.services[database_index]; - __libc_lock_unlock (local->lock); - return true; + /* Before we reload, verify that "/" hasn't changed. We assume that + errors here are very unlikely, but the chance that we're entering + a container is also very unlikely, so we err on the side of both + very unlikely things not happening at the same time. */ + if (stat_rv != 0 + || (local->root_ino != 0 + && (str.st_ino != local->root_ino + || str.st_dev != local->root_dev))) + { + /* Change detected; disable reloading and return current state. */ + atomic_store_release (&local->data.reload_disabled, 1); + *result = local->data.services[database_index]; + __libc_lock_unlock (local->lock); + return true; + } + } + if (stat_rv == 0) + { + local->root_ino = str.st_ino; + local->root_dev = str.st_dev; } - local->root_ino = str.st_ino; - local->root_dev = str.st_dev; + __libc_lock_unlock (local->lock); /* Avoid overwriting the global configuration until we have loaded
> On 4 Jun 2022, at 21:51, DJ Delorie <dj@redhat.com> wrote: > > Sam James <sam@gentoo.org> writes: >> Huh, it still fails, but makes a bit more sense. It's upset because >> now we apparently we return a result despite not being started >> up: > > So back to basics... > > This routine is called when we have a real getXbyY() call and need > a config. > > Our code happens after we've checked for nsswitch.conf changing. > > action: FALSE = error returned to user > action: TRUE = old configuration re-used > action: CONT = continue to load new configuration > > services[db] inode.changed action > > NULL NO CONT - we need some config > NULL YES CONT - we need some config > non-null NO CONT - we need to reload > non-null YES TRUE - use old config > > So I think the patch is correct. I'm seeing tst-reload2 fail also > (sigh, only tested tst-reload1) > > It looks like we need to initialize local->root_* even if [services] is > NULL. I.e. we need to run that stat always, despite deferring its > check. > > This is a little messier but I think the logic is all in the right > order. We might need to move the local-> if{} to before the NULL check, > although we must load the config at least once so it should be OK. > I'm glad I nearly got there -- I was worried about making the stat call at all, but the issue is referencing the result, of course, if it's not safe to do so. Let me give this a whack! > diff --git a/nss/nss_database.c b/nss/nss_database.c > index d56c5b798d..f2ed2f2c25 100644 > --- a/nss/nss_database.c > +++ b/nss/nss_database.c > @@ -420,23 +420,32 @@ nss_database_check_reload_and_get (struct nss_database_state *local, > return true; > } > > - /* Before we reload, verify that "/" hasn't changed. We assume that > - errors here are very unlikely, but the chance that we're entering > - a container is also very unlikely, so we err on the side of both > - very unlikely things not happening at the same time. */ > - if (__stat64_time64 ("/", &str) != 0 > - || (local->root_ino != 0 > - && (str.st_ino != local->root_ino > - || str.st_dev != local->root_dev))) > + int stat_rv = __stat64_time64 ("/", &str); > + > + if (local->data.services[database_index] != NULL) > { > - /* Change detected; disable reloading and return current state. */ > - atomic_store_release (&local->data.reload_disabled, 1); > - *result = local->data.services[database_index]; > - __libc_lock_unlock (local->lock); > - return true; > + /* Before we reload, verify that "/" hasn't changed. We assume that > + errors here are very unlikely, but the chance that we're entering > + a container is also very unlikely, so we err on the side of both > + very unlikely things not happening at the same time. */ > + if (stat_rv != 0 > + || (local->root_ino != 0 > + && (str.st_ino != local->root_ino > + || str.st_dev != local->root_dev))) > + { > + /* Change detected; disable reloading and return current state. */ > + atomic_store_release (&local->data.reload_disabled, 1); > + *result = local->data.services[database_index]; > + __libc_lock_unlock (local->lock); > + return true; > + } > + } > + if (stat_rv == 0) > + { > + local->root_ino = str.st_ino; > + local->root_dev = str.st_dev; > } > - local->root_ino = str.st_ino; > - local->root_dev = str.st_dev; > + > __libc_lock_unlock (local->lock); > > /* Avoid overwriting the global configuration until we have loaded > > Best, sam
> On 4 Jun 2022, at 22:29, Sam James via Libc-alpha <libc-alpha@sourceware.org> wrote: > > > >> On 4 Jun 2022, at 21:51, DJ Delorie <dj@redhat.com> wrote: >> >> Sam James <sam@gentoo.org> writes: >>> Huh, it still fails, but makes a bit more sense. It's upset because >>> now we apparently we return a result despite not being started >>> up: >> >> So back to basics... >> >> This routine is called when we have a real getXbyY() call and need >> a config. >> >> Our code happens after we've checked for nsswitch.conf changing. >> >> action: FALSE = error returned to user >> action: TRUE = old configuration re-used >> action: CONT = continue to load new configuration >> >> services[db] inode.changed action >> >> NULL NO CONT - we need some config >> NULL YES CONT - we need some config >> non-null NO CONT - we need to reload >> non-null YES TRUE - use old config >> >> So I think the patch is correct. I'm seeing tst-reload2 fail also >> (sigh, only tested tst-reload1) >> >> It looks like we need to initialize local->root_* even if [services] is >> NULL. I.e. we need to run that stat always, despite deferring its >> check. >> >> This is a little messier but I think the logic is all in the right >> order. We might need to move the local-> if{} to before the NULL check, >> although we must load the config at least once so it should be OK. >> > > I'm glad I nearly got there -- I was worried about making the stat > call at all, but the issue is referencing the result, of course, if > it's not safe to do so. > > Let me give this a whack! >> Okay, it still fails (I put the check before and after, no difference, although I think after makes more sense), but bear with me: ``` FAIL: nss/tst-reload2 original exit status 1 error: tst-reload2.c:132: not true: pw->pw_uid != 2468 tst-reload2.c:138: numeric comparison failure left: 5 (0x5); from: pw->pw_uid right: 1234 (0x4d2); from: 1234 error: tst-reload2.c:140: not true: gr != NULL error: tst-reload2.c:148: not true: he != NULL error: 4 test failures make[1]: Leaving directory '/home/sam/git/glibc' ``` Are we sure tst-reload2 is correct? At line 140, shouldn't it be null (I mean, it is when the test fails, but isn't that okay?), because we don't want to load the inner config? Especially given on line 142, we want the getgrnam() call to return null. And Group ID 5 is only defined in group_table_data2. Now, where I'm less sure is the failure on line 148 (gethostbyname). test2 is in subdir/etc/nsswitch.conf and nothing about test2 is available in the outer configuration. *BUT* the comment above it says "... can still load the files DSO.", so I'm less confident I'm understanding that one. > > Best, > sam
> On 4 Jun 2022, at 23:11, Sam James <sam@gentoo.org> wrote: > > > >> On 4 Jun 2022, at 22:29, Sam James via Libc-alpha <libc-alpha@sourceware.org> wrote: >> >> >> >>> On 4 Jun 2022, at 21:51, DJ Delorie <dj@redhat.com> wrote: >>> >>> Sam James <sam@gentoo.org> writes: >>>> Huh, it still fails, but makes a bit more sense. It's upset because >>>> now we apparently we return a result despite not being started >>>> up: >>> >>> So back to basics... >>> >>> This routine is called when we have a real getXbyY() call and need >>> a config. >>> >>> Our code happens after we've checked for nsswitch.conf changing. >>> >>> action: FALSE = error returned to user >>> action: TRUE = old configuration re-used >>> action: CONT = continue to load new configuration >>> >>> services[db] inode.changed action >>> >>> NULL NO CONT - we need some config >>> NULL YES CONT - we need some config >>> non-null NO CONT - we need to reload >>> non-null YES TRUE - use old config >>> >>> So I think the patch is correct. I'm seeing tst-reload2 fail also >>> (sigh, only tested tst-reload1) >>> >>> It looks like we need to initialize local->root_* even if [services] is >>> NULL. I.e. we need to run that stat always, despite deferring its >>> check. >>> >>> This is a little messier but I think the logic is all in the right >>> order. We might need to move the local-> if{} to before the NULL check, >>> although we must load the config at least once so it should be OK. >>> >> >> I'm glad I nearly got there -- I was worried about making the stat >> call at all, but the issue is referencing the result, of course, if >> it's not safe to do so. >> >> Let me give this a whack! >>> > > Okay, it still fails (I put the check before and after, no difference, although I think after makes more sense), but bear with me: > ``` > FAIL: nss/tst-reload2 > original exit status 1 > error: tst-reload2.c:132: not true: pw->pw_uid != 2468 > tst-reload2.c:138: numeric comparison failure > left: 5 (0x5); from: pw->pw_uid > right: 1234 (0x4d2); from: 1234 > error: tst-reload2.c:140: not true: gr != NULL > error: tst-reload2.c:148: not true: he != NULL > error: 4 test failures > make[1]: Leaving directory '/home/sam/git/glibc' > ``` > > Are we sure tst-reload2 is correct? > > At line 140, shouldn't it be null (I mean, it is when the test fails, but isn't that okay?), because we don't want to load the inner config? > > Especially given on line 142, we want the getgrnam() call to return null. And Group ID 5 is only defined in group_table_data2. > > Now, where I'm less sure is the failure on line 148 (gethostbyname). test2 is in subdir/etc/nsswitch.conf and nothing about test2 is available in the outer configuration. *BUT* the comment above it says "... can still load the files DSO.", so I'm less confident I'm understanding that one. > (This doesn't address all of the failures, so it might be I'm just misunderstanding, but line 140 vs line 142 definitely seems odd?) >> >> Best, >> sam
Weird, it works for me, both reload1 and reload2 inside and outside of the trybot's build container. Note: the testroot is NOT automatically updated each time you type "make", because that's a lot of overhead that's rarely needed. If you modify the sources and want to see the results in containerized tests: $ rm -rf $BUILD/testroot.* $ make -j $BUILD/testroot.pristine/install.stamp
> On 5 Jun 2022, at 01:55, DJ Delorie <dj@redhat.com> wrote: > > > Weird, it works for me, both reload1 and reload2 inside and outside of > the trybot's build container. > > Note: the testroot is NOT automatically updated each time you type > "make", because that's a lot of overhead that's rarely needed. If you > modify the sources and want to see the results in containerized tests: > > $ rm -rf $BUILD/testroot.* > $ make -j $BUILD/testroot.pristine/install.stamp > Ah, thanks, that did it! I hadn't realised about the testroot part. (Then got bit by https://sourceware.org/bugzilla/show_bug.cgi?id=25836). Passing! \o/
On Jun 04 2022, DJ Delorie via Libc-alpha wrote: > $ rm -rf $BUILD/testroot.* > $ make -j $BUILD/testroot.pristine/install.stamp There should be a makefile target to do that.
Andreas Schwab <schwab@linux-m68k.org> writes: >> $ rm -rf $BUILD/testroot.* >> $ make -j $BUILD/testroot.pristine/install.stamp > > There should be a makefile target to do that. I'm not opposed, but: 1. Bikeshed a clever name, and 2. How would anyone learn of this new option when they need it?
diff --git a/nss/nss_database.c b/nss/nss_database.c index d56c5b798d..6c9b440a98 100644 --- a/nss/nss_database.c +++ b/nss/nss_database.c @@ -420,21 +420,24 @@ nss_database_check_reload_and_get (struct nss_database_state *local, return true; } - /* Before we reload, verify that "/" hasn't changed. We assume that - errors here are very unlikely, but the chance that we're entering - a container is also very unlikely, so we err on the side of both - very unlikely things not happening at the same time. */ - if (__stat64_time64 ("/", &str) != 0 - || (local->root_ino != 0 - && (str.st_ino != local->root_ino - || str.st_dev != local->root_dev))) - { - /* Change detected; disable reloading and return current state. */ - atomic_store_release (&local->data.reload_disabled, 1); - *result = local->data.services[database_index]; - __libc_lock_unlock (local->lock); - return true; - } + if (local->data.services[database_index] != NULL) { + /* Before we reload, verify that "/" hasn't changed. We assume that + errors here are very unlikely, but the chance that we're entering + a container is also very unlikely, so we err on the side of both + very unlikely things not happening at the same time. */ + if (__stat64_time64 ("/", &str) != 0 + || (local->root_ino != 0 + && (str.st_ino != local->root_ino + || str.st_dev != local->root_dev))) + { + /* Change detected; disable reloading and return current state. */ + atomic_store_release (&local->data.reload_disabled, 1); + *result = local->data.services[database_index]; + __libc_lock_unlock (local->lock); + return true; + } + } + local->root_ino = str.st_ino; local->root_dev = str.st_dev; __libc_lock_unlock (local->lock);
Skip the chroot test if the database isn't loaded correctly (because the chroot test uses some existing DB state). The __stat64_time64 -> fstatat call can fail if running under an (aggressive) seccomp filter, like Firefox seems to use. This manifested in a crash when using glib built with FAM support with such a Firefox build. Suggested-by: DJ Delorie <dj@redhat.com> Signed-off-by: Sam James <sam@gentoo.org> --- nss/nss_database.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-)