Message ID | 20240807194812.819412-1-peterx@redhat.com |
---|---|
Series | mm/mprotect: Fix dax puds |
On Wed, 7 Aug 2024 15:48:04 -0400 Peter Xu <peterx@redhat.com> wrote:

>
> Dax has supported pud pages for a while, but mprotect on puds was missing since
> the start. This series tries to fix that by providing pud handling in
> mprotect(). The goal is to add more types of pud mappings like hugetlb or
> pfnmaps. This series paves the way for it by fixing known pud entries.
>
> Considering nobody reported this until I looked at those other types
> of pud mappings, I am thinking maybe it doesn't need to be a fix for stable
> and this may not need to be backported. I would guess whoever cares about
> mprotect() won't care about 1G dax puds yet, and vice versa. I hope fixing that in
> new kernels would be fine, but I'm open to suggestions.

Yes, I'm not sure this is a "fix" at all. We're implementing something
which previously wasn't there. Perhaps the entire series should be
called "mm: implement mprotect() for DAX PUDs"?
On Wed, 7 Aug 2024 15:48:04 -0400 Peter Xu <peterx@redhat.com> wrote:

>
> Tests
> =====
>
> What I did test:
>
>   - cross-build tests that I normally cover [1]
>
>   - smoke tested on x86_64 the simplest program [2] on dev_dax 1G PUD
>     mprotect() using QEMU's nvdimm emulations [3] and ndctl to create
>     namespaces with proper alignments, which used to throw "bad pud" but now
>     runs through fine. I checked that SIGBUS happens on illegal
>     access to protected puds.
>
>   - vmtests.
>
> What I didn't test:
>
>   - fsdax: I wanted to also give it a shot, but only then noticed it
>     doesn't seem to be supported (according to dax_iomap_fault(), which will
>     always fall back on PUD_ORDER). I do remember it being supported before,
>     so I may be missing something important there.. please shoot if so.

OK. Who are you addressing this question to?

>   - userfault wp-async: I also wanted to test that userfault-wp async is able
>     to split huge puds (here it's simply a clear_pud.. though), but it won't
>     work for devdax anyway, because faults smaller than 1G are not allowed in
>     this case. So I skipped it too.

Sounds OK. So that's an additional project if anyone cares enough?

>   - Power, as I have no hardware on hand.

Hopefully the powerpc people can help with that. What tests do you ask
that they run?
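The smoke test described above boils down to: map the range, fault it in while writable, mprotect() it read-only, then confirm that a write faults. The sketch below is not the author's dax.c [2]; it is a minimal illustration under stated assumptions. The helper names `probe_write()` and `check_mprotect_ro()` are hypothetical, and the probe accepts either SIGSEGV or SIGBUS because the exact signal delivered for a write to a protected devdax PUD is not something this sketch asserts.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical probe state: jump target and the signal that fired. */
static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_sig;

/* On SIGSEGV or SIGBUS, record the signal and jump back into probe_write(). */
static void probe_handler(int sig)
{
    probe_sig = sig;
    siglongjmp(probe_env, 1);
}

/*
 * Write one byte at addr. Returns 0 if the write went through, or the
 * signal number (SIGSEGV or SIGBUS) if it faulted. The caller's previous
 * handlers are restored before returning.
 */
int probe_write(volatile char *addr)
{
    struct sigaction sa, old_segv, old_bus;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    probe_sig = 0;
    if (sigsetjmp(probe_env, 1) == 0)   /* 1: restore the signal mask on jump */
        *addr = 1;                      /* may fault on a read-only mapping */

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return probe_sig;
}

/*
 * Fault in the first byte of [addr, addr + len), make the range read-only,
 * then probe a write. Returns 1 if the write faulted (permission enforced),
 * 0 if it silently succeeded, or -1 if mprotect() itself failed.
 * Note: this writes to the mapping, so only use it on scratch memory.
 */
int check_mprotect_ro(char *addr, size_t len)
{
    addr[0] = 0;                        /* populate the page (or PUD) while writable */
    if (mprotect(addr, len, PROT_READ) != 0)
        return -1;
    return probe_write(addr) != 0;
}
```

To point this at a devdax namespace, one would open the device node (for example a hypothetical `/dev/dax0.0` created with 1G alignment) with `O_RDWR`, `mmap()` 1 GiB with `MAP_SHARED`, and call `check_mprotect_ro(p, 1UL << 30)`. Going by the thread's description, a kernel without this series skips the PUD, so the write would be expected to go through (result 0, plus a "bad pud" message in dmesg), while a patched kernel should report 1. This has not been run against real DAX hardware here.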
On Wed, Aug 07, 2024 at 02:17:03PM -0700, Andrew Morton wrote:
> On Wed, 7 Aug 2024 15:48:04 -0400 Peter Xu <peterx@redhat.com> wrote:
>
> >
> > Dax has supported pud pages for a while, but mprotect on puds was missing since
> > the start. This series tries to fix that by providing pud handling in
> > mprotect(). The goal is to add more types of pud mappings like hugetlb or
> > pfnmaps. This series paves the way for it by fixing known pud entries.
> >
> > Considering nobody reported this until I looked at those other types
> > of pud mappings, I am thinking maybe it doesn't need to be a fix for stable
> > and this may not need to be backported. I would guess whoever cares about
> > mprotect() won't care about 1G dax puds yet, and vice versa. I hope fixing that in
> > new kernels would be fine, but I'm open to suggestions.
>
> Yes, I'm not sure this is a "fix" at all. We're implementing something
> which previously wasn't there. Perhaps the entire series should be
> called "mm: implement mprotect() for DAX PUDs"?

The problem is mprotect() will skip the dax 1G PUD while it shouldn't;
meanwhile it'll dump a "bad pud" error into dmesg. Both of them look like
(corner case) bugs to me.. where:

  - skipping the 1G pud means mprotect() will succeed even if the pud won't
    be updated with the correct permission specified. Logically that can
    cause data corruption: e.g. after mprotect(RO), a write to the page
    still goes through, as the pud page will still be writable.

  - the bad pud will generate a pr_err() into dmesg, with no limit as far as
    I can see. So I think it means userspace can DoS the kernel log if it
    wants.. simply by creating the PUD and repeatedly mprotect-ing it.

But yeah this series fixes this "bug" by implementing that part..

Thanks,
On Wed, 7 Aug 2024 17:34:10 -0400 Peter Xu <peterx@redhat.com> wrote:

> The problem is mprotect() will skip the dax 1G PUD while it shouldn't;
> meanwhile it'll dump a "bad pud" error into dmesg. Both of them look like
> (corner case) bugs to me.. where:
>
>   - skipping the 1G pud means mprotect() will succeed even if the pud won't
>     be updated with the correct permission specified. Logically that can
>     cause data corruption: e.g. after mprotect(RO), a write to the page
>     still goes through, as the pud page will still be writable.
>
>   - the bad pud will generate a pr_err() into dmesg, with no limit as far as
>     I can see. So I think it means userspace can DoS the kernel log if it
>     wants.. simply by creating the PUD and repeatedly mprotect-ing it.
>

I edited this important info into the [0/n] text, thanks.

So current kernels can be made to spew into the kernel logs? That's
considered serious. Can unprivileged userspace code do this?
On Wed, Aug 07, 2024 at 02:23:16PM -0700, Andrew Morton wrote:
> On Wed, 7 Aug 2024 15:48:04 -0400 Peter Xu <peterx@redhat.com> wrote:
>
> >
> > Tests
> > =====
> >
> > What I did test:
> >
> >   - cross-build tests that I normally cover [1]
> >
> >   - smoke tested on x86_64 the simplest program [2] on dev_dax 1G PUD
> >     mprotect() using QEMU's nvdimm emulations [3] and ndctl to create
> >     namespaces with proper alignments, which used to throw "bad pud" but now
> >     runs through fine. I checked that SIGBUS happens on illegal
> >     access to protected puds.
> >
> >   - vmtests.
> >
> > What I didn't test:
> >
> >   - fsdax: I wanted to also give it a shot, but only then noticed it
> >     doesn't seem to be supported (according to dax_iomap_fault(), which will
> >     always fall back on PUD_ORDER). I do remember it being supported before,
> >     so I may be missing something important there.. please shoot if so.
>
> OK. Who are you addressing this question to?

Anyone who is familiar with fsdax + 1g. Maybe Matthew would be the most
suitable, but I didn't track fsdax any further.

>
> >   - userfault wp-async: I also wanted to test that userfault-wp async is able
> >     to split huge puds (here it's simply a clear_pud.. though), but it won't
> >     work for devdax anyway, because faults smaller than 1G are not allowed in
> >     this case. So I skipped it too.
>
> Sounds OK. So that's an additional project if anyone cares enough?

Right.

>
> >   - Power, as I have no hardware on hand.
>
> Hopefully the powerpc people can help with that. What tests do you ask
> that they run?
The test program [2] in the cover letter should work as a very basic test;
one needs to set up the dax device to use 1g mappings first, though:

[2] https://github.com/xzpeter/clibs/blob/master/misc/dax.c

At least in my experience there aren't many fancy things we can do there.
E.g., I think dev_dax has a limitation on vma splits: when using 1g
mappings, a split must be 1g aligned, so even a split can't happen (iirc I
tried some random mprotect() calls on smaller ranges)..

Thanks,
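Because, as recalled above, dev_dax will not split a VMA at an address that is not aligned to the device's 1G mapping size, an mprotect() over a smaller or misaligned range is expected to fail rather than change part of a PUD. A test that wants to exercise mprotect() on such a device therefore needs 1G-aligned ranges. The helpers below are a hypothetical sketch for checking or picking such a range; the 1 GiB PUD size assumes x86_64 with 4 KiB base pages (512 × 512 × 4 KiB), and other architectures or page sizes differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed PUD size: x86_64 with 4 KiB pages, 512 * 512 * 4 KiB = 1 GiB. */
#define PUD_SIZE_ASSUMED ((uint64_t)1 << 30)

/* True if [start, start + len) is non-empty and begins and ends on a 1 GiB boundary. */
bool range_is_pud_aligned(uint64_t start, uint64_t len)
{
    return len != 0 &&
           (start % PUD_SIZE_ASSUMED) == 0 &&
           (len % PUD_SIZE_ASSUMED) == 0;
}

/*
 * Return the start of the largest 1 GiB-aligned subrange contained in
 * [start, start + len) and store its length in *out_len (0 if none fits).
 */
uint64_t pud_aligned_subrange(uint64_t start, uint64_t len, uint64_t *out_len)
{
    uint64_t end = start + len;
    uint64_t lo = (start + PUD_SIZE_ASSUMED - 1) & ~(PUD_SIZE_ASSUMED - 1); /* round up */
    uint64_t hi = end & ~(PUD_SIZE_ASSUMED - 1);                            /* round down */

    *out_len = hi > lo ? hi - lo : 0;
    return lo;
}
```

For example, a range starting 4 KiB into the address space and spanning 3 GiB contains exactly one 2 GiB aligned window starting at 1 GiB, while a 1 GiB range starting at 512 MiB contains none.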
On Wed, Aug 07, 2024 at 02:44:54PM -0700, Andrew Morton wrote:
> On Wed, 7 Aug 2024 17:34:10 -0400 Peter Xu <peterx@redhat.com> wrote:
>
> > The problem is mprotect() will skip the dax 1G PUD while it shouldn't;
> > meanwhile it'll dump a "bad pud" error into dmesg. Both of them look like
> > (corner case) bugs to me.. where:
> >
> >   - skipping the 1G pud means mprotect() will succeed even if the pud won't
> >     be updated with the correct permission specified. Logically that can
> >     cause data corruption: e.g. after mprotect(RO), a write to the page
> >     still goes through, as the pud page will still be writable.
> >
> >   - the bad pud will generate a pr_err() into dmesg, with no limit as far as
> >     I can see. So I think it means userspace can DoS the kernel log if it
> >     wants.. simply by creating the PUD and repeatedly mprotect-ing it.
> >
>
> I edited this important info into the [0/n] text, thanks.
>
> So current kernels can be made to spew into the kernel logs? That's

I suppose yes to this one.

> considered serious. Can unprivileged userspace code do this?

AFAIU, /dev/dax* requires root privilege by default, so it looks like not.
But anyone more familiar with real-life dax usage, please correct me if
I'm wrong.

Thanks,
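Whether an unprivileged process can trigger the log spam comes down to whether it can open and map the device node at all. The sketch below is a hypothetical way to check that on a given system; the helper names and the example path `/dev/dax0.0` are assumptions, and the actual default mode of `/dev/dax*` nodes depends on devtmpfs and udev rules, so results vary by distribution. Note that `access()` uses the real UID/GID and that root passes these checks regardless of the mode bits.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if the real UID/GID of this process may open path for read and write. */
bool may_open_rw(const char *path)
{
    return access(path, R_OK | W_OK) == 0;
}

/* True if path is a character device whose mode grants read+write to "other". */
bool chardev_world_rw(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0 || !S_ISCHR(st.st_mode))
        return false;
    return (st.st_mode & (S_IROTH | S_IWOTH)) == (S_IROTH | S_IWOTH);
}
```

Run as a non-root user, `may_open_rw("/dev/dax0.0")` returning false would be consistent with the "root by default" understanding above, while `chardev_world_rw()` returning true would flag a node whose permissions were loosened to allow any user.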