Message ID | 20210826131747.GE26318@willie-the-truck |
---|---|
State | New |
Headers | show |
Series | [GIT,PULL] arm64 fix for 5.14 | expand |
On Thu, Aug 26, 2021 at 6:17 AM Will Deacon <will@kernel.org> wrote: > > Please pull this single arm64 fix for 5.14. Pulled. But adding Christoph to the cc, since I do think the eventual fix needs to be in the DMA mapping code: > We received a report this week > that the generic version of pfn_valid(), which we switched to this merge > window in 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), interacts > badly with dma_map_resource() due to the following check: > > /* Don't allow RAM to be mapped */ > if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr)))) > return DMA_MAPPING_ERROR; > > Since the ongoing saga to determine the semantics of pfn_valid() is > unlikely to be resolved this week (does it indicate valid memory, or > just the presence of a struct page, or whether that struct page has been > initialised?), just revert back to our old version of pfn_valid() for > 5.14. I think that's the right thing for now, but yeah, that condition for WARN_ON_ONCE() seems very questionable. "pfn_valid()" is more about whether you can do a "pfn_to_page()" lookup on it. II get the feeling that the dma-mapping code should allow pages that are PageReserved() to be mapped - they aren't "ram" in the kernel sense. Perhaps also make sure it's not the zero page (which is PageReserved(), but most definitely RAM). In a PC world that would be (for example) the legacy PCI space at 0xa0000-0xfffff, but I could easily imagine other platforms having other situations. Linus
The pull request you sent on Thu, 26 Aug 2021 14:17:48 +0100:
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-fixes
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/1a6d80ff2419e8ad627b4bf4775a8b4c70af535d
Thank you!
On Thu, Aug 26, 2021 at 11:41:34AM -0700, Linus Torvalds wrote: > "pfn_valid()" is more about whether you can do a "pfn_to_page()" lookup on it. > > II get the feeling that the dma-mapping code should allow pages that > are PageReserved() to be mapped - they aren't "ram" in the kernel > sense. > > Perhaps also make sure it's not the zero page (which is > PageReserved(), but most definitely RAM). > > In a PC world that would be (for example) the legacy PCI space at > 0xa0000-0xfffff, but I could easily imagine other platforms having > other situations. So what would be the correct check for "this is not actually page backed normal RAM"?
On Fri, Aug 27, 2021 at 12:40 AM Christoph Hellwig <hch@lst.de> wrote: > > > In a PC world that would be (for example) the legacy PCI space at > > 0xa0000-0xfffff, but I could easily imagine other platforms having > > other situations. > > So what would be the correct check for "this is not actually page backed > normal RAM"? It would probably be interesting to have the arm people explain the call chain for the warning that caused that revert, so we'd have a very concrete example of the situation that goes wrong, but taking a wild stab at it, the code might be something like /* Don't allow RAM to be mapped */ if (WARN_ON_ONCE(phys_addr_is_ram(phys_addr))) return DMA_MAPPING_ERROR; and then having something like static inline bool phys_addr_is_ram(phys_addr_t phys_addr) { unsigned long pfn = PHYS_PFN(phys_addr); if (!pfn_valid(pfn)) return false; return is_zero_pfn(pfn) || !PageReserved(pfn_to_page(pfn)); } might be close to right. The ARM code actually uses that complex pfn_to_section_nr() and memblock_is_memory() etc. That seems a bit of an overkill, since the memblock code should have translated all that into being reserved. But again, I don't actually know exactly what triggered the issue on ARM, so the above is just my "this seems to be a more proper check" suggestion. Will? Linus
On Fri, Aug 27, 2021 at 10:03:29AM -0700, Linus Torvalds wrote: > The ARM code actually uses that complex pfn_to_section_nr() and > memblock_is_memory() etc. That seems a bit of an overkill, since the > memblock code should have translated all that into being reserved. > > But again, I don't actually know exactly what triggered the issue on > ARM, so the above is just my "this seems to be a more proper check" > suggestion. They CCed me on their earlier discussion, but I did not catch up on it until you responded to the pull request If I understood it correct it was about a platform device mapping a MMIO region (like a PCI bar), but something about section alignment cause pfn_valid to mistrigger.
On Fri, Aug 27, 2021 at 10:10 AM Christoph Hellwig <hch@lst.de> wrote: > > They CCed me on their earlier discussion, but I did not catch up on it > until you responded to the pull request If I understood it correct it > was about a platform device mapping a MMIO region (like a PCI bar), > but something about section alignment cause pfn_valid to mistrigger. Yeah, so I can easily see the maxpfn numbers can easily end up being rounded up to a whole memory section etc. I think my suggested solution should JustWork(tm) - exactly because if the area is then in that "this pfn is valid" area, it will double-check the actual underlying page. That said, I think x86 avoids the problem another way - by just making sure max_pfn is exact. That works too, as long as there are no holes in the RAM map that might be used for PCI BAR's. So I think arm could fix it that way too, depending on their memory layout. (x86 has that traditional hole at A0000-FFFFF, but it's special enough that it presumably is never an issue). Linus
[+David] On Fri, Aug 27, 2021 at 10:16:27AM -0700, Linus Torvalds wrote: > On Fri, Aug 27, 2021 at 10:10 AM Christoph Hellwig <hch@lst.de> wrote: > > > > They CCed me on their earlier discussion, but I did not catch up on it > > until you responded to the pull request If I understood it correct it > > was about a platform device mapping a MMIO region (like a PCI bar), > > but something about section alignment cause pfn_valid to mistrigger. > > Yeah, so I can easily see the maxpfn numbers can easily end up being > rounded up to a whole memory section etc. > > I think my suggested solution should JustWork(tm) - exactly because if > the area is then in that "this pfn is valid" area, it will > double-check the actual underlying page. I think the pitfall there is that the 'struct page' might well exist, but isn't necessarily initialised with anything meaningful. I remember seeing something like that in the past (I think for "no-map" memory) and David's reply here: https://lore.kernel.org/r/aff3942e-b9ce-5bae-8214-0e5d89cd071c@redhat.com hints that there are still gotchas with looking at the page flags for pages if the memory is offline or ZONE_DEVICE. Don't get me wrong, I'd really like to use the generic code here as I think it would help to set expectations around what pfn_valid() actually means, I'm just less keen on the try-it-and-see-what-breaks approach given how sensitive it is to the layout of the physical memory map. > That said, I think x86 avoids the problem another way - by just making > sure max_pfn is exact. That works too, as long as there are no holes > in the RAM map that might be used for PCI BAR's. > > So I think arm could fix it that way too, depending on their memory layout. The physical memory map is the wild west, unfortunately. It's one of the things where everybody does something different and it's very common to see disjoint banks of memory placed seemingly randomly around. Will
On Fri, Aug 27, 2021 at 10:16:27AM -0700, Linus Torvalds wrote: > On Fri, Aug 27, 2021 at 10:10 AM Christoph Hellwig <hch@lst.de> wrote: > > They CCed me on their earlier discussion, but I did not catch up on it > > until you responded to the pull request If I understood it correct it > > was about a platform device mapping a MMIO region (like a PCI bar), > > but something about section alignment cause pfn_valid to mistrigger. > > Yeah, so I can easily see the maxpfn numbers can easily end up being > rounded up to a whole memory section etc. > > I think my suggested solution should JustWork(tm) - exactly because if > the area is then in that "this pfn is valid" area, it will > double-check the actual underlying page. > > That said, I think x86 avoids the problem another way - by just making > sure max_pfn is exact. That works too, as long as there are no holes > in the RAM map that might be used for PCI BAR's. > > So I think arm could fix it that way too, depending on their memory layout. The suggested solution in the original thread was to change the generic DMA code to use memblock_is_memory() instead of pfn_valid(): https://lore.kernel.org/lkml/b720e7c8-ca44-0a25-480b-05bf49d03c35@redhat.com/ Given how late we discovered this in the -rc cycle, the decision was to revert the pfn_valid() patch. We'll re-instate it at some point but someone needs to sanity check the other pfn_valid() call sites and the expected semantics.
On 31.08.21 15:31, Will Deacon wrote: > [+David] > > On Fri, Aug 27, 2021 at 10:16:27AM -0700, Linus Torvalds wrote: >> On Fri, Aug 27, 2021 at 10:10 AM Christoph Hellwig <hch@lst.de> wrote: >>> >>> They CCed me on their earlier discussion, but I did not catch up on it >>> until you responded to the pull request If I understood it correct it >>> was about a platform device mapping a MMIO region (like a PCI bar), >>> but something about section alignment cause pfn_valid to mistrigger. >> >> Yeah, so I can easily see the maxpfn numbers can easily end up being >> rounded up to a whole memory section etc. >> >> I think my suggested solution should JustWork(tm) - exactly because if >> the area is then in that "this pfn is valid" area, it will >> double-check the actual underlying page. > > I think the pitfall there is that the 'struct page' might well exist, > but isn't necessarily initialised with anything meaningful. I remember > seeing something like that in the past (I think for "no-map" memory) and > David's reply here: > > https://lore.kernel.org/r/aff3942e-b9ce-5bae-8214-0e5d89cd071c@redhat.com > > hints that there are still gotchas with looking at the page flags for > pages if the memory is offline or ZONE_DEVICE. > > Don't get me wrong, I'd really like to use the generic code here as I > think it would help to set expectations around what pfn_valid() actually > means, I'm just less keen on the try-it-and-see-what-breaks approach > given how sensitive it is to the layout of the physical memory map. > >> That said, I think x86 avoids the problem another way - by just making >> sure max_pfn is exact. That works too, as long as there are no holes >> in the RAM map that might be used for PCI BAR's. >> >> So I think arm could fix it that way too, depending on their memory layout. > > The physical memory map is the wild west, unfortunately. It's one of the > things where everybody does something different and it's very common to see > disjoint banks of memory placed seemingly randomly around. The resource tree is usually the best place to really identify what's system RAM and what's not IIRC. memblock should work on applicable archs as well. Identifying ZONE_DEVICE ranges reliably is a different story ...