mbox series

[0/8] Raspberry Pi 4 DMA addressing support

Message ID 20190731154752.16557-1-nsaenzjulienne@suse.de
Headers show
Series Raspberry Pi 4 DMA addressing support | expand

Message

Nicolas Saenz Julienne July 31, 2019, 3:47 p.m. UTC
Hi all,
this series attempts to address some issues we found while bringing up
the new Raspberry Pi 4 in arm64 and it's intended to serve as a follow
up of this discussion:
https://lkml.org/lkml/2019/7/17/476

The new Raspberry Pi 4 has up to 4GB of memory but most peripherals can
only address the first GB: their DMA address range is
0xc0000000-0xfc000000 which is aliased to the first GB of physical
memory 0x00000000-0x3c000000. Note that only some peripherals have these
limitations: the ARM cores, PCIe, V3D, GENET, and 40-bit DMA channels
have a wider view of the address space.

Part of this is solved in arm32 by setting up the machine specific
'.dma_zone_size = SZ_1G', which takes care of the allocating the
coherent memory area at the right spot. Yet no buffer bouncing (needed
for dma streaming) is available at the moment, but that's a story for
another series.

Unfortunately there is no such thing as '.dma_zone_size' in arm64 also
only ZONE_DMA32 is created which is interpreted by dma-direct and the
arm64 code as if all peripherals where be able to address the first 4GB
of memory.

In the light of this, the series implements the following changes:

- Add code that parses the device-tree in oder to find the SoC's common
  DMA area.

- Create a ZONE_DMA whenever that area is needed and add the rest of the
  lower 4 GB of memory to ZONE_DMA32*.

- Create the CMA area in a place suitable for all peripherals.

- Inform dma-direct of the new runtime calculated min_mask*.

That's all.

Regards,
Nicolas

* These solutions where already discussed on the previous RFC (see link
above).

---

Nicolas Saenz Julienne (8):
  arm64: mm: use arm64_dma_phys_limit instead of calling
    max_zone_dma_phys()
  arm64: rename variables used to calculate ZONE_DMA32's size
  of/fdt: add function to get the SoC wide DMA addressable memory size
  arm64: re-introduce max_zone_dma_phys()
  arm64: use ZONE_DMA on DMA addressing limited devices
  dma-direct: turn ARCH_ZONE_DMA_BITS into a variable
  arm64: update arch_zone_dma_bits to fine tune dma-direct min mask
  mm: comment arm64's usage of 'enum zone_type'

 arch/arm64/Kconfig              |  4 ++
 arch/arm64/mm/init.c            | 78 ++++++++++++++++++++++++++-------
 arch/powerpc/include/asm/page.h |  9 ----
 arch/powerpc/mm/mem.c           | 14 +++++-
 arch/s390/include/asm/page.h    |  2 -
 arch/s390/mm/init.c             |  1 +
 drivers/of/fdt.c                | 72 ++++++++++++++++++++++++++++++
 include/linux/dma-direct.h      |  2 +
 include/linux/mmzone.h          | 21 ++++-----
 include/linux/of_fdt.h          |  2 +
 kernel/dma/direct.c             |  8 ++--
 11 files changed, 168 insertions(+), 45 deletions(-)

Comments

Catalin Marinas July 31, 2019, 5:07 p.m. UTC | #1
On Wed, Jul 31, 2019 at 05:47:48PM +0200, Nicolas Saenz Julienne wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 1c4ffabbe1cb..f5279ef85756 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -50,6 +50,13 @@
>  s64 memstart_addr __ro_after_init = -1;
>  EXPORT_SYMBOL(memstart_addr);
>  
> +/*
> + * We might create both a ZONE_DMA and ZONE_DMA32. ZONE_DMA is needed if there
> + * are periferals unable to address the first naturally aligned 4GB of ram.
> + * ZONE_DMA32 will be expanded to cover the rest of that memory. If such
> + * limitations doesn't exist only ZONE_DMA32 is created.
> + */

Shouldn't we instead only create ZONE_DMA to cover the whole 32-bit
range and leave ZONE_DMA32 empty? Can__GFP_DMA allocations fall back
onto ZONE_DMA32?
Christoph Hellwig Aug. 1, 2019, 2:04 p.m. UTC | #2
A few nitpicks, otherwise this looks great:

> @@ -201,7 +202,7 @@ static int __init mark_nonram_nosave(void)
>   * everything else. GFP_DMA32 page allocations automatically fall back to
>   * ZONE_DMA.
>   *
> - * By using 31-bit unconditionally, we can exploit ARCH_ZONE_DMA_BITS to
> + * By using 31-bit unconditionally, we can exploit arch_zone_dma_bits to
>   * inform the generic DMA mapping code.  32-bit only devices (if not handled
>   * by an IOMMU anyway) will take a first dip into ZONE_NORMAL and get
>   * otherwise served by ZONE_DMA.
> @@ -237,9 +238,18 @@ void __init paging_init(void)
>  	printk(KERN_DEBUG "Memory hole size: %ldMB\n",
>  	       (long int)((top_of_ram - total_ram) >> 20));
>  
> +	/*
> +	 * Allow 30-bit DMA for very limited Broadcom wifi chips on many
> +	 * powerbooks.
> +	 */
> +	if (IS_ENABLED(CONFIG_PPC32))
> +		arch_zone_dma_bits = 30;
> +	else
> +		arch_zone_dma_bits = 31;
> +

So the above unconditionally comment obviously isn't true any more, and
Ben also said for the recent ppc32 hack he'd prefer dynamic detection.

Maybe Ben and or other ppc folks can chime in an add a patch to the series
to sort this out now that we have a dynamic ZONE_DMA threshold?

> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 59bdceea3737..40dfc9b4ee4c 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -19,9 +19,7 @@
>   * Most architectures use ZONE_DMA for the first 16 Megabytes, but
>   * some use it for entirely different regions:
>   */
> -#ifndef ARCH_ZONE_DMA_BITS
> -#define ARCH_ZONE_DMA_BITS 24
> -#endif
> +unsigned int arch_zone_dma_bits __ro_after_init = 24;

I'd prefer to drop the arch_ prefix and just calls this zone_dma_bits.
In the long run we really need to find a way to just automatically set
this from the meminit code, but that is out of scope for this series.
For now can you please just update the comment above to say something
like:

/*
 * Most architectures use ZONE_DMA for the first 16 Megabytes, but some use it
 * it for entirely different regions.  In that case the arch code needs to
 * override the variable below for dma-direct to work properly.
 */
Christoph Hellwig Aug. 1, 2019, 2:08 p.m. UTC | #3
On Wed, Jul 31, 2019 at 05:47:51PM +0200, Nicolas Saenz Julienne wrote:
> +	 * Architecture			Limit
> +	 * ----------------------------------
> +	 * parisc, ia64, sparc, arm64	<4G
> +	 * s390, powerpc		<2G
> +	 * arm				Various
> +	 * alpha			Unlimited or 0-16MB.
>  	 *
>  	 * i386, x86_64 and multiple other arches
> -	 * 			<16M.
> +	 *				<16M.

powerpc is also Various now, arm64 isn't really < 4G, ia64 only uses
ZONE_DMA32 these days, and parisc doesn't seem to use neither ZONE_DMA
nor ZONE_DMA32.

Based on that I'm not sure the list really makes much sense.

>  	 */
>  	ZONE_DMA,
>  #endif
>  #ifdef CONFIG_ZONE_DMA32
>  	/*
> -	 * x86_64 needs two ZONE_DMAs because it supports devices that are
> -	 * only able to do DMA to the lower 16M but also 32 bit devices that
> -	 * can only do DMA areas below 4G.
> +	 * x86_64 and arm64 need two ZONE_DMAs because they support devices
> +	 * that are only able to DMA a fraction of the 32 bit addressable
> +	 * memory area, but also devices that are limited to that whole 32 bit
> +	 * area.
>  	 */
>  	ZONE_DMA32,

Maybe just say various architectures instead of mentioning specific
ones?  Something like "Some 64-bit platforms need.."
Nicolas Saenz Julienne Aug. 1, 2019, 3:44 p.m. UTC | #4
On Wed, 2019-07-31 at 18:07 +0100, Catalin Marinas wrote:
> On Wed, Jul 31, 2019 at 05:47:48PM +0200, Nicolas Saenz Julienne wrote:
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 1c4ffabbe1cb..f5279ef85756 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -50,6 +50,13 @@
> >  s64 memstart_addr __ro_after_init = -1;
> >  EXPORT_SYMBOL(memstart_addr);
> >  
> > +/*
> > + * We might create both a ZONE_DMA and ZONE_DMA32. ZONE_DMA is needed if
> > there
> > + * are periferals unable to address the first naturally aligned 4GB of ram.
> > + * ZONE_DMA32 will be expanded to cover the rest of that memory. If such
> > + * limitations doesn't exist only ZONE_DMA32 is created.
> > + */
> 
> Shouldn't we instead only create ZONE_DMA to cover the whole 32-bit
> range and leave ZONE_DMA32 empty? Can__GFP_DMA allocations fall back
> onto ZONE_DMA32?

Hi Catalin, thanks for the review.

You're right, the GFP_DMA page allocation will fail with a nasty dmesg error if
ZONE_DMA is configured but empty. Unsurprisingly the opposite situation is fine
(GFP_DMA32 with an empty ZONE_DMA32).

I switched to the scheme you're suggesting for the next version of the series.
The comment will be something the likes of this:

/*
 * We create both a ZONE_DMA and ZONE_DMA32. ZONE_DMA's size is decided based
 * on whether the SoC's peripherals are able to address the first naturally
 * aligned 4 GB of ram.
 *
 * If limited, ZONE_DMA covers that area and ZONE_DMA32 the rest of that 32 bit
 * addressable memory.
 *
 * If not ZONE_DMA is expanded to cover the whole 32 bit addressable memory and
 * ZONE_DMA32 is left empty.
 */

 Regards,
 Nicolas
Nicolas Saenz Julienne Aug. 1, 2019, 3:59 p.m. UTC | #5
Hi Christoph, thanks for the review.

On Thu, 2019-08-01 at 16:04 +0200, Christoph Hellwig wrote:
> A few nitpicks, otherwise this looks great:
> 
> > @@ -201,7 +202,7 @@ static int __init mark_nonram_nosave(void)
> >   * everything else. GFP_DMA32 page allocations automatically fall back to
> >   * ZONE_DMA.
> >   *
> > - * By using 31-bit unconditionally, we can exploit ARCH_ZONE_DMA_BITS to
> > + * By using 31-bit unconditionally, we can exploit arch_zone_dma_bits to
> >   * inform the generic DMA mapping code.  32-bit only devices (if not
> > handled
> >   * by an IOMMU anyway) will take a first dip into ZONE_NORMAL and get
> >   * otherwise served by ZONE_DMA.
> > @@ -237,9 +238,18 @@ void __init paging_init(void)
> >  	printk(KERN_DEBUG "Memory hole size: %ldMB\n",
> >  	       (long int)((top_of_ram - total_ram) >> 20));
> >  
> > +	/*
> > +	 * Allow 30-bit DMA for very limited Broadcom wifi chips on many
> > +	 * powerbooks.
> > +	 */
> > +	if (IS_ENABLED(CONFIG_PPC32))
> > +		arch_zone_dma_bits = 30;
> > +	else
> > +		arch_zone_dma_bits = 31;
> > +
> 
> So the above unconditionally comment obviously isn't true any more, and
> Ben also said for the recent ppc32 hack he'd prefer dynamic detection.
> 
> Maybe Ben and or other ppc folks can chime in an add a patch to the series
> to sort this out now that we have a dynamic ZONE_DMA threshold?

Noted, for now I'll remove the comment.

> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 59bdceea3737..40dfc9b4ee4c 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -19,9 +19,7 @@
> >   * Most architectures use ZONE_DMA for the first 16 Megabytes, but
> >   * some use it for entirely different regions:
> >   */
> > -#ifndef ARCH_ZONE_DMA_BITS
> > -#define ARCH_ZONE_DMA_BITS 24
> > -#endif
> > +unsigned int arch_zone_dma_bits __ro_after_init = 24;
> 
> I'd prefer to drop the arch_ prefix and just calls this zone_dma_bits.
> In the long run we really need to find a way to just automatically set
> this from the meminit code, but that is out of scope for this series.
> For now can you please just update the comment above to say something
> like:
> 
> /*
>  * Most architectures use ZONE_DMA for the first 16 Megabytes, but some use it
>  * it for entirely different regions.  In that case the arch code needs to
>  * override the variable below for dma-direct to work properly.
>  */

Ok perfect.
Robin Murphy Aug. 1, 2019, 4:07 p.m. UTC | #6
On 2019-08-01 4:44 pm, Nicolas Saenz Julienne wrote:
> On Wed, 2019-07-31 at 18:07 +0100, Catalin Marinas wrote:
>> On Wed, Jul 31, 2019 at 05:47:48PM +0200, Nicolas Saenz Julienne wrote:
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 1c4ffabbe1cb..f5279ef85756 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -50,6 +50,13 @@
>>>   s64 memstart_addr __ro_after_init = -1;
>>>   EXPORT_SYMBOL(memstart_addr);
>>>   
>>> +/*
>>> + * We might create both a ZONE_DMA and ZONE_DMA32. ZONE_DMA is needed if
>>> there
>>> + * are periferals unable to address the first naturally aligned 4GB of ram.
>>> + * ZONE_DMA32 will be expanded to cover the rest of that memory. If such
>>> + * limitations doesn't exist only ZONE_DMA32 is created.
>>> + */
>>
>> Shouldn't we instead only create ZONE_DMA to cover the whole 32-bit
>> range and leave ZONE_DMA32 empty? Can__GFP_DMA allocations fall back
>> onto ZONE_DMA32?
> 
> Hi Catalin, thanks for the review.
> 
> You're right, the GFP_DMA page allocation will fail with a nasty dmesg error if
> ZONE_DMA is configured but empty. Unsurprisingly the opposite situation is fine
> (GFP_DMA32 with an empty ZONE_DMA32).

Was that tested on something other than RPi4 with more than 4GB of RAM? 
(i.e. with a non-empty ZONE_NORMAL either way)

Robin.

> I switched to the scheme you're suggesting for the next version of the series.
> The comment will be something the likes of this:
> 
> /*
>   * We create both a ZONE_DMA and ZONE_DMA32. ZONE_DMA's size is decided based
>   * on whether the SoC's peripherals are able to address the first naturally
>   * aligned 4 GB of ram.
>   *
>   * If limited, ZONE_DMA covers that area and ZONE_DMA32 the rest of that 32 bit
>   * addressable memory.
>   *
>   * If not ZONE_DMA is expanded to cover the whole 32 bit addressable memory and
>   * ZONE_DMA32 is left empty.
>   */
> 
>   Regards,
>   Nicolas
> 
>
Nicolas Saenz Julienne Aug. 1, 2019, 4:40 p.m. UTC | #7
On Thu, 2019-08-01 at 17:07 +0100, Robin Murphy wrote:
> On 2019-08-01 4:44 pm, Nicolas Saenz Julienne wrote:
> > On Wed, 2019-07-31 at 18:07 +0100, Catalin Marinas wrote:
> > > On Wed, Jul 31, 2019 at 05:47:48PM +0200, Nicolas Saenz Julienne wrote:
> > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > index 1c4ffabbe1cb..f5279ef85756 100644
> > > > --- a/arch/arm64/mm/init.c
> > > > +++ b/arch/arm64/mm/init.c
> > > > @@ -50,6 +50,13 @@
> > > >   s64 memstart_addr __ro_after_init = -1;
> > > >   EXPORT_SYMBOL(memstart_addr);
> > > >   
> > > > +/*
> > > > + * We might create both a ZONE_DMA and ZONE_DMA32. ZONE_DMA is needed
> > > > if
> > > > there
> > > > + * are periferals unable to address the first naturally aligned 4GB of
> > > > ram.
> > > > + * ZONE_DMA32 will be expanded to cover the rest of that memory. If
> > > > such
> > > > + * limitations doesn't exist only ZONE_DMA32 is created.
> > > > + */
> > > 
> > > Shouldn't we instead only create ZONE_DMA to cover the whole 32-bit
> > > range and leave ZONE_DMA32 empty? Can__GFP_DMA allocations fall back
> > > onto ZONE_DMA32?
> > 
> > Hi Catalin, thanks for the review.
> > 
> > You're right, the GFP_DMA page allocation will fail with a nasty dmesg error
> > if
> > ZONE_DMA is configured but empty. Unsurprisingly the opposite situation is
> > fine
> > (GFP_DMA32 with an empty ZONE_DMA32).
> 
> Was that tested on something other than RPi4 with more than 4GB of RAM? 
> (i.e. with a non-empty ZONE_NORMAL either way)

No, all I did is play around with RPi4's memory size (1 GB vs 4 GB).

I'll see If I can get access to a dts based board with more than 4 GB, If not
I'll try to fake it. It's not ideal but I can set the limit on 3 GB and have
the 3 areas created (with and witouth an empty ZONE_DMA32).

On top of that, now that you ask, I realise I neglected all the ACPI based
servers. I have access to some so I'll make sure I test everything on them too
for the next series.

Regards,
Nicolas