diff mbox

[PATCHv2,net-next,05/16] net: mvpp2: introduce PPv2.2 HW descriptors and adapt accessors

Message ID 1482943592-12556-6-git-send-email-thomas.petazzoni@free-electrons.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Thomas Petazzoni Dec. 28, 2016, 4:46 p.m. UTC
This commit adds the definition of the PPv2.2 HW descriptors, adjusts
the mvpp2_tx_desc and mvpp2_rx_desc structures accordingly, and adapts
the accessors to work on both PPv2.1 and PPv2.2.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 109 +++++++++++++++++++++++++++++++----
 1 file changed, 98 insertions(+), 11 deletions(-)

Comments

Russell King (Oracle) Jan. 6, 2017, 2:29 p.m. UTC | #1
On Wed, Dec 28, 2016 at 05:46:21PM +0100, Thomas Petazzoni wrote:
> This commit adds the definition of the PPv2.2 HW descriptors, adjusts
> the mvpp2_tx_desc and mvpp2_rx_desc structures accordingly, and adapts
> the accessors to work on both PPv2.1 and PPv2.2.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
...
> +		/* On PPv2.2, the situation is more complicated,
> +		 * because there is only 40 bits to store the virtual
> +		 * address, which is not sufficient. So on 64 bits
> +		 * systems, we use phys_to_virt() to get the virtual
> +		 * address from the physical address, which is fine
> +		 * because the kernel linear mapping includes the
> +		 * entire 40 bits physical address space. On 32 bits
> +		 * systems however, we can't use phys_to_virt(), but
> +		 * since virtual addresses are 32 bits only, there is
> +		 * enough space in the RX descriptor for the full
> +		 * virtual address.
> +		 */
> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> +		dma_addr_t dma_addr =
> +			rx_desc->pp22.buf_phys_addr_key_hash & DMA_BIT_MASK(40);
> +		phys_addr_t phys_addr =
> +			dma_to_phys(port->dev->dev.parent, dma_addr);
> +
> +		return (unsigned long)phys_to_virt(phys_addr);
> +#else
> +		return rx_desc->pp22.buf_cookie_misc & DMA_BIT_MASK(40);
> +#endif

I'm not sure that's the best way of selecting the difference.  It seems
that the issue here is the size of the virtual address, so why not test
the size of a virtual address pointer?

		if (8 * sizeof(rx_desc) > 40) {
			/* do phys addr dance */
		} else {
			return rx_desc->pp22.buf_cookie_misc & DMA_BIT_MASK(40);
		}

It also means that we get compile coverage over both sides of the
conditional.
Robin Murphy Jan. 6, 2017, 2:44 p.m. UTC | #2
On 06/01/17 14:29, Russell King - ARM Linux wrote:
> On Wed, Dec 28, 2016 at 05:46:21PM +0100, Thomas Petazzoni wrote:
>> This commit adds the definition of the PPv2.2 HW descriptors, adjusts
>> the mvpp2_tx_desc and mvpp2_rx_desc structures accordingly, and adapts
>> the accessors to work on both PPv2.1 and PPv2.2.
>>
>> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ...
>> +		/* On PPv2.2, the situation is more complicated,
>> +		 * because there is only 40 bits to store the virtual
>> +		 * address, which is not sufficient. So on 64 bits
>> +		 * systems, we use phys_to_virt() to get the virtual
>> +		 * address from the physical address, which is fine
>> +		 * because the kernel linear mapping includes the
>> +		 * entire 40 bits physical address space. On 32 bits
>> +		 * systems however, we can't use phys_to_virt(), but
>> +		 * since virtual addresses are 32 bits only, there is
>> +		 * enough space in the RX descriptor for the full
>> +		 * virtual address.
>> +		 */
>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>> +		dma_addr_t dma_addr =
>> +			rx_desc->pp22.buf_phys_addr_key_hash & DMA_BIT_MASK(40);
>> +		phys_addr_t phys_addr =
>> +			dma_to_phys(port->dev->dev.parent, dma_addr);

Ugh, this looks bogus. dma_to_phys(), in the arm64 case at least, is
essentially a SWIOTLB internal helper function which has to be
implemented in architecture code because reasons. Calling it from a
driver is almost certainly wrong (it doesn't even exist on most
architectures). Besides, if this is really a genuine dma_addr_t obtained
from a DMA API call, you cannot infer it to be related to a CPU physical
address, or convertible to one at all.

>> +
>> +		return (unsigned long)phys_to_virt(phys_addr);
>> +#else
>> +		return rx_desc->pp22.buf_cookie_misc & DMA_BIT_MASK(40);
>> +#endif
> 
> I'm not sure that's the best way of selecting the difference.

Given that CONFIG_ARCH_DMA_ADDR_T_64BIT could be enabled on 32-bit LPAE
systems, indeed it definitely isn't.

Robin.

>  It seems
> that the issue here is the size of the virtual address, so why not test
> the size of a virtual address pointer?
> 
> 		if (8 * sizeof(rx_desc) > 40) {
> 			/* do phys addr dance */
> 		} else {
> 			return rx_desc->pp22.buf_cookie_misc & DMA_BIT_MASK(40);
> 		}
> 
> It also means that we get compile coverage over both sides of the
> conditional.
>
Thomas Petazzoni Feb. 3, 2017, 1:24 p.m. UTC | #3
Hello,

On Fri, 6 Jan 2017 14:44:56 +0000, Robin Murphy wrote:

> >> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> >> +		dma_addr_t dma_addr =
> >> +			rx_desc->pp22.buf_phys_addr_key_hash & DMA_BIT_MASK(40);
> >> +		phys_addr_t phys_addr =
> >> +			dma_to_phys(port->dev->dev.parent, dma_addr);  
> 
> Ugh, this looks bogus. dma_to_phys(), in the arm64 case at least, is
> essentially a SWIOTLB internal helper function which has to be
> implemented in architecture code because reasons. Calling it from a
> driver is almost certainly wrong (it doesn't even exist on most
> architectures). Besides, if this is really a genuine dma_addr_t obtained
> from a DMA API call, you cannot infer it to be related to a CPU physical
> address, or convertible to one at all.

So do you have a better suggestion? The descriptors only have enough
space to store a 40-bit virtual address, which is not enough to fit the
virtual addresses used by Linux for SKBs. This is why I'm instead
relying on the fact that the descriptors can store the 40-bit physical
address, and convert it back to a virtual address, which should be fine
on ARM64 because the entire physical memory is part of the kernel linear
mapping.

> >> +		return (unsigned long)phys_to_virt(phys_addr);
> >> +#else
> >> +		return rx_desc->pp22.buf_cookie_misc & DMA_BIT_MASK(40);
> >> +#endif  
> > 
> > I'm not sure that's the best way of selecting the difference.  
> 
> Given that CONFIG_ARCH_DMA_ADDR_T_64BIT could be enabled on 32-bit LPAE
> systems, indeed it definitely isn't.

Russell proposal of testing the size of a virtual address
pointer instead would solve this I believe, correct?

Thanks,

Thomas
Robin Murphy Feb. 3, 2017, 2:05 p.m. UTC | #4
On 03/02/17 13:24, Thomas Petazzoni wrote:
> Hello,
> 
> On Fri, 6 Jan 2017 14:44:56 +0000, Robin Murphy wrote:
> 
>>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>>>> +		dma_addr_t dma_addr =
>>>> +			rx_desc->pp22.buf_phys_addr_key_hash & DMA_BIT_MASK(40);
>>>> +		phys_addr_t phys_addr =
>>>> +			dma_to_phys(port->dev->dev.parent, dma_addr);  
>>
>> Ugh, this looks bogus. dma_to_phys(), in the arm64 case at least, is
>> essentially a SWIOTLB internal helper function which has to be
>> implemented in architecture code because reasons. Calling it from a
>> driver is almost certainly wrong (it doesn't even exist on most
>> architectures). Besides, if this is really a genuine dma_addr_t obtained
>> from a DMA API call, you cannot infer it to be related to a CPU physical
>> address, or convertible to one at all.
> 
> So do you have a better suggestion? The descriptors only have enough
> space to store a 40-bit virtual address, which is not enough to fit the
> virtual addresses used by Linux for SKBs. This is why I'm instead
> relying on the fact that the descriptors can store the 40-bit physical
> address, and convert it back to a virtual address, which should be fine
> on ARM64 because the entire physical memory is part of the kernel linear
> mapping.

OK, that has nothing to do with DMA addresses then.

>>>> +		return (unsigned long)phys_to_virt(phys_addr);
>>>> +#else
>>>> +		return rx_desc->pp22.buf_cookie_misc & DMA_BIT_MASK(40);
>>>> +#endif  
>>>
>>> I'm not sure that's the best way of selecting the difference.  
>>
>> Given that CONFIG_ARCH_DMA_ADDR_T_64BIT could be enabled on 32-bit LPAE
>> systems, indeed it definitely isn't.
> 
> Russell proposal of testing the size of a virtual address
> pointer instead would solve this I believe, correct?

AFAICS, even that shouldn't really be necessary - for all VA/PA
combinations of 32/32, 32/40 and 64/40, storing virt_to_phys() of the
SKB VA won't overflow 40 bits, so a corresponding phys_to_virt() at the
other end can't go wrong either. If you really do want to special-case
things based on VA size, though, either CONFIG_64BIT or sizeof(void *)
would indeed be infinitely more useful than the unrelated DMA address
width - I know this driver's never going to run on SPARC64, but that's
one example of where the above logic would lead to precisely the
truncated VA it's trying to avoid.

Robin.

> 
> Thanks,
> 
> Thomas
>
Thomas Petazzoni Feb. 3, 2017, 3:02 p.m. UTC | #5
Hello,

On Fri, 3 Feb 2017 14:05:13 +0000, Robin Murphy wrote:

> > So do you have a better suggestion? The descriptors only have enough
> > space to store a 40-bit virtual address, which is not enough to fit the
> > virtual addresses used by Linux for SKBs. This is why I'm instead
> > relying on the fact that the descriptors can store the 40-bit physical
> > address, and convert it back to a virtual address, which should be fine
> > on ARM64 because the entire physical memory is part of the kernel linear
> > mapping.  
> 
> OK, that has nothing to do with DMA addresses then.

Well, it has to do with DMA in the sense that those buffers are
mapped with dma_map_single(). So the address that is given to us by the
hardware as the "physical address of the RX buffer" is the one that we
have initially given to the hardware and was the result of 
dma_map_single().

> > Russell proposal of testing the size of a virtual address
> > pointer instead would solve this I believe, correct?  
> 
> AFAICS, even that shouldn't really be necessary - for all VA/PA
> combinations of 32/32, 32/40 and 64/40, storing virt_to_phys() of the
> SKB VA won't overflow 40 bits,

I'm already lost here. Why are you talking about virt_to_phys() ? See
above: we have the dma_addr_t returned from dma_map_single(), and we
need to find back the corresponding virtual address, because there is
not enough room in the HW descriptors to store a full 64-bit VA.

> so a corresponding phys_to_virt() at the other end can't go wrong
> either. If you really do want to special-case things based on VA
> size, though, either CONFIG_64BIT or sizeof(void *) would indeed be
> infinitely more useful than the unrelated DMA address width - I know
> this driver's never going to run on SPARC64, but that's one example
> of where the above logic would lead to precisely the truncated VA
> it's trying to avoid.

What is different on SPARC64 here?

The situation we have is the following:

 - On systems where VAs are 32-bit wide, we have enough room to store
   the VA in the HW descriptor. So when we receive a packet, the HW
   descriptor provides us directly with the VA of the network packet,
   and the DMA address of the packet. We can dma_unmap_single() the
   packet, and do its processing.

 - On systems where VAs are 64-bit wide, we don't have enough room to
   store the VA in the HW descriptor. However, on 64-bit systems, the
   entire physical memory is mapped in the kernel linear mapping, so
   phys_to_virt() is valid on any physical address. And we use this
   property to retrieve the full 64-bit VA using the DMA address that
   we get from the HW descriptor.

   Since what we get from the HW descriptor is a DMA address, that's
   why we're using phys_to_virt(dma_to_phys(...)).

Best regards,

Thomas
Russell King (Oracle) Feb. 3, 2017, 3:54 p.m. UTC | #6
On Fri, Feb 03, 2017 at 02:05:13PM +0000, Robin Murphy wrote:
> AFAICS, even that shouldn't really be necessary - for all VA/PA
> combinations of 32/32, 32/40 and 64/40, storing virt_to_phys() of the
> SKB VA won't overflow 40 bits, so a corresponding phys_to_virt() at the
> other end can't go wrong either.

Except for the detail that virt_to_phys()/phys_to_virt() is only defined
for the direct-mapped memory, not for highmem.  That matters a lot for
32-bit platforms.

Now for a bit of a whinge.  Reading through this code is rather annoying
because of what's called a "physical" address which is actually a DMA
address as far as the kernel is concerned - this makes it much harder
when thinking about this issue because it causes all sorts of confusion.
Please can the next revision of the patches start calling things by their
right name - a dma_addr_t is a DMA address, not a physical address, even
though _numerically_ it may be the same thing.  From the point of view
of the kernel, you must not do phys_to_virt() on a dma_addr_t address.
Thanks.

If we're going to start dealing with _real_ physical addresses, then
this is even more important to separate the concept of what's a physical
address and what's a DMA address in this driver.

Taking a step backwards...

How do DMA addresses and this cookie get into the receive ring - from what
I can see, the driver doesn't write these into the receive ring, it's the
hardware that writes it, and the only route I can see that they get there
is via writes performed in mvpp2_bm_pool_put().

Now, from what I can see, the "buf_virt_addr" comes from:

+static void *mvpp2_frag_alloc(const struct mvpp2_bm_pool *pool)
+{
+       if (likely(pool->frag_size <= PAGE_SIZE))
+               return netdev_alloc_frag(pool->frag_size);
+       else
+               return kmalloc(pool->frag_size, GFP_ATOMIC);
+}

via mvpp2_buf_alloc().

Both kmalloc() and netdev_alloc_frag() guarantee that the virtual
address will be in lowmem.

Given that, I would suggest changing mvpp2_bm_pool_put() as follows -
and this is where my point above about separating the notion of "dma
address" and "physical address" becomes very important:

 static inline void mvpp2_bm_pool_put(struct mvpp2_port *port, int pool,
-                                    dma_addr_t buf_phys_addr,
-                                    unsigned long buf_virt_addr)
+                                    dma_addr_t dma, phys_addr_t phys)
 {

and updating it to write "phys" as the existing buf_virt_addr.

In mvpp2_bm_bufs_add():

                buf = mvpp2_buf_alloc(port, bm_pool, &phys_addr, GFP_KERNEL);
                if (!buf)
                        break;
 
                mvpp2_bm_pool_put(port, bm_pool->id, phys_addr,
-                                 (unsigned long)buf);
+                                 virt_to_phys(buf));

which I think means that mvpp2_rxdesc_virt_addr_get() can just become:

 	phys_addr_t cookie;
 
        /* PPv2.1 can only be used on 32 bits architectures, and there
         * are 32 bits in buf_cookie which are enough to store the
         * full virtual address, so things are easy.
         */
        if (port->priv->hw_version == MVPP21)
                cookie = rx_desc->pp21.buf_cookie;
        else
 		cookie = rx_desc->pp22.buf_cookie_misc & FORTY_BIT_MASK;

 	return phys_to_virt(cookie);

I'd suggest against using DMA_BIT_MASK(40) there - because it's not a
DMA address, even though it happens to resolve to the same number.

Again, I may have missed how the addresses end up getting into
buf_cookie_misc, so what I suggest above may not be possible.

I'd also suggest that there is some test in mvpp2_bm_bufs_add() to
verify that the physical addresses and DMA addresses do fit within
the available number of bits - if they don't we could end up scribbling
over memory that we shouldn't be, and it looks like we have a failure
path there to gracefully handle that situation - gracefully compared
to a nasty BUG_ON().
Robin Murphy Feb. 3, 2017, 4:31 p.m. UTC | #7
On 03/02/17 15:02, Thomas Petazzoni wrote:
> Hello,
> 
> On Fri, 3 Feb 2017 14:05:13 +0000, Robin Murphy wrote:
> 
>>> So do you have a better suggestion? The descriptors only have enough
>>> space to store a 40-bit virtual address, which is not enough to fit the
>>> virtual addresses used by Linux for SKBs. This is why I'm instead
>>> relying on the fact that the descriptors can store the 40-bit physical
>>> address, and convert it back to a virtual address, which should be fine
>>> on ARM64 because the entire physical memory is part of the kernel linear
>>> mapping.  
>>
>> OK, that has nothing to do with DMA addresses then.
> 
> Well, it has to do with DMA in the sense that those buffers are
> mapped with dma_map_single(). So the address that is given to us by the
> hardware as the "physical address of the RX buffer" is the one that we
> have initially given to the hardware and was the result of 
> dma_map_single().

Ah, right. DMA addresses are not physical addresses. They may look very
much alike on many platforms, but that cannot be relied upon. On some
platforms, some of the "address bits" actually encode things like bus
attributes. If the device is behind an IOMMU, the DMA address can be
entirely arbitrary, and the only way to convert it back to a CPU address
(physical or virtual) is to walk the IOMMU page table.

>>> Russell proposal of testing the size of a virtual address
>>> pointer instead would solve this I believe, correct?  
>>
>> AFAICS, even that shouldn't really be necessary - for all VA/PA
>> combinations of 32/32, 32/40 and 64/40, storing virt_to_phys() of the
>> SKB VA won't overflow 40 bits,
> 
> I'm already lost here. Why are you talking about virt_to_phys() ? See
> above: we have the dma_addr_t returned from dma_map_single(), and we
> need to find back the corresponding virtual address, because there is
> not enough room in the HW descriptors to store a full 64-bit VA.

That's the wrong approach. You need to keep track of the VA which you
gave to dma_map_single() in the first place; *that* can be packed into
40 bits with a round-trip through a CPU physical address, assuming that
this "VA" field in the hardware descriptor is in addition to the actual
DMA address (and that the hardware descriptor is something different to
the rx_desc struct to which you could presumably just add an extra pointer).

>> so a corresponding phys_to_virt() at the other end can't go wrong
>> either. If you really do want to special-case things based on VA
>> size, though, either CONFIG_64BIT or sizeof(void *) would indeed be
>> infinitely more useful than the unrelated DMA address width - I know
>> this driver's never going to run on SPARC64, but that's one example
>> of where the above logic would lead to precisely the truncated VA
>> it's trying to avoid.
> 
> What is different on SPARC64 here?

void * and phys_addr_t are 64-bit, but dma_addr_t is 32-bit. The more
realistic concern, though, is that this using dma_to_phys() - which is a
DMA-API-internal helper for SWIOTLB - will break the driver's
COMPILE_TEST dependency.

> The situation we have is the following:
> 
>  - On systems where VAs are 32-bit wide, we have enough room to store
>    the VA in the HW descriptor. So when we receive a packet, the HW
>    descriptor provides us directly with the VA of the network packet,
>    and the DMA address of the packet. We can dma_unmap_single() the
>    packet, and do its processing.
> 
>  - On systems where VAs are 64-bit wide, we don't have enough room to
>    store the VA in the HW descriptor. However, on 64-bit systems, the
>    entire physical memory is mapped in the kernel linear mapping, so
>    phys_to_virt() is valid on any physical address. And we use this
>    property to retrieve the full 64-bit VA using the DMA address that
>    we get from the HW descriptor.
> 
>    Since what we get from the HW descriptor is a DMA address, that's
>    why we're using phys_to_virt(dma_to_phys(...)).

Marvell's product page tells me that Armada7k/8k include an IOMMU; at
some point, somebody's surely going to want to use it, and any invalid
assumptions about physical addresses are going to go bang.  You need to
treat CPU and DMA addresses as entirely separate, but since any CPU
address used for streaming DMA - i.e. dma_map_*() - must be a valid
linear map address, virtual and physical are relatively interchangeable
if the latter works out more convenient.

Robin.

> 
> Best regards,
> 
> Thomas
>
Thomas Petazzoni Feb. 4, 2017, 1:59 p.m. UTC | #8
Hello,

On Fri, 3 Feb 2017 15:54:49 +0000, Russell King - ARM Linux wrote:

> Now for a bit of a whinge.  Reading through this code is rather annoying
> because of what's called a "physical" address which is actually a DMA
> address as far as the kernel is concerned - this makes it much harder
> when thinking about this issue because it causes all sorts of confusion.
> Please can the next revision of the patches start calling things by their
> right name - a dma_addr_t is a DMA address, not a physical address, even
> though _numerically_ it may be the same thing.  From the point of view
> of the kernel, you must not do phys_to_virt() on a dma_addr_t address.
> Thanks.

I do know that phys_addr_t is different from dma_addr_t. The case where
you have an IOMMU makes this very obvious. The fact that the driver
isn't completely correct in that respect indeed needs to be fixed.

> Taking a step backwards...
> 
> How do DMA addresses and this cookie get into the receive ring - from what
> I can see, the driver doesn't write these into the receive ring, it's the
> hardware that writes it, and the only route I can see that they get there
> is via writes performed in mvpp2_bm_pool_put().

Correct. Here is a quick summary of my understand of the HW operation.
The HW has a "buffer manager", i.e it can internally manage a list of
buffers. At initialization time, we provide to the HW a list of buffers
by pushing the DMA address and virtual address of each buffer into a
register. Thanks to that the HW then knows about all the buffers we
have given. Then, upon RX of a packet, the HW picks one of those
buffers, and then fills a RX descriptor with the DMA address of the
packet, and the VA in the "cookie" field. The "cookie" field can I
believe be used for whatever we want, it's just "free" space in the
descriptor.

> Now, from what I can see, the "buf_virt_addr" comes from:
> 
> +static void *mvpp2_frag_alloc(const struct mvpp2_bm_pool *pool)
> +{
> +       if (likely(pool->frag_size <= PAGE_SIZE))
> +               return netdev_alloc_frag(pool->frag_size);
> +       else
> +               return kmalloc(pool->frag_size, GFP_ATOMIC);
> +}
> 
> via mvpp2_buf_alloc().
> 
> Both kmalloc() and netdev_alloc_frag() guarantee that the virtual
> address will be in lowmem.

*That* is the thing I didn't realize until now. If the buffers are
always in lowmem, then it's much easier because we can indeed use
virt_to_phys()/phys_to_virt() on them. But it's indeed obvious they are
in lowmem, since we get a void* pointer for them.

> Given that, I would suggest changing mvpp2_bm_pool_put() as follows -
> and this is where my point above about separating the notion of "dma
> address" and "physical address" becomes very important:
> 
>  static inline void mvpp2_bm_pool_put(struct mvpp2_port *port, int pool,
> -                                    dma_addr_t buf_phys_addr,
> -                                    unsigned long buf_virt_addr)
> +                                    dma_addr_t dma, phys_addr_t phys)
>  {
> 
> and updating it to write "phys" as the existing buf_virt_addr.
> 
> In mvpp2_bm_bufs_add():
> 
>                 buf = mvpp2_buf_alloc(port, bm_pool, &phys_addr, GFP_KERNEL);
>                 if (!buf)
>                         break;
>  
>                 mvpp2_bm_pool_put(port, bm_pool->id, phys_addr,
> -                                 (unsigned long)buf);
> +                                 virt_to_phys(buf));
> 
> which I think means that mvpp2_rxdesc_virt_addr_get() can just become:
> 
>  	phys_addr_t cookie;
>  
>         /* PPv2.1 can only be used on 32 bits architectures, and there
>          * are 32 bits in buf_cookie which are enough to store the
>          * full virtual address, so things are easy.
>          */
>         if (port->priv->hw_version == MVPP21)
>                 cookie = rx_desc->pp21.buf_cookie;
>         else
>  		cookie = rx_desc->pp22.buf_cookie_misc & FORTY_BIT_MASK;
> 
>  	return phys_to_virt(cookie);

This makes complete sense. We use the cookie to store the phys_addr_t
rather than the virtual address. I might be missing something, but it
seems like a very good solution. Thanks for the suggestion, I'll try
this!

> I'd suggest against using DMA_BIT_MASK(40) there - because it's not a
> DMA address, even though it happens to resolve to the same number.

Sure, will update this as well.

> Again, I may have missed how the addresses end up getting into
> buf_cookie_misc, so what I suggest above may not be possible.

No, I think you got it right. At least it matches my own understanding
of the HW.

> I'd also suggest that there is some test in mvpp2_bm_bufs_add() to
> verify that the physical addresses and DMA addresses do fit within
> the available number of bits - if they don't we could end up scribbling
> over memory that we shouldn't be, and it looks like we have a failure
> path there to gracefully handle that situation - gracefully compared
> to a nasty BUG_ON().

For the DMA addresses, I have some additional changes on top of my v2
to use dma_set_mask() to ensure that DMA addresses fit in 40 bits. But
that's only for DMA addresses indeed.

Thanks again for your suggestion!

Thomas
David Laight Feb. 6, 2017, 12:43 p.m. UTC | #9
From: Thomas Petazzoni
> Sent: 04 February 2017 14:00
...
> This makes complete sense. We use the cookie to store the phys_addr_t
> rather than the virtual address. I might be missing something, but it
> seems like a very good solution. Thanks for the suggestion, I'll try
> this!

Why not just store an array index for the buffer info?
That would save you having to regenerate kernel virtual
addresses from limited info.

Or realise that the low bits (page offset) of the dma address
and kernel virtual address (and kernel physical for that matter)
will always match. 50 bits might be enough for a kernel virtual address.

	David
diff mbox

Patch

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index a37ff50..0e00ec0 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -773,18 +773,42 @@  struct mvpp21_rx_desc {
 	u32 reserved8;
 };
 
+/* HW TX descriptor for PPv2.2 */
+struct mvpp22_tx_desc {
+	u32 command;
+	u8  packet_offset;
+	u8  phys_txq;
+	u16 data_size;
+	u64 reserved1;
+	u64 buf_phys_addr_ptp;
+	u64 buf_cookie_misc;
+};
+
+/* HW RX descriptor for PPv2.2 */
+struct mvpp22_rx_desc {
+	u32 status;
+	u16 reserved1;
+	u16 data_size;
+	u32 reserved2;
+	u32 reserved3;
+	u64 buf_phys_addr_key_hash;
+	u64 buf_cookie_misc;
+};
+
 /* Opaque type used by the driver to manipulate the HW TX and RX
  * descriptors
  */
 struct mvpp2_tx_desc {
 	union {
 		struct mvpp21_tx_desc pp21;
+		struct mvpp22_tx_desc pp22;
 	};
 };
 
 struct mvpp2_rx_desc {
 	union {
 		struct mvpp21_rx_desc pp21;
+		struct mvpp22_rx_desc pp22;
 	};
 };
 
@@ -991,72 +1015,135 @@  static u32 mvpp2_read(struct mvpp2 *priv, u32 offset)
 static dma_addr_t mvpp2_txdesc_phys_addr_get(struct mvpp2_port *port,
 					     struct mvpp2_tx_desc *tx_desc)
 {
-	return tx_desc->pp21.buf_phys_addr;
+	if (port->priv->hw_version == MVPP21)
+		return tx_desc->pp21.buf_phys_addr;
+	else
+		return tx_desc->pp22.buf_phys_addr_ptp & DMA_BIT_MASK(40);
 }
 
 static void mvpp2_txdesc_phys_addr_set(struct mvpp2_port *port,
 				       struct mvpp2_tx_desc *tx_desc,
 				       dma_addr_t phys_addr)
 {
-	tx_desc->pp21.buf_phys_addr = phys_addr;
+	if (port->priv->hw_version == MVPP21) {
+		tx_desc->pp21.buf_phys_addr = phys_addr;
+	} else {
+		u64 val = (u64)phys_addr;
+
+		tx_desc->pp22.buf_phys_addr_ptp &= ~DMA_BIT_MASK(40);
+		tx_desc->pp22.buf_phys_addr_ptp |= val;
+	}
 }
 
 static size_t mvpp2_txdesc_size_get(struct mvpp2_port *port,
 				    struct mvpp2_tx_desc *tx_desc)
 {
-	return tx_desc->pp21.data_size;
+	if (port->priv->hw_version == MVPP21)
+		return tx_desc->pp21.data_size;
+	else
+		return tx_desc->pp22.data_size;
 }
 
 static void mvpp2_txdesc_size_set(struct mvpp2_port *port,
 				  struct mvpp2_tx_desc *tx_desc,
 				  size_t size)
 {
-	tx_desc->pp21.data_size = size;
+	if (port->priv->hw_version == MVPP21)
+		tx_desc->pp21.data_size = size;
+	else
+		tx_desc->pp22.data_size = size;
 }
 
 static void mvpp2_txdesc_txq_set(struct mvpp2_port *port,
 				 struct mvpp2_tx_desc *tx_desc,
 				 unsigned int txq)
 {
-	tx_desc->pp21.phys_txq = txq;
+	if (port->priv->hw_version == MVPP21)
+		tx_desc->pp21.phys_txq = txq;
+	else
+		tx_desc->pp22.phys_txq = txq;
 }
 
 static void mvpp2_txdesc_cmd_set(struct mvpp2_port *port,
 				 struct mvpp2_tx_desc *tx_desc,
 				 unsigned int command)
 {
-	tx_desc->pp21.command = command;
+	if (port->priv->hw_version == MVPP21)
+		tx_desc->pp21.command = command;
+	else
+		tx_desc->pp22.command = command;
 }
 
 static void mvpp2_txdesc_offset_set(struct mvpp2_port *port,
 				    struct mvpp2_tx_desc *tx_desc,
 				    unsigned int offset)
 {
-	tx_desc->pp21.packet_offset = offset;
+	if (port->priv->hw_version == MVPP21)
+		tx_desc->pp21.packet_offset = offset;
+	else
+		tx_desc->pp22.packet_offset = offset;
 }
 
 static dma_addr_t mvpp2_rxdesc_phys_addr_get(struct mvpp2_port *port,
 					     struct mvpp2_rx_desc *rx_desc)
 {
-	return rx_desc->pp21.buf_phys_addr;
+	if (port->priv->hw_version == MVPP21)
+		return rx_desc->pp21.buf_phys_addr;
+	else
+		return rx_desc->pp22.buf_phys_addr_key_hash & DMA_BIT_MASK(40);
 }
 
 static unsigned long mvpp2_rxdesc_virt_addr_get(struct mvpp2_port *port,
 						struct mvpp2_rx_desc *rx_desc)
 {
-	return rx_desc->pp21.buf_cookie;
+	/* PPv2.1 can only be used on 32 bits architectures, and there
+	 * are 32 bits in buf_cookie which are enough to store the
+	 * full virtual address, so things are easy.
+	 */
+	if (port->priv->hw_version == MVPP21) {
+		return rx_desc->pp21.buf_cookie;
+	} else {
+		/* On PPv2.2, the situation is more complicated,
+		 * because there is only 40 bits to store the virtual
+		 * address, which is not sufficient. So on 64 bits
+		 * systems, we use phys_to_virt() to get the virtual
+		 * address from the physical address, which is fine
+		 * because the kernel linear mapping includes the
+		 * entire 40 bits physical address space. On 32 bits
+		 * systems however, we can't use phys_to_virt(), but
+		 * since virtual addresses are 32 bits only, there is
+		 * enough space in the RX descriptor for the full
+		 * virtual address.
+		 */
+#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+		dma_addr_t dma_addr =
+			rx_desc->pp22.buf_phys_addr_key_hash & DMA_BIT_MASK(40);
+		phys_addr_t phys_addr =
+			dma_to_phys(port->dev->dev.parent, dma_addr);
+
+		return (unsigned long)phys_to_virt(phys_addr);
+#else
+		return rx_desc->pp22.buf_cookie_misc & DMA_BIT_MASK(40);
+#endif
+	}
 }
 
 static size_t mvpp2_rxdesc_size_get(struct mvpp2_port *port,
 				    struct mvpp2_rx_desc *rx_desc)
 {
-	return rx_desc->pp21.data_size;
+	if (port->priv->hw_version == MVPP21)
+		return rx_desc->pp21.data_size;
+	else
+		return rx_desc->pp22.data_size;
 }
 
 static u32 mvpp2_rxdesc_status_get(struct mvpp2_port *port,
 				   struct mvpp2_rx_desc *rx_desc)
 {
-	return rx_desc->pp21.status;
+	if (port->priv->hw_version == MVPP21)
+		return rx_desc->pp21.status;
+	else
+		return rx_desc->pp22.status;
 }
 
 static void mvpp2_txq_inc_get(struct mvpp2_txq_pcpu *txq_pcpu)