diff mbox

[RFC] powerpc: fsl_pci: fix inbound ATMU entries for systems with >4G RAM

Message ID 20160824194614.4014-1-theidsieck@leenox.de (mailing list archive)
State Superseded, archived
Delegated to: Scott Wood
Headers show

Commit Message

Tillmann Heidsieck Aug. 24, 2016, 7:46 p.m. UTC
For systems with >4G of RAM, the current implementation adds a second
inbound PCIe window starting at 128G this leaves all memory from 4G to
128G inaccessible to inbound PCIe transactions. The according errors
can be observed by using the EDAC driver for MPC85XX.

This patch changes this behaviour by adding the second window starting
at 4G. The current implementation still leaves memory beyond 68G
unmapped as this would require yet another ATMU entry.

Tested on a T4240 with 12G of RAM and an AMD E6760 PCIe card working with
the in-tree radeon driver.

Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de>
---
 arch/powerpc/sysdev/fsl_pci.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

Comments

Tillmann Heidsieck Aug. 26, 2016, 5:26 a.m. UTC | #1
Hello Scott,

thanks for the fast reply!

On 2016-08-24 23:39, Scott Wood wrote:
[..]
> 
> The second inbound window is at 256G, and it maps all of RAM.  Note 
> that
> for accesses in this window, the PCI address is different from the host
> address.

I probably really misunderstand the manual here ...

1 << 40 = 0x100_00000000 right? So if I put

(1 << 40) >> 44 = 0 in PEXIWBEAR and
(1 << 40) >> 12 = 0x10000000 in PEXIWBAR

the window base (extended) address would be (1 << 40) right? And at the
risk of showing my complete lack of skill doing basic arithmetic ...
isn't (1 << 40) = 128GB? So what am I missing here?

I also read that the maximum allowed window size for PCIe inbound 
windows
is 64G this would result in the need for more ATMU entries in case of 
 >68G
system memory (the question is whether this needs to be fixed because
maybe nobody wants to build such a computer).

> 
>> The according errors can be observed by using the EDAC driver for 
>> MPC85XX.
>> 
>> This patch changes this behaviour by adding the second window starting
>> at 4G. The current implementation still leaves memory beyond 68G
>> unmapped as this would require yet another ATMU entry.
> 
> Windows must be size-aligned, as per the description of PEXIWBARn[WBA].
>  This window is illegal.

OK, I did mess up that one. I fiddled around with the starting address 
and it
seems that the chip might align the window silently, making the window 
start
at 0x0.  This solves the puzzle why this window works for me. However, 
this
means that my solution is still wrong because of the illegal window size 
as
well as that it (potentially) does not map the whole memory.

> 
> By trying to identity-map everything you also break the assumption that
> we don't need swiotlb to avoid the PEXCSRBAR region on PCI devices
> capable of generating 64-bit addresses.

Can you point me to where I might read up on this? I know far to little
about all of this, that's for sure.

> 
>> Tested on a T4240 with 12G of RAM and an AMD E6760 PCIe card working 
>> with
>> the in-tree radeon driver.
> 
> I have no problem using an e1000e card on a t4240 with 24 GiB RAM.
> 
> Looking at the radeon driver I see that there's a possibility of a dma
> mask that is exactly 40 bits.  I think there's an off-by-one error in
> fsl_pci_dma_set_mask().  If you change "dma_mask >=
> DMA_BIT_MASK(MAX_PHYS_ADDR_BITS)" to "dma_mask >
> DMA_BIT_MASK(MAX_PHYS_ADDR_BITS)", does that make radeon work without
> this patch?

This works in the sense that the radeon driver uses only 32bit dma 
addresses
after applying it.
I don't think that is what was intended since the card clearly works 
which
higher addresses.

To elaborate on the problem I am trying to fix. After applying my 
accepted EDAC
patch. I get errors like this (in this case for the snd device which is 
part of
the radeon card)

[    8.429635] PCIe ERR_DR register: 0x00100000
[    8.433893] PCIe ERR_CAP_STAT register: 0x00000005
[    8.438671] PCIe ERR_CAP_R0 register: 0x04000020
[    8.443276] PCIe ERR_CAP_R1 register: 0xff000103
[    8.447882] PCIe ERR_CAP_R2 register: 0x02000000
[    8.452486] PCIe ERR_CAP_R3 register: 0x0080a4ec

which I read as
"There is some incoming PCIe transaction to address 0x2_ec4a8000 and 
there is
no mapping for it".

According to the radeon driver it did put one of the rings to this 
address
(viewed from the CPU).

> 
> 
>> Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de>
>> ---
>>  arch/powerpc/sysdev/fsl_pci.c | 40 
>> ++++++++++++++++++++--------------------
>>  1 file changed, 20 insertions(+), 20 deletions(-)
>> 
>> diff --git a/arch/powerpc/sysdev/fsl_pci.c 
>> b/arch/powerpc/sysdev/fsl_pci.c
>> index 0ef9df49f0f2..260983037904 100644
>> --- a/arch/powerpc/sysdev/fsl_pci.c
>> +++ b/arch/powerpc/sysdev/fsl_pci.c
>> @@ -349,17 +349,13 @@ static void setup_pci_atmu(struct pci_controller 
>> *hose)
>>  	}
>> 
>>  	sz = min(mem, paddr_lo);
>> -	mem_log = ilog2(sz);
>> +	mem_log = order_base_2(sz);
>> 
>>  	/* PCIe can overmap inbound & outbound since RX & TX are separated 
>> */
>>  	if (early_find_capability(hose, 0, 0, PCI_CAP_ID_EXP)) {
>> -		/* Size window to exact size if power-of-two or one size up */
>> -		if ((1ull << mem_log) != mem) {
>> -			mem_log++;
>> -			if ((1ull << mem_log) > mem)
>> -				pr_info("%s: Setting PCI inbound window "
>> -					"greater than memory size\n", name);
>> -		}
>> +		if ((1ull << mem_log) > mem)
>> +			pr_info("%s: Setting PCI inbound window greater than memory 
>> size\n",
>> +				name);
>> 
>>  		piwar |= ((mem_log - 1) & PIWAR_SZ_MASK);
>> 
> 
> This change is unrelated.

yeah sorry for sneaking that one in here, are you interested in this 
change?
It cleans up the code a little bit by using existing functions. I think 
it
makes the intend more clear but that's up for interpretation ;-)

> 
> BTW, for some reason your patch is not showing up in Patchwork.

Are there some known pitfalls when sending patches to Patchwork?

> 
> -Scott

Thanks for your help
- Tillmann
Scott Wood Aug. 26, 2016, 5:55 a.m. UTC | #2
On 08/26/2016 12:26 AM, Tillmann Heidsieck wrote:
> Hello Scott,
> 
> thanks for the fast reply!
> 
> On 2016-08-24 23:39, Scott Wood wrote:
> [..]
>>
>> The second inbound window is at 256G, and it maps all of RAM.  Note 
>> that
>> for accesses in this window, the PCI address is different from the host
>> address.
> 
> I probably really misunderstand the manual here ...
> 
> 1 << 40 = 0x100_00000000 right? So if I put
> 
> (1 << 40) >> 44 = 0 in PEXIWBEAR and
> (1 << 40) >> 12 = 0x10000000 in PEXIWBAR
> 
> the window base (extended) address would be (1 << 40) right? And at the
> risk of showing my complete lack of skill doing basic arithmetic ...
> isn't (1 << 40) = 128GB? So what am I missing here?

Sorry, I meant 1 TiB, not 256 GiB.  1 << 40 = 1 TiB.

> I also read that the maximum allowed window size for PCIe inbound 
> windows
> is 64G this would result in the need for more ATMU entries in case of 
>  >68G
> system memory (the question is whether this needs to be fixed because
> maybe nobody wants to build such a computer).

Hmm, it does say that.  This code was written when the physical address
size was 36 bits rather than 40.  I wonder if there's a real 64 GiB
limitation or if the manual just wasn't updated when the physical
address size grew... though in any case we should follow what the manual
says unless it's confirmed otherwise.

>> By trying to identity-map everything you also break the assumption that
>> we don't need swiotlb to avoid the PEXCSRBAR region on PCI devices
>> capable of generating 64-bit addresses.
> 
> Can you point me to where I might read up on this? I know far to little
> about all of this, that's for sure.

PEXCSRBAR (called PCICSRBAR in the Linux driver) is an inbound window
intended for MSIs, configured via a config space BAR rather than the
MMIO ATMU registers.  It only accepts 32-bit PCI addresses and can't be
disabled or pointed at anything but CCSR.  Any RAM that aliases the
PEXCSRBAR PCI address can't be DMAed to without either using swiotlb or
a non-identity mapping.

>>> Tested on a T4240 with 12G of RAM and an AMD E6760 PCIe card working 
>>> with
>>> the in-tree radeon driver.
>>
>> I have no problem using an e1000e card on a t4240 with 24 GiB RAM.
>>
>> Looking at the radeon driver I see that there's a possibility of a dma
>> mask that is exactly 40 bits.  I think there's an off-by-one error in
>> fsl_pci_dma_set_mask().  If you change "dma_mask >=
>> DMA_BIT_MASK(MAX_PHYS_ADDR_BITS)" to "dma_mask >
>> DMA_BIT_MASK(MAX_PHYS_ADDR_BITS)", does that make radeon work without
>> this patch?
> 
> This works in the sense that the radeon driver uses only 32bit dma 
> addresses
> after applying it.
> I don't think that is what was intended since the card clearly works 
> which
> higher addresses.

Right, step one is to fix the bug.  Step two is to make things faster. :-)

For the second step I suggest setting the high window's address/size
based on the amount of RAM present (rounded up to a power of 2) rather
than trying to map the entire 40-bit address space.  Thus if you have 12
GiB RAM, you'd get a 16 GiB window at PCI address 16 GiB, and any PCI
device with a DMA mask of at least 35 bits could use direct DMA to the
high window.  I'll put a patch together.

>>
>> This change is unrelated.
> 
> yeah sorry for sneaking that one in here, are you interested in this 
> change?
> It cleans up the code a little bit by using existing functions. I think 
> it
> makes the intend more clear but that's up for interpretation ;-)

Sure.  If you resend it, please CC me at oss@buserror.net rather than
the NXP address.

BTW, we can probably just drop that "Setting PCI inbound window greater
than memory size" message.  We routinely silently map beyond the end of
RAM with the high mapping...

>> BTW, for some reason your patch is not showing up in Patchwork.
> 
> Are there some known pitfalls when sending patches to Patchwork?

It's not the first time I've seen certain people's patches not show up
there, but I don't know what the root cause is.

-Scott
Scott Wood Aug. 26, 2016, 6:41 a.m. UTC | #3
On 08/26/2016 12:55 AM, Scott Wood wrote:
> On 08/26/2016 12:26 AM, Tillmann Heidsieck wrote:
>> On 2016-08-24 23:39, Scott Wood wrote:
>>> BTW, for some reason your patch is not showing up in Patchwork.
>>
>> Are there some known pitfalls when sending patches to Patchwork?
> 
> It's not the first time I've seen certain people's patches not show up
> there, but I don't know what the root cause is.

I do see the patch on Patchwork now; I guess it was just slow.

-Scott
diff mbox

Patch

diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 0ef9df49f0f2..260983037904 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -349,17 +349,13 @@  static void setup_pci_atmu(struct pci_controller *hose)
 	}
 
 	sz = min(mem, paddr_lo);
-	mem_log = ilog2(sz);
+	mem_log = order_base_2(sz);
 
 	/* PCIe can overmap inbound & outbound since RX & TX are separated */
 	if (early_find_capability(hose, 0, 0, PCI_CAP_ID_EXP)) {
-		/* Size window to exact size if power-of-two or one size up */
-		if ((1ull << mem_log) != mem) {
-			mem_log++;
-			if ((1ull << mem_log) > mem)
-				pr_info("%s: Setting PCI inbound window "
-					"greater than memory size\n", name);
-		}
+		if ((1ull << mem_log) > mem)
+			pr_info("%s: Setting PCI inbound window greater than memory size\n",
+				name);
 
 		piwar |= ((mem_log - 1) & PIWAR_SZ_MASK);
 
@@ -379,23 +375,27 @@  static void setup_pci_atmu(struct pci_controller *hose)
 		 * let devices that are 64-bit address capable to work w/o
 		 * SWIOTLB and access the full range of memory
 		 */
-		if (sz != mem) {
-			mem_log = ilog2(mem);
-
-			/* Size window up if we dont fit in exact power-of-2 */
-			if ((1ull << mem_log) != mem)
-				mem_log++;
-
+		if (mem > 0x100000000ULL) {
+			/* ok we mapped 4G in WIN1, now lets see how much memory
+			 * is left un-mapped and calculate the log, also
+			 * make sure we dont have a window lager then 64G
+			 */
+			mem_log = order_base_2(min(mem - sz, 1ULL << 36));
 			piwar = (piwar & ~PIWAR_SZ_MASK) | (mem_log - 1);
 
 			if (setup_inbound) {
-				/* Setup inbound memory window */
-				out_be32(&pci->piw[win_idx].pitar,  0x00000000);
+				/* Setup inbound memory window
+				 * The windows starts at 4G and spans all the
+				 * remaining memory aka (mem - 4G)
+				 */
+				out_be32(&pci->piw[win_idx].pitar,
+					 0x00000000);
 				out_be32(&pci->piw[win_idx].piwbear,
-						pci64_dma_offset >> 44);
+					 0x00000000);
 				out_be32(&pci->piw[win_idx].piwbar,
-						pci64_dma_offset >> 12);
-				out_be32(&pci->piw[win_idx].piwar,  piwar);
+					 0x100000000ULL >> 12);
+				out_be32(&pci->piw[win_idx].piwar,
+					 piwar);
 			}
 
 			/*