Message ID | 700c59731cf97778d3a4.1226448406@localhost.localdomain (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Josh Boyer |
On Tue, Nov 11, 2008 at 06:06:46PM -0600, Hollis Blanchard wrote:
> The current CHIP11 errata truncates the device tree memory node, and subtracts
> (hardcoded) 4096 bytes. This breaks kernels with larger PAGE_SIZE, since the
> bootmem allocator assumes that total memory is a multiple of PAGE_SIZE.
>
> Instead, use a device tree memory reservation to reserve only the 256 bytes
> actually affected by the errata, leaving the total memory size unaltered.
>
> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>

libfdt usage changes look fine to me.

Acked-by: David Gibson <david@gibson.dropbear.id.au>

On Tue, 2008-11-11 at 18:06 -0600, Hollis Blanchard wrote:
> The current CHIP11 errata truncates the device tree memory node, and subtracts
> (hardcoded) 4096 bytes. This breaks kernels with larger PAGE_SIZE, since the
> bootmem allocator assumes that total memory is a multiple of PAGE_SIZE.
>
> Instead, use a device tree memory reservation to reserve only the 256 bytes
> actually affected by the errata, leaving the total memory size unaltered.
>
> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>

While I prefer this approach, won't it break kexec ?

I don't understand why we don't just have a bit of code in the kernel
itself that reserves that page on 44x at boot time and be done with it.

It's like we are trying to be too smart and over-engineer the solution.

Cheers,
Ben.

On Wed, 12 Nov 2008 15:37:43 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Tue, 2008-11-11 at 18:06 -0600, Hollis Blanchard wrote:
> > The current CHIP11 errata truncates the device tree memory node, and subtracts
> > (hardcoded) 4096 bytes. This breaks kernels with larger PAGE_SIZE, since the
> > bootmem allocator assumes that total memory is a multiple of PAGE_SIZE.
> >
> > Instead, use a device tree memory reservation to reserve only the 256 bytes
> > actually affected by the errata, leaving the total memory size unaltered.
> >
> > Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
>
> While I prefer this approach, won't it break kexec ?

Break it how? Particularly given that kexec doesn't work on 4xx (yet).

> I don't understand why we don't just have a bit of code in the kernel
> itself that reserves that page on 44x at boot time and be done with it.
>
> It's like we are trying to be too smart and over-engineer the solution.

I don't think that's it. I think it's more that we're opportunistic and
the wrapper is the easiest place to do this, given that U-Boot itself
will be doing the reserve for platforms that don't require the
wrapper.

So we could do the fixup in-kernel, but how do you do that
deterministically given that U-Boot might have already done it?

josh

On Wed, 2008-11-12 at 06:31 -0500, Josh Boyer wrote:
> On Wed, 12 Nov 2008 15:37:43 +1100
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> > On Tue, 2008-11-11 at 18:06 -0600, Hollis Blanchard wrote:
> > > The current CHIP11 errata truncates the device tree memory node, and subtracts
> > > (hardcoded) 4096 bytes. This breaks kernels with larger PAGE_SIZE, since the
> > > bootmem allocator assumes that total memory is a multiple of PAGE_SIZE.
> > >
> > > Instead, use a device tree memory reservation to reserve only the 256 bytes
> > > actually affected by the errata, leaving the total memory size unaltered.
> > >
> > > Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
> >
> > While I prefer this approach, won't it break kexec ?
>
> Break it how? Particularly given that kexec doesn't work on 4xx (yet).

All right, wrong wording. It will make kexec more painful since it will
have to also create that reserved area in the target DT.

> I don't think that's it. I think it's more that we're opportunistic and
> the wrapper is the easiest place to do this, given that U-Boot itself
> will be doing the reserve for platforms that don't require the
> wrapper.
>
> So we could do the fixup in-kernel, but how do you do that
> deterministically given that U-Boot might have already done it?

Bah, do you know many RAM chips that will chop off the last 4K ?

I still find it a bit tricky to have memory nodes not aligned on nice
fat big boundaries tho.

Cheers,
Ben.

On Wed, 2008-11-12 at 22:52 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2008-11-12 at 06:31 -0500, Josh Boyer wrote:
> > On Wed, 12 Nov 2008 15:37:43 +1100
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >
> > > On Tue, 2008-11-11 at 18:06 -0600, Hollis Blanchard wrote:
> > > > The current CHIP11 errata truncates the device tree memory node, and subtracts
> > > > (hardcoded) 4096 bytes. This breaks kernels with larger PAGE_SIZE, since the
> > > > bootmem allocator assumes that total memory is a multiple of PAGE_SIZE.
> > > >
> > > > Instead, use a device tree memory reservation to reserve only the 256 bytes
> > > > actually affected by the errata, leaving the total memory size unaltered.
> > > >
> > > > Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
> > >
> > > While I prefer this approach, won't it break kexec ?
> >
> > Break it how? Particularly given that kexec doesn't work on 4xx (yet).
>
> All right, wrong wording. It will make kexec more painful since it will
> have to also create that reserved area in the target DT.
>
> > I don't think that's it. I think it's more that we're opportunistic and
> > the wrapper is the easiest place to do this, given that U-Boot itself
> > will be doing the reserve for platforms that don't require the
> > wrapper.
> >
> > So we could do the fixup in-kernel, but how do you do that
> > deterministically given that U-Boot might have already done it?
>
> Bah, do you know many RAM chips that will chop off the last 4K ?

Forget pages. The errata is about the last 256 bytes of physical memory.

> I still find it a bit tricky to have memory nodes not aligned on nice
> fat big boundaries tho.

I don't know what you're referring to. The patch I sent doesn't touch
memory nodes, so they are indeed still aligned on nice fat big
boundaries.

I don't think this is overengineering at all. We can't touch the last
256 bytes, so we mark it reserved, and then we won't. Altering memory
nodes is far more complicated and error-prone.

On Wed, 2008-11-12 at 09:11 -0600, Hollis Blanchard wrote:
> Forget pages. The errata is about the last 256 bytes of physical
> memory.
>
> > I still find it a bit tricky to have memory nodes not aligned on nice
> > fat big boundaries tho.
>
> I don't know what you're referring to. The patch I sent doesn't touch
> memory nodes, so they are indeed still aligned on nice fat big
> boundaries.

My last comment was about the approach of modifying the memory node.

> I don't think this is overengineering at all. We can't touch the last
> 256 bytes, so we mark it reserved, and then we won't. Altering memory
> nodes is far more complicated and error-prone.

But your approach is going to be painful for kexec which will have to
duplicate that logic.

Again, why can't we just stick something in the kernel code that
reserves the last page ? It could be in prom.c or it could be called by
affected 4xx platforms by the platform code, whatever, but the reserve
map isn't really meant for that and will not be passed over from kernel
to kernel by kexec.

Ben.

On Thu, 13 Nov 2008 07:44:56 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Wed, 2008-11-12 at 09:11 -0600, Hollis Blanchard wrote:
> > Forget pages. The errata is about the last 256 bytes of physical
> > memory.
> >
> > > I still find it a bit tricky to have memory nodes not aligned on nice
> > > fat big boundaries tho.
> >
> > I don't know what you're referring to. The patch I sent doesn't touch
> > memory nodes, so they are indeed still aligned on nice fat big
> > boundaries.
>
> My last comment was about the approach of modifying the memory node.
>
> > I don't think this is overengineering at all. We can't touch the last
> > 256 bytes, so we mark it reserved, and then we won't. Altering memory
> > nodes is far more complicated and error-prone.
>
> But your approach is going to be painful for kexec which will have to
> duplicate that logic.
>
> Again, why can't we just stick something in the kernel code that
> reserves the last page ? It could be in prom.c or it could be called by
> affected 4xx platforms by the platform code, whatever, but the reserve
> map isn't really meant for that and will not be passed over from kernel
> to kernel by kexec.

Again, because newer U-Boot is doing the fixup on memsize for us
already. This is why it was done in the wrapper to begin with, since
it depends on the version of U-Boot that you happen to be using.

If you have a good idea on how to figure that out in-kernel, do the
fixup when needed, and not make people's eyes bleed, I'm all for it.

josh

On Thu, 2008-11-13 at 07:44 +1100, Benjamin Herrenschmidt wrote:
>
> Again, why can't we just stick something in the kernel code that
> reserves the last page ? It could be in prom.c or it could be called by
> affected 4xx platforms by the platform code, whatever, but the reserve
> map isn't really meant for that and will not be passed over from kernel
> to kernel by kexec.

Reserving a page is overkill; only the last 256 bytes are affected. We
need to intercept at the LMB level, because allocations are already done
there, so by the time we hit bootmem it's way too late.

I simply don't see a good place to do this in the kernel. It would have
to be before the first lmb_alloc() call, which for safety would put it
inside early_init_devtree() -- along with the other lmb_reserve()
calls.[1] However, ppc_md.probe() hasn't even been called yet, so
there's no way of knowing if we're on an affected system, unless you
want to add a special of_scan_flat_dt() call here.

I'm open to suggestions, but I don't see a better way than what I
already sent. I think the important part is to call lmb_add() for all
memory, but lmb_reserve() the last 256 bytes before lmb_alloc() happens.

It sounds like kexec must have some knowledge of the platform and device
tree already, so is this really a big deal? At any rate, this
conversation is somewhat academic, since there is no kexec on 44x... so
maybe this can be re-addressed when that becomes a real issue.

[1] This is exactly where flat device tree reservations are done, and
that's why the patch I submitted works.

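To make the ordering argument concrete (add all memory, reserve the last 256 bytes, and only then allocate), here is a minimal toy model. It is not the kernel's LMB code: `struct toy_lmb`, `toy_add()`, `toy_reserve_tail()`, `toy_alloc()` and `touches_tail()` are hypothetical names invented for illustration, and the top-down allocation policy is an assumption chosen so that the hazard is visible.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the LMB; hypothetical, not the kernel's data structure. */
struct toy_lmb {
	uint64_t mem_end;   /* end of RAM registered so far (RAM starts at 0) */
	uint64_t alloc_top; /* bytes at or above this are reserved or allocated */
};

/* Register all of RAM, like calling lmb_add() for the full memory size. */
static void toy_add(struct toy_lmb *l, uint64_t size)
{
	l->mem_end = size;
	l->alloc_top = size;
}

/* Reserve the last `size` bytes. Only valid before the first allocation. */
static void toy_reserve_tail(struct toy_lmb *l, uint64_t size)
{
	assert(l->alloc_top == l->mem_end); /* nothing allocated yet */
	assert(size <= l->mem_end);
	l->alloc_top = l->mem_end - size;
}

/* Allocate `size` bytes top-down; `align` must be a power of two. */
static uint64_t toy_alloc(struct toy_lmb *l, uint64_t size, uint64_t align)
{
	uint64_t base;

	assert(size <= l->alloc_top);
	base = (l->alloc_top - size) & ~(align - 1);
	l->alloc_top = base;
	return base;
}

/* Does [base, base + size) overlap the last `tail` bytes of RAM? */
static int touches_tail(const struct toy_lmb *l, uint64_t base,
			uint64_t size, uint64_t tail)
{
	return base + size > l->mem_end - tail;
}
```

In this model, with 256 MiB of RAM, a 4 KiB allocation made without the reservation lands at 0x0ffff000 and covers the affected bytes; with `toy_reserve_tail(&l, 256)` called first, the same allocation lands at 0x0fffe000 and stays clear of them.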
diff --git a/arch/powerpc/boot/4xx.c b/arch/powerpc/boot/4xx.c
--- a/arch/powerpc/boot/4xx.c
+++ b/arch/powerpc/boot/4xx.c
@@ -20,8 +20,9 @@
 #include "ops.h"
 #include "reg.h"
 #include "dcr.h"
+#include "libfdt/libfdt.h"
 
-static unsigned long chip_11_errata(unsigned long memsize)
+static void chip_11_errata(unsigned long memsize)
 {
 	unsigned long pvr;
 
@@ -31,13 +32,11 @@ static unsigned long chip_11_errata(unsi
 		case 0x40000850:
 		case 0x400008d0:
 		case 0x200008d0:
-			memsize -= 4096;
+			fdt_add_mem_rsv(fdt, memsize - 256, 256);
 			break;
 		default:
 			break;
 	}
-
-	return memsize;
 }
 
 /* Read the 4xx SDRAM controller to get size of system memory. */
@@ -53,7 +52,7 @@ void ibm4xx_sdram_fixup_memsize(void)
 		memsize += SDRAM_CONFIG_BANK_SIZE(bank_config);
 	}
 
-	memsize = chip_11_errata(memsize);
+	chip_11_errata(memsize);
 	dt_fixup_memory(0, memsize);
 }
 
@@ -219,7 +218,7 @@ void ibm4xx_denali_fixup_memsize(void)
 		bank = 4; /* 4 banks */
 
 	memsize = cs * (1 << (col+row)) * bank * dpath;
-	memsize = chip_11_errata(memsize);
+	chip_11_errata(memsize);
 	dt_fixup_memory(0, memsize);
 }
 
diff --git a/arch/powerpc/boot/libfdt-wrapper.c b/arch/powerpc/boot/libfdt-wrapper.c
--- a/arch/powerpc/boot/libfdt-wrapper.c
+++ b/arch/powerpc/boot/libfdt-wrapper.c
@@ -51,7 +51,7 @@
 #define devp_offset_find(devp)	(((int)(devp))-1)
 #define devp_offset(devp)	(devp ? ((int)(devp))-1 : 0)
 
-static void *fdt;
+void *fdt;
 static void *buf; /* = NULL */
 
 #define EXPAND_GRANULARITY	1024
diff --git a/arch/powerpc/boot/ops.h b/arch/powerpc/boot/ops.h
--- a/arch/powerpc/boot/ops.h
+++ b/arch/powerpc/boot/ops.h
@@ -14,6 +14,7 @@
 #include <stddef.h>
 #include "types.h"
 #include "string.h"
+#include "libfdt_env.h"
 
 #define	COMMAND_LINE_SIZE	512
 #define	MAX_PATH_LEN		256
@@ -32,6 +33,9 @@ struct platform_ops {
 	void *	(*vmlinux_alloc)(unsigned long size);
 };
 extern struct platform_ops platform_ops;
+
+/* The device tree itself. Should almost always be accessed via dt_ops. */
+extern void *fdt;
 
 /* Device Tree operations */
 struct dt_ops {

The current CHIP11 errata truncates the device tree memory node, and subtracts
(hardcoded) 4096 bytes. This breaks kernels with larger PAGE_SIZE, since the
bootmem allocator assumes that total memory is a multiple of PAGE_SIZE.

Instead, use a device tree memory reservation to reserve only the 256 bytes
actually affected by the errata, leaving the total memory size unaltered.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
---
Changes from v2:
- David pointed out I'd duplicated the fdt_add_mem_rsv() prototype, and that
  4xx.c should directly include libfdt/libfdt.h instead.

Using large pages results in a huge performance improvement for KVM, and this
patch is required to make Ilya's large page patch work. David and/or Josh,
please apply.
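
The arithmetic behind the description can be checked with a small standalone sketch. It does not use libfdt or any wrapper code: `truncate_memsize()`, `is_page_multiple()`, `struct mem_rsv` and `errata_reservation()` are hypothetical helpers, and the 64 KiB page size used in the usage note is an assumption picked as an example of a "larger PAGE_SIZE".

```c
#include <assert.h>
#include <stdint.h>

#define ERRATA_BYTES 256u /* size of the affected tail, per the description */

/* Old approach: shrink the reported memory size by a hardcoded 4 KiB. */
static uint64_t truncate_memsize(uint64_t memsize)
{
	assert(memsize >= 4096);
	return memsize - 4096;
}

/* Nonzero when `size` is a whole number of pages of `page_size` bytes. */
static int is_page_multiple(uint64_t size, uint64_t page_size)
{
	return size % page_size == 0;
}

/* Hypothetical record of one reserved range (address and length). */
struct mem_rsv {
	uint64_t address;
	uint64_t size;
};

/* New approach: memsize is left alone; only the tail is described as reserved. */
static struct mem_rsv errata_reservation(uint64_t memsize)
{
	struct mem_rsv r;

	assert(memsize >= ERRATA_BYTES);
	r.address = memsize - ERRATA_BYTES;
	r.size = ERRATA_BYTES;
	return r;
}
```

With 256 MiB of RAM, `truncate_memsize()` yields 0x0ffff000, which is still a multiple of 4096 but not of 65536, so a 64 KiB-page kernel would see a memory size that is not page-aligned. `errata_reservation()` instead returns the range starting at 0x0fffff00 with length 256, and the total size stays at 0x10000000.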