Message ID | 20081030142632.GA15645@csn.ul.ie (mailing list archive) |
---|---|
State | Not Applicable, archived |
Mel Gorman writes:

> On some ppc64 machines, NVRAM is being corrupted very early in boot (before
> console is initialised). The machine reboots and then fails to find yaboot
> printing the error "PReP-BOOT: Unable to load PRep image". It's nowhere near
> as serious as the ftrace+e1000 problem as the machine is not bricked but it's
> fairly scary looking, the machine cannot boot and the fix is non-obvious. To
> "fix" the machine:
>
> 1. Go to OpenFirmware prompt
> 2. type dev nvram
> 3. type wipe-nvram
>
> The machine will reboot, reconstruct the NVRAM using some magic and yaboot
> works again, allowing an older kernel to be used. I bisected the problem down
> to this commit.

Eek!

Which ppc64 machines has this been seen on, and how were they being
booted (netboot, yaboot, etc.)?

Is it just the Powerstations with their SLOF-based firmware, or is it
IBM pSeries machines as well?

Paul.
On Fri, Oct 31, 2008 at 07:52:02AM +1100, Paul Mackerras wrote:
>Mel Gorman writes:
>
>> On some ppc64 machines, NVRAM is being corrupted very early in boot (before
>> console is initialised). The machine reboots and then fails to find yaboot
>> printing the error "PReP-BOOT: Unable to load PRep image". It's nowhere near
>> as serious as the ftrace+e1000 problem as the machine is not bricked but it's
>> fairly scary looking, the machine cannot boot and the fix is non-obvious. To
>> "fix" the machine:
>>
>> 1. Go to OpenFirmware prompt
>> 2. type dev nvram
>> 3. type wipe-nvram
>>
>> The machine will reboot, reconstruct the NVRAM using some magic and yaboot
>> works again, allowing an older kernel to be used. I bisected the problem down
>> to this commit.
>
>Eek!
>
>Which ppc64 machines has this been seen on, and how were they being
>booted (netboot, yaboot, etc.)?
>
>Is it just the Powerstations with their SLOF-based firmware, or is it
>IBM pSeries machines as well?

I'm pretty sure it was with pSeries machines. I saw reports of POWER5
being affected (p520 and p710). I believe one of them resolved the
issue by upgrading firmware on the machine.

josh
On Thu, 2008-10-30 at 17:05 -0400, Josh Boyer wrote:
> On Fri, Oct 31, 2008 at 07:52:02AM +1100, Paul Mackerras wrote:
> >Mel Gorman writes:
> >
> >> On some ppc64 machines, NVRAM is being corrupted very early in boot (before
> >> console is initialised). The machine reboots and then fails to find yaboot
> >> printing the error "PReP-BOOT: Unable to load PRep image".
...
> >Eek!
> >
> >Which ppc64 machines has this been seen on, and how were they being
> >booted (netboot, yaboot, etc.)?
> >
> >Is it just the Powerstations with their SLOF-based firmware, or is it
> >IBM pSeries machines as well?
>
> I'm pretty sure it was with pSeries machines. I saw reports of POWER5
> being affected (p520 and p710). I believe one of them resolved the
> issue by upgrading firmware on the machine.

This is true of a p720 (CHRP IBM,9124-720) that I was testing on. With
upgraded firmware, the problem is gone.
On Fri, Oct 31, 2008 at 07:52:02AM +1100, Paul Mackerras wrote:
> Mel Gorman writes:
>
> > On some ppc64 machines, NVRAM is being corrupted very early in boot (before
> > console is initialised). The machine reboots and then fails to find yaboot
> > printing the error "PReP-BOOT: Unable to load PRep image". It's nowhere near
> > as serious as the ftrace+e1000 problem as the machine is not bricked but it's
> > fairly scary looking, the machine cannot boot and the fix is non-obvious. To
> > "fix" the machine:
> >
> > 1. Go to OpenFirmware prompt
> > 2. type dev nvram
> > 3. type wipe-nvram
> >
> > The machine will reboot, reconstruct the NVRAM using some magic and yaboot
> > works again, allowing an older kernel to be used. I bisected the problem down
> > to this commit.
>
> Eek!
>
> Which ppc64 machines has this been seen on, and how were they being
> booted (netboot, yaboot, etc.)?
>

Yaboot in my case and I've heard it affected a DVD installation. I don't
know for sure if it affects netboot but as I think it's something the
kernel is doing, it probably doesn't matter how it gets loaded?

> Is it just the Powerstations with their SLOF-based firmware, or is it
> IBM pSeries machines as well?
>

To be honest, I haven't been brave enough to try this on a Powerstation yet
as I only have the one and I don't know if it's a) affected or b) fixable
with the same workaround. It was an IBM pSeries that was affected in my
case and a few people have hit the problem on pSeries AFAIK.

It's been pointed out that it can be "fixed" by upgrading the firmware but
surely we can avoid breaking the machine in the first place?
Mel Gorman writes:

> Yaboot in my case and I've heard it affected a DVD installation. I don't
> know for sure if it affects netboot but as I think it's something the
> kernel is doing, it probably doesn't matter how it gets loaded?

What changed in that commit was the contents of a couple of structures
that the firmware looks at to see what the kernel wants from
firmware. Specifically the change was to say that the kernel (or
really the zImage wrapper) would like the firmware to be based at the
32MB point (which is what AIX uses) rather than 12MB (which was the
default on older machines).

So, as I understand it, it's not anything the kernel is actively
doing, it's how the firmware is reacting to what the kernel says it
wants. And since we are requesting the same value as AIX (as far as I
know) I'm really surprised it caused problems.

We can revert that commit, but I still need to solve the problem that
the distros are facing, namely that their installer kernel + initramfs
images are now bigger than 12MB and can't be loaded if the firmware is
based at 12MB. That's why I really want to understand the problem in
more detail.

> It's been pointed out that it can be "fixed" by upgrading the firmware but
> surely we can avoid breaking the machine in the first place?

Have you upgraded the firmware on the machine you saw this problem on?
If not, would you be willing to run some tests for me?

Paul.
Mel Gorman writes:

> Yaboot in my case and I've heard it affected a DVD installation. I don't
> know for sure if it affects netboot but as I think it's something the
> kernel is doing, it probably doesn't matter how it gets loaded?

I do need to know whether it was the vmlinux or the zImage.pseries
that you were loading with yaboot. That commit you identified affects
the contents of an ELF note in the zImage.pseries that firmware looks
at, as well as a structure in the kernel itself that gets passed as an
argument to a call to firmware. If you were loading a vmlinux with
yaboot when you saw the corruption occur then that narrows things down
a bit.

Paul.
On Fri, Oct 31, 2008 at 10:10:55PM +1100, Paul Mackerras wrote:
> Mel Gorman writes:
>
> > Yaboot in my case and I've heard it affected a DVD installation. I don't
> > know for sure if it affects netboot but as I think it's something the
> > kernel is doing, it probably doesn't matter how it gets loaded?
>
> What changed in that commit was the contents of a couple of structures
> that the firmware looks at to see what the kernel wants from
> firmware. Specifically the change was to say that the kernel (or
> really the zImage wrapper) would like the firmware to be based at the
> 32MB point (which is what AIX uses) rather than 12MB (which was the
> default on older machines).
>
> So, as I understand it, it's not anything the kernel is actively
> doing, it's how the firmware is reacting to what the kernel says it
> wants. And since we are requesting the same value as AIX (as far as I
> know) I'm really surprised it caused problems.
>

Same here, it sounds like an innocent change. While it is possible that
AIX could not work on this machine, it seems a bit unlikely.

> We can revert that commit, but I still need to solve the problem that
> the distros are facing, namely that their installer kernel + initramfs
> images are now bigger than 12MB and can't be loaded if the firmware is
> based at 12MB. That's why I really want to understand the problem in
> more detail.
>
> > It's been pointed out that it can be "fixed" by upgrading the firmware but
> > surely we can avoid breaking the machine in the first place?
>
> Have you upgraded the firmware on the machine you saw this problem on?

No. Luckily for us, it was scheduled to be upgraded but it got delayed :).
I've asked the guy to go somewhere else for a while so I should be able
to keep the machine in the state it's currently in.

> If not, would you be willing to run some tests for me?
>

Of course.
On Fri, 2008-10-31 at 22:18 +1100, Paul Mackerras wrote:
> Mel Gorman writes:
>
> > Yaboot in my case and I've heard it affected a DVD installation. I don't
> > know for sure if it affects netboot but as I think it's something the
> > kernel is doing, it probably doesn't matter how it gets loaded?
>
> I do need to know whether it was the vmlinux or the zImage.pseries
> that you were loading with yaboot. That commit you identified affects
> the contents of an ELF note in the zImage.pseries that firmware looks
> at, as well as a structure in the kernel itself that gets passed as an
> argument to a call to firmware. If you were loading a vmlinux with
> yaboot when you saw the corruption occur then that narrows things down
> a bit.

Unless I missed something, I think it's narrowed already. When loaded from
yaboot, there is no relevant difference between zImage and vmlinux here.
I.e. yaboot parses the ELF header of the zImage itself and ignores the
special notes anyway, so only the CAS firmware call is relevant in both
cases, no?

Cheers,
Ben.
On Fri, Oct 31, 2008 at 10:18:38PM +1100, Paul Mackerras wrote:
> Mel Gorman writes:
>
> > Yaboot in my case and I've heard it affected a DVD installation. I don't
> > know for sure if it affects netboot but as I think it's something the
> > kernel is doing, it probably doesn't matter how it gets loaded?
>
> I do need to know whether it was the vmlinux or the zImage.pseries
> that you were loading with yaboot. That commit you identified affects
> the contents of an ELF note in the zImage.pseries that firmware looks
> at, as well as a structure in the kernel itself that gets passed as an
> argument to a call to firmware. If you were loading a vmlinux with
> yaboot when you saw the corruption occur then that narrows things down
> a bit.
>

It's the vmlinux file I am seeing problems with.
Benjamin Herrenschmidt writes:

> Unless I missed something, I think it's narrowed already. When loaded from
> yaboot, there is no relevant difference between zImage and vmlinux here.
> I.e. yaboot parses the ELF header of the zImage itself and ignores the
> special notes anyway, so only the CAS firmware call is relevant in both
> cases, no?

Good point. However, it would be the parse-elf-header firmware call,
rather than the CAS firmware call, since 91a00302 modified the fake_elf
structure (to make it consistent with the CAS structure) but not the
CAS structure.

Paul.
diff --git a/arch/powerpc/boot/addnote.c b/arch/powerpc/boot/addnote.c
index b1e5611..dcc9ab2 100644
--- a/arch/powerpc/boot/addnote.c
+++ b/arch/powerpc/boot/addnote.c
@@ -11,7 +11,12 @@
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  *
- * Usage: addnote zImage
+ * Usage: addnote zImage [note.elf]
+ *
+ * If note.elf is supplied, it is the name of an ELF file that contains
+ * an RPA note to use instead of the built-in one. Alternatively, the
+ * note.elf file may be empty, in which case the built-in RPA note is
+ * used (this is to simplify how this is invoked from the wrapper script).
  */
 #include <stdio.h>
 #include <stdlib.h>
@@ -43,27 +48,29 @@ char rpaname[] = "IBM,RPA-Client-Config";
  */
 #define N_RPA_DESCR	8
 unsigned int rpanote[N_RPA_DESCR] = {
-	0,		/* lparaffinity */
-	64,		/* min_rmo_size */
+	1,		/* lparaffinity */
+	128,		/* min_rmo_size */
 	0,		/* min_rmo_percent */
-	40,		/* max_pft_size */
+	46,		/* max_pft_size */
 	1,		/* splpar */
 	-1,		/* min_load */
-	0,		/* new_mem_def */
-	1,		/* ignore_my_client_config */
+	1,		/* new_mem_def */
+	0,		/* ignore_my_client_config */
 };
 
 #define ROUNDUP(len)	(((len) + 3) & ~3)
 
 unsigned char buf[512];
+unsigned char notebuf[512];
 
-#define GET_16BE(off)	((buf[off] << 8) + (buf[(off)+1]))
-#define GET_32BE(off)	((GET_16BE(off) << 16) + GET_16BE((off)+2))
+#define GET_16BE(b, off)	(((b)[off] << 8) + ((b)[(off)+1]))
+#define GET_32BE(b, off)	((GET_16BE((b), (off)) << 16) + \
+				 GET_16BE((b), (off)+2))
 
-#define PUT_16BE(off, v)	(buf[off] = ((v) >> 8) & 0xff, \
-				 buf[(off) + 1] = (v) & 0xff)
-#define PUT_32BE(off, v)	(PUT_16BE((off), (v) >> 16), \
-				 PUT_16BE((off) + 2, (v)))
+#define PUT_16BE(b, off, v)	((b)[off] = ((v) >> 8) & 0xff, \
+				 (b)[(off) + 1] = (v) & 0xff)
+#define PUT_32BE(b, off, v)	(PUT_16BE((b), (off), (v) >> 16), \
+				 PUT_16BE((b), (off) + 2, (v)))
 
 /* Structure of an ELF file */
 #define E_IDENT		0	/* ELF header */
@@ -88,15 +95,71 @@ unsigned char buf[512];
 
 unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
 
+unsigned char *read_rpanote(const char *fname, int *nnp)
+{
+	int notefd, nr, i;
+	int ph, ps, np;
+	int note, notesize;
+
+	notefd = open(fname, O_RDONLY);
+	if (notefd < 0) {
+		perror(fname);
+		exit(1);
+	}
+	nr = read(notefd, notebuf, sizeof(notebuf));
+	if (nr < 0) {
+		perror("read note");
+		exit(1);
+	}
+	if (nr == 0)		/* empty file */
+		return NULL;
+	if (nr < E_HSIZE ||
+	    memcmp(&notebuf[E_IDENT+EI_MAGIC], elf_magic, 4) != 0 ||
+	    notebuf[E_IDENT+EI_CLASS] != ELFCLASS32 ||
+	    notebuf[E_IDENT+EI_DATA] != ELFDATA2MSB)
+		goto notelf;
+	close(notefd);
+
+	/* now look for the RPA-note */
+	ph = GET_32BE(notebuf, E_PHOFF);
+	ps = GET_16BE(notebuf, E_PHENTSIZE);
+	np = GET_16BE(notebuf, E_PHNUM);
+	if (ph < E_HSIZE || ps < PH_HSIZE || np < 1)
+		goto notelf;
+
+	for (i = 0; i < np; ++i, ph += ps) {
+		if (GET_32BE(notebuf, ph + PH_TYPE) != PT_NOTE)
+			continue;
+		note = GET_32BE(notebuf, ph + PH_OFFSET);
+		notesize = GET_32BE(notebuf, ph + PH_FILESZ);
+		if (notesize < 34 || note + notesize > nr)
+			continue;
+		if (GET_32BE(notebuf, note) != strlen(rpaname) + 1 ||
+		    GET_32BE(notebuf, note + 8) != 0x12759999 ||
+		    strcmp((char *)&notebuf[note + 12], rpaname) != 0)
+			continue;
+		/* looks like an RPA note, return it */
+		*nnp = notesize;
+		return &notebuf[note];
+	}
+	/* no RPA note found */
+	return NULL;
+
+ notelf:
+	fprintf(stderr, "%s is not a big-endian 32-bit ELF image\n", fname);
+	exit(1);
+}
+
 int
 main(int ac, char **av)
 {
 	int fd, n, i;
 	int ph, ps, np;
 	int nnote, nnote2, ns;
+	unsigned char *rpap;
 
-	if (ac != 2) {
-		fprintf(stderr, "Usage: %s elf-file\n", av[0]);
+	if (ac != 2 && ac != 3) {
+		fprintf(stderr, "Usage: %s elf-file [rpanote.elf]\n", av[0]);
 		exit(1);
 	}
 	fd = open(av[1], O_RDWR);
@@ -107,6 +170,7 @@ main(int ac, char **av)
 
 	nnote = 12 + ROUNDUP(strlen(arch) + 1) + sizeof(descr);
 	nnote2 = 12 + ROUNDUP(strlen(rpaname) + 1) + sizeof(rpanote);
+	rpap = NULL;
 
 	n = read(fd, buf, sizeof(buf));
 	if (n < 0) {
@@ -124,16 +188,19 @@ main(int ac, char **av)
 		exit(1);
 	}
 
-	ph = GET_32BE(E_PHOFF);
-	ps = GET_16BE(E_PHENTSIZE);
-	np = GET_16BE(E_PHNUM);
+	if (ac == 3)
+		rpap = read_rpanote(av[2], &nnote2);
+
+	ph = GET_32BE(buf, E_PHOFF);
+	ps = GET_16BE(buf, E_PHENTSIZE);
+	np = GET_16BE(buf, E_PHNUM);
 	if (ph < E_HSIZE || ps < PH_HSIZE || np < 1)
 		goto notelf;
 	if (ph + (np + 2) * ps + nnote + nnote2 > n)
 		goto nospace;
 
 	for (i = 0; i < np; ++i) {
-		if (GET_32BE(ph + PH_TYPE) == PT_NOTE) {
+		if (GET_32BE(buf, ph + PH_TYPE) == PT_NOTE) {
 			fprintf(stderr, "%s already has a note entry\n",
 				av[1]);
 			exit(0);
@@ -148,37 +215,42 @@ main(int ac, char **av)
 
 	/* fill in the program header entry */
 	ns = ph + 2 * ps;
-	PUT_32BE(ph + PH_TYPE, PT_NOTE);
-	PUT_32BE(ph + PH_OFFSET, ns);
-	PUT_32BE(ph + PH_FILESZ, nnote);
+	PUT_32BE(buf, ph + PH_TYPE, PT_NOTE);
+	PUT_32BE(buf, ph + PH_OFFSET, ns);
+	PUT_32BE(buf, ph + PH_FILESZ, nnote);
 
 	/* fill in the note area we point to */
 	/* XXX we should probably make this a proper section */
-	PUT_32BE(ns, strlen(arch) + 1);
-	PUT_32BE(ns + 4, N_DESCR * 4);
-	PUT_32BE(ns + 8, 0x1275);
+	PUT_32BE(buf, ns, strlen(arch) + 1);
+	PUT_32BE(buf, ns + 4, N_DESCR * 4);
+	PUT_32BE(buf, ns + 8, 0x1275);
 	strcpy((char *) &buf[ns + 12], arch);
 	ns += 12 + strlen(arch) + 1;
 	for (i = 0; i < N_DESCR; ++i, ns += 4)
-		PUT_32BE(ns, descr[i]);
+		PUT_32BE(buf, ns, descr[i]);
 
 	/* fill in the second program header entry and the RPA note area */
 	ph += ps;
-	PUT_32BE(ph + PH_TYPE, PT_NOTE);
-	PUT_32BE(ph + PH_OFFSET, ns);
-	PUT_32BE(ph + PH_FILESZ, nnote2);
+	PUT_32BE(buf, ph + PH_TYPE, PT_NOTE);
+	PUT_32BE(buf, ph + PH_OFFSET, ns);
+	PUT_32BE(buf, ph + PH_FILESZ, nnote2);
 
 	/* fill in the note area we point to */
-	PUT_32BE(ns, strlen(rpaname) + 1);
-	PUT_32BE(ns + 4, sizeof(rpanote));
-	PUT_32BE(ns + 8, 0x12759999);
-	strcpy((char *) &buf[ns + 12], rpaname);
-	ns += 12 + ROUNDUP(strlen(rpaname) + 1);
-	for (i = 0; i < N_RPA_DESCR; ++i, ns += 4)
-		PUT_32BE(ns, rpanote[i]);
+	if (rpap) {
+		/* RPA note supplied in file, just copy the whole thing over */
+		memcpy(buf + ns, rpap, nnote2);
+	} else {
+		PUT_32BE(buf, ns, strlen(rpaname) + 1);
+		PUT_32BE(buf, ns + 4, sizeof(rpanote));
+		PUT_32BE(buf, ns + 8, 0x12759999);
+		strcpy((char *) &buf[ns + 12], rpaname);
+		ns += 12 + ROUNDUP(strlen(rpaname) + 1);
+		for (i = 0; i < N_RPA_DESCR; ++i, ns += 4)
+			PUT_32BE(buf, ns, rpanote[i]);
+	}
 
 	/* Update the number of program headers */
-	PUT_16BE(E_PHNUM, np + 2);
+	PUT_16BE(buf, E_PHNUM, np + 2);
 
 	/* write back */
 	lseek(fd, (long) 0, SEEK_SET);
diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 965c237..ee0dc41 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -307,7 +307,9 @@ fi
 # post-processing needed for some platforms
 case "$platform" in
 pseries|chrp)
-    $objbin/addnote "$ofile"
+    ${CROSS}objcopy -O binary -j .fakeelf "$kernel" "$ofile".rpanote
+    $objbin/addnote "$ofile" "$ofile".rpanote
+    rm -r "$ofile".rpanote
     ;;
 coff)
     ${CROSS}objcopy -O aixcoff-rs6000 --set-start "$entry" "$ofile"
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 7cf274a..2fdbc18 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -732,7 +732,7 @@ static struct fake_elf {
 			u32	ignore_me;
 		} rpadesc;
 	} rpanote;
-} fake_elf = {
+} fake_elf __section(.fakeelf) = {
 	.elfhdr = {
 		.e_ident = { 0x7f, 'E', 'L', 'F',
 			     ELFCLASS32, ELFDATA2MSB, EV_CURRENT },
@@ -774,13 +774,13 @@ static struct fake_elf {
 		.type = 0x12759999,
 		.name = "IBM,RPA-Client-Config",
 		.rpadesc = {
-			.lpar_affinity = 0,
-			.min_rmo_size = 64,	/* in megabytes */
+			.lpar_affinity = 1,
+			.min_rmo_size = 128,	/* in megabytes */
 			.min_rmo_percent = 0,
-			.max_pft_size = 48,	/* 2^48 bytes max PFT size */
+			.max_pft_size = 46,	/* 2^46 bytes max PFT size */
 			.splpar = 1,
 			.min_load = ~0U,
-			.new_mem_def = 0
+			.new_mem_def = 1
 		}
 	}
 };
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index e6927fb..b39c27e 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -203,6 +203,9 @@ SECTIONS
 		*(.rela*)
 	}
 
+	/* Fake ELF header containing RPA note; for addnote */
+	.fakeelf : AT(ADDR(.fakeelf) - LOAD_OFFSET) { *(.fakeelf) }
+
 	/* freed after init ends here */
 	. = ALIGN(PAGE_SIZE);
 	__init_end = .;