Message ID | 20220504065536.3534488-1-aik@ozlabs.ru |
---|---|
State | New |
Headers | show |
Series | [qemu] spapr: Use address from elf parser for kernel address | expand |
Alexey Kardashevskiy <aik@ozlabs.ru> writes: > tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. > > QEMU loads the kernel at 0x400000 by default which works most of > the time as Linux kernels are relocatable, 64bit and compiled with "-pie" > (position independent code). This works for a little endian zImage too. > > However a big endian zImage is compiled without -pie, is 32bit, linked to > 0x4000000 so current QEMU ends up loading it at > 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. > > This uses the kernel address returned from load_elf(). > If the default kernel_addr is used, there is no change in behavior (as > translate_kernel_address() takes care of this), which is: > LE/BE vmlinux and LE zImage boot, BE zImage does not. > If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU > prints a warning and BE zImage boots. I think we can fix this without needing a different command line for BE zImage (apart from x-vof, which is a separate matter). If you look at translate_kernel_address, it cannot really work when the ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue, so if we fix that function like this... static uint64_t translate_kernel_address(void *opaque, uint64_t addr) { SpaprMachineState *spapr = opaque; return addr ? addr : spapr->kernel_addr; } ...then we could always use the ELF PhysAddr if it is different from 0 and only use the default load addr if the ELF PhysAddr is 0. If the user gives kernel_addr on the cmdline, we honor that, even if puts the kernel over the firmware (we have code to detect that). > @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) > exit(1); > } > > + if (spapr->kernel_addr != loaded_addr) { This could be: if (spapr->kernel_addr == KERNEL_LOAD_ADDR && spapr->kernel_addr != loaded_addr) { So the precedence would be: 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage falls here; 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall here; 3- kernel_addr. The user is probably hacking something, just use what they gave us. QEMU will yell if they load the kernel over the fw. > + warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", > + spapr->kernel_addr, loaded_addr); > + spapr->kernel_addr = loaded_addr; > + } > + > /* load initrd */ > if (initrd_filename) { > /* Try to locate the initrd in the gap between the kernel
On 5/5/22 05:16, Fabiano Rosas wrote: > Alexey Kardashevskiy <aik@ozlabs.ru> writes: > >> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. >> >> QEMU loads the kernel at 0x400000 by default which works most of >> the time as Linux kernels are relocatable, 64bit and compiled with "-pie" >> (position independent code). This works for a little endian zImage too. >> >> However a big endian zImage is compiled without -pie, is 32bit, linked to >> 0x4000000 so current QEMU ends up loading it at >> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. >> >> This uses the kernel address returned from load_elf(). >> If the default kernel_addr is used, there is no change in behavior (as >> translate_kernel_address() takes care of this), which is: >> LE/BE vmlinux and LE zImage boot, BE zImage does not. >> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU >> prints a warning and BE zImage boots. > > I think we can fix this without needing a different command line for BE > zImage (apart from x-vof, which is a separate matter). > > If you look at translate_kernel_address, it cannot really work when the > ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue, > so if we fix that function like this... > > static uint64_t translate_kernel_address(void *opaque, uint64_t addr) > { > SpaprMachineState *spapr = opaque; > > return addr ? addr : spapr->kernel_addr; > } The qemu elf loader is supposed to handle relocations which should be calling this hook more than once, now I wonder why it is not doing so. > ...then we could always use the ELF PhysAddr if it is different from 0 > and only use the default load addr if the ELF PhysAddr is 0. If the user > gives kernel_addr on the cmdline, we honor that, even if puts the kernel > over the firmware (we have code to detect that). ELF address is 0 for LE zImage only, vmlinux BE/LE uses 0xc000000000000000. And we are already chopping all these tops bits off in translate_kernel_address() and I do not really know why _exactly_ it is 0x0fffffff and not let's say 0x7fffffff. > >> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) >> exit(1); >> } >> >> + if (spapr->kernel_addr != loaded_addr) { > > This could be: > > if (spapr->kernel_addr == KERNEL_LOAD_ADDR && > spapr->kernel_addr != loaded_addr) { > > So the precedence would be: > > 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage > falls here; > > 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall > here; > > 3- kernel_addr. The user is probably hacking something, just use what > they gave us. QEMU will yell if they load the kernel over the fw. imho too complicated. What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)? I am basically fixing a bug when we ignore where load_elf() loaded the ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the ELF was loaded where we want it to be. Everything else can be done but on top of this. >> + warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", >> + spapr->kernel_addr, loaded_addr); >> + spapr->kernel_addr = loaded_addr; >> + } >> + >> /* load initrd */ >> if (initrd_filename) { >> /* Try to locate the initrd in the gap between the kernel
On Thu, 5 May 2022 at 03:31, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: > > > > On 5/5/22 05:16, Fabiano Rosas wrote: > > Alexey Kardashevskiy <aik@ozlabs.ru> writes: > > > >> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. > >> > >> QEMU loads the kernel at 0x400000 by default which works most of > >> the time as Linux kernels are relocatable, 64bit and compiled with "-pie" > >> (position independent code). This works for a little endian zImage too. > >> > >> However a big endian zImage is compiled without -pie, is 32bit, linked to > >> 0x4000000 so current QEMU ends up loading it at > >> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. > >> > >> This uses the kernel address returned from load_elf(). > >> If the default kernel_addr is used, there is no change in behavior (as > >> translate_kernel_address() takes care of this), which is: > >> LE/BE vmlinux and LE zImage boot, BE zImage does not. > >> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU > >> prints a warning and BE zImage boots. > > > > I think we can fix this without needing a different command line for BE > > zImage (apart from x-vof, which is a separate matter). > > > > If you look at translate_kernel_address, it cannot really work when the > > ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue, > > so if we fix that function like this... > > > > static uint64_t translate_kernel_address(void *opaque, uint64_t addr) > > { > > SpaprMachineState *spapr = opaque; > > > > return addr ? addr : spapr->kernel_addr; > > } > > > The qemu elf loader is supposed to handle relocations which should be > calling this hook more than once, now I wonder why it is not doing so. > > > > ...then we could always use the ELF PhysAddr if it is different from 0 > > and only use the default load addr if the ELF PhysAddr is 0. If the user > > gives kernel_addr on the cmdline, we honor that, even if puts the kernel > > over the firmware (we have code to detect that). > > > ELF address is 0 for LE zImage only, vmlinux BE/LE uses > 0xc000000000000000. And we are already chopping all these tops bits off > in translate_kernel_address() and I do not really know why _exactly_ it > is 0x0fffffff and not let's say 0x7fffffff. > > > > > >> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) > >> exit(1); > >> } > >> > >> + if (spapr->kernel_addr != loaded_addr) { > > > > This could be: > > > > if (spapr->kernel_addr == KERNEL_LOAD_ADDR && > > spapr->kernel_addr != loaded_addr) { > > > > So the precedence would be: > > > > 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage > > falls here; > > > > 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall > > here; > > > > 3- kernel_addr. The user is probably hacking something, just use what > > they gave us. QEMU will yell if they load the kernel over the fw. > > > imho too complicated. > > What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is > KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)? > > I am basically fixing a bug when we ignore where load_elf() loaded the > ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the > ELF was loaded where we want it to be. Everything else can be done but > on top of this. It would be good to fix this so we don't need to specify kernel-addr=0. I only recently learnt the pseries machine doesn't support loading the zImage: https://github.com/linuxppc/issues/issues/402 So whatever the fix is, writing down what is expected to work and what isn't would be useful. I tested your patch and it worked with this command line: qemu-system-ppc64 -M pseries,kernel-addr=0,x-vof=on -nographic -kernel arch/powerpc/boot/zImage.pseries -serial mon:stdio -nodefaults Tested-by: Joel Stanley <joel@jms.id.au> Cheers, Joel > > > >> + warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", > >> + spapr->kernel_addr, loaded_addr); > >> + spapr->kernel_addr = loaded_addr; > >> + } > >> + > >> /* load initrd */ > >> if (initrd_filename) { > >> /* Try to locate the initrd in the gap between the kernel >
On 5/5/22 14:16, Joel Stanley wrote: > On Thu, 5 May 2022 at 03:31, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: >> >> >> >> On 5/5/22 05:16, Fabiano Rosas wrote: >>> Alexey Kardashevskiy <aik@ozlabs.ru> writes: >>> >>>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. >>>> >>>> QEMU loads the kernel at 0x400000 by default which works most of >>>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie" >>>> (position independent code). This works for a little endian zImage too. >>>> >>>> However a big endian zImage is compiled without -pie, is 32bit, linked to >>>> 0x4000000 so current QEMU ends up loading it at >>>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. >>>> >>>> This uses the kernel address returned from load_elf(). >>>> If the default kernel_addr is used, there is no change in behavior (as >>>> translate_kernel_address() takes care of this), which is: >>>> LE/BE vmlinux and LE zImage boot, BE zImage does not. >>>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU >>>> prints a warning and BE zImage boots. >>> >>> I think we can fix this without needing a different command line for BE >>> zImage (apart from x-vof, which is a separate matter). >>> >>> If you look at translate_kernel_address, it cannot really work when the >>> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue, >>> so if we fix that function like this... >>> >>> static uint64_t translate_kernel_address(void *opaque, uint64_t addr) >>> { >>> SpaprMachineState *spapr = opaque; >>> >>> return addr ? addr : spapr->kernel_addr; >>> } >> >> >> The qemu elf loader is supposed to handle relocations which should be >> calling this hook more than once, now I wonder why it is not doing so. >> >> >>> ...then we could always use the ELF PhysAddr if it is different from 0 >>> and only use the default load addr if the ELF PhysAddr is 0. If the user >>> gives kernel_addr on the cmdline, we honor that, even if puts the kernel >>> over the firmware (we have code to detect that). >> >> >> ELF address is 0 for LE zImage only, vmlinux BE/LE uses >> 0xc000000000000000. And we are already chopping all these tops bits off >> in translate_kernel_address() and I do not really know why _exactly_ it >> is 0x0fffffff and not let's say 0x7fffffff. >> >> >>> >>>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) >>>> exit(1); >>>> } >>>> >>>> + if (spapr->kernel_addr != loaded_addr) { >>> >>> This could be: >>> >>> if (spapr->kernel_addr == KERNEL_LOAD_ADDR && >>> spapr->kernel_addr != loaded_addr) { >>> >>> So the precedence would be: >>> >>> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage >>> falls here; >>> >>> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall >>> here; >>> >>> 3- kernel_addr. The user is probably hacking something, just use what >>> they gave us. QEMU will yell if they load the kernel over the fw. >> >> >> imho too complicated. >> >> What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is >> KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)? >> >> I am basically fixing a bug when we ignore where load_elf() loaded the >> ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the >> ELF was loaded where we want it to be. Everything else can be done but >> on top of this. > > It would be good to fix this so we don't need to specify kernel-addr=0. This means the pseries code in QEMU needs to read the ELF header and figure out if it is position independent and what is the base address. And because it is Linux which is special, just reading the ELF header is not enough, need more heuristics (there is some already in translate_kernel_address()). LE vmlinux is 64bit EXEC type and entry=0xc000000000000000 BE vmlinux is 64bit EXEC type and entry=0xc000000000000000 LE zImage is 64bit DYN type and entry=0x0 BE zImage is 32bit EXEC type and entry=0x4000000 And the default address for these in QEMU is 0x400000. Asking kernel-addr=0 and vof=on looks like a small evil :) And also worth mentioning that with this hack it should be possible to boot grub.elf via -kernel which might be interesting for debugging, and that thing is linked to 0x200000 or so, and probably also 32bit BE (I do not have one handy). > I only recently learnt the pseries machine doesn't support loading the zImage: > > https://github.com/linuxppc/issues/issues/402 > > So whatever the fix is, writing down what is expected to work and what > isn't would be useful. > > I tested your patch and it worked with this command line: > > qemu-system-ppc64 -M pseries,kernel-addr=0,x-vof=on -nographic > -kernel arch/powerpc/boot/zImage.pseries -serial mon:stdio -nodefaults > > Tested-by: Joel Stanley <joel@jms.id.au> Cool thanks! > > Cheers, > > Joel > >> >> >>>> + warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", >>>> + spapr->kernel_addr, loaded_addr); >>>> + spapr->kernel_addr = loaded_addr; >>>> + } >>>> + >>>> /* load initrd */ >>>> if (initrd_filename) { >>>> /* Try to locate the initrd in the gap between the kernel >>
Alexey Kardashevskiy <aik@ozlabs.ru> writes: > On 5/5/22 05:16, Fabiano Rosas wrote: >> Alexey Kardashevskiy <aik@ozlabs.ru> writes: >> >>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. >>> >>> QEMU loads the kernel at 0x400000 by default which works most of >>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie" >>> (position independent code). This works for a little endian zImage too. >>> >>> However a big endian zImage is compiled without -pie, is 32bit, linked to >>> 0x4000000 so current QEMU ends up loading it at >>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. >>> >>> This uses the kernel address returned from load_elf(). >>> If the default kernel_addr is used, there is no change in behavior (as >>> translate_kernel_address() takes care of this), which is: >>> LE/BE vmlinux and LE zImage boot, BE zImage does not. >>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU >>> prints a warning and BE zImage boots. >> >> I think we can fix this without needing a different command line for BE >> zImage (apart from x-vof, which is a separate matter). >> >> If you look at translate_kernel_address, it cannot really work when the >> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue, >> so if we fix that function like this... >> >> static uint64_t translate_kernel_address(void *opaque, uint64_t addr) >> { >> SpaprMachineState *spapr = opaque; >> >> return addr ? addr : spapr->kernel_addr; >> } > > > The qemu elf loader is supposed to handle relocations which should be > calling this hook more than once, now I wonder why it is not doing so. For relocations, it seems to only call translate_fn for s390x. >> ...then we could always use the ELF PhysAddr if it is different from 0 >> and only use the default load addr if the ELF PhysAddr is 0. If the user >> gives kernel_addr on the cmdline, we honor that, even if puts the kernel >> over the firmware (we have code to detect that). > > > ELF address is 0 for LE zImage only, vmlinux BE/LE uses > 0xc000000000000000. And we are already chopping all these tops bits off > in translate_kernel_address() and I do not really know why _exactly_ it > is 0x0fffffff and not let's say 0x7fffffff. Oh, I am not talking about the ELF entry point. And that is not what load_elf passes to translate_kernel_address either. It is the ELF PhysAddr: $ powerpc64le-buildroot-linux-gnu-readelf -We vmlinux | tail Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x010000 0xc000000000000000 0x0000000000000000 0x28d84d4 0x2a54ea8 RWE 0x10000 So it is 0x400000 for BE zImage and 0 for vmlinux. >> >>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) >>> exit(1); >>> } >>> >>> + if (spapr->kernel_addr != loaded_addr) { >> >> This could be: >> >> if (spapr->kernel_addr == KERNEL_LOAD_ADDR && >> spapr->kernel_addr != loaded_addr) { >> >> So the precedence would be: >> >> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage >> falls here; >> >> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall >> here; >> >> 3- kernel_addr. The user is probably hacking something, just use what >> they gave us. QEMU will yell if they load the kernel over the fw. > > > imho too complicated. > > What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is > KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)? Good point. It should always be 3. It is a bad user interface to have a command line option and arbitrarily ignore it. Be it 0x0 or 0x400000. > I am basically fixing a bug when we ignore where load_elf() loaded the > ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the > ELF was loaded where we want it to be. That bug is already accounted for, that is why we have translate_kernel_address: /* address_offset is hack for kernel images that are linked at the wrong physical address. */ if (translate_fn) { addr = translate_fn(translate_opaque, ph->p_paddr); So we're actively telling load_elf to load the kernel at the wrong place when we do: (ph->p_addr & 0x0fffffff) + spapr->kernel_addr It just happened to work so far because the vmlinux PhysAddr is 0. When you set kernel-addr=0 you are simply working around this other bug because: (ph->p_addr & 0x0fffffff) + 0 == ph->p_addr I just want to remove this indirection and use the ELF PhysAddr directly. > Everything else can be done but on top of this. If you want to merge this I could send another patch on top of it later, no worries. I realise that this a larger change that will need more testing. >>> + warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", >>> + spapr->kernel_addr, loaded_addr); >>> + spapr->kernel_addr = loaded_addr; >>> + } >>> + >>> /* load initrd */ >>> if (initrd_filename) { >>> /* Try to locate the initrd in the gap between the kernel
On 06/05/2022 01:50, Fabiano Rosas wrote: > Alexey Kardashevskiy <aik@ozlabs.ru> writes: > >> On 5/5/22 05:16, Fabiano Rosas wrote: >>> Alexey Kardashevskiy <aik@ozlabs.ru> writes: >>> >>>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. >>>> >>>> QEMU loads the kernel at 0x400000 by default which works most of >>>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie" >>>> (position independent code). This works for a little endian zImage too. >>>> >>>> However a big endian zImage is compiled without -pie, is 32bit, linked to >>>> 0x4000000 so current QEMU ends up loading it at >>>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. >>>> >>>> This uses the kernel address returned from load_elf(). >>>> If the default kernel_addr is used, there is no change in behavior (as >>>> translate_kernel_address() takes care of this), which is: >>>> LE/BE vmlinux and LE zImage boot, BE zImage does not. >>>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU >>>> prints a warning and BE zImage boots. >>> >>> I think we can fix this without needing a different command line for BE >>> zImage (apart from x-vof, which is a separate matter). >>> >>> If you look at translate_kernel_address, it cannot really work when the >>> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue, >>> so if we fix that function like this... >>> >>> static uint64_t translate_kernel_address(void *opaque, uint64_t addr) >>> { >>> SpaprMachineState *spapr = opaque; >>> >>> return addr ? addr : spapr->kernel_addr; >>> } >> >> >> The qemu elf loader is supposed to handle relocations which should be >> calling this hook more than once, now I wonder why it is not doing so. > > For relocations, it seems to only call translate_fn for s390x. Agrh. You made me read this :) It is quite weird code and yeah 390-only :-/ >>> ...then we could always use the ELF PhysAddr if it is different from 0 >>> and only use the default load addr if the ELF PhysAddr is 0. If the user >>> gives kernel_addr on the cmdline, we honor that, even if puts the kernel >>> over the firmware (we have code to detect that). >> >> >> ELF address is 0 for LE zImage only, vmlinux BE/LE uses >> 0xc000000000000000. And we are already chopping all these tops bits off >> in translate_kernel_address() and I do not really know why _exactly_ it >> is 0x0fffffff and not let's say 0x7fffffff. > > Oh, I am not talking about the ELF entry point. And that is not what > load_elf passes to translate_kernel_address either. It is the ELF > PhysAddr: > > $ powerpc64le-buildroot-linux-gnu-readelf -We vmlinux | tail > > Program Headers: > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align > LOAD 0x010000 0xc000000000000000 0x0000000000000000 0x28d84d4 0x2a54ea8 RWE 0x10000 > > So it is 0x400000 for BE zImage and 0 for vmlinux. Ah right. Me wrong. btw potentially there can be more program sections. [fstn1-p1 ~]$ readelf -l /home/aik/p/slof/board-qemu/llfw/stage1.elf Elf file type is EXEC (Executable file) Entry point 0x100 There are 2 program headers, starting at offset 64 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align LOAD 0x0000000000000100 0x0000000000000100 0x0000000000000100 0x0000000000008110 0x0000000000010290 RWE 0x4000 LOAD 0x0000000000008210 0x0000000000010400 0x0000000000010400 0x0000000000000010 0x0000000000000010 RW 0x8 grub might be similar. > >>> >>>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) >>>> exit(1); >>>> } >>>> >>>> + if (spapr->kernel_addr != loaded_addr) { >>> >>> This could be: >>> >>> if (spapr->kernel_addr == KERNEL_LOAD_ADDR && >>> spapr->kernel_addr != loaded_addr) { >>> >>> So the precedence would be: >>> >>> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage >>> falls here; >>> >>> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall >>> here; >>> >>> 3- kernel_addr. The user is probably hacking something, just use what >>> they gave us. QEMU will yell if they load the kernel over the fw. >> >> >> imho too complicated. >> >> What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is >> KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)? > > Good point. It should always be 3. It is a bad user interface to have a > command line option and arbitrarily ignore it. Be it 0x0 or 0x400000. > >> I am basically fixing a bug when we ignore where load_elf() loaded the >> ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the >> ELF was loaded where we want it to be. > > That bug is already accounted for, that is why we have > translate_kernel_address: > > /* address_offset is hack for kernel images that are > linked at the wrong physical address. */ > if (translate_fn) { > addr = translate_fn(translate_opaque, ph->p_paddr); > > So we're actively telling load_elf to load the kernel at the wrong place > when we do: > > (ph->p_addr & 0x0fffffff) + spapr->kernel_addr > > It just happened to work so far because the vmlinux PhysAddr is 0. > > When you set kernel-addr=0 you are simply working around this other bug > because: > > (ph->p_addr & 0x0fffffff) + 0 == ph->p_addr > > I just want to remove this indirection and use the ELF PhysAddr > directly. That's alright but not in translate_kernel_address(). May be I should rename kernel-addr to kernel-offset (which it really is) or hack load_elf() so it would take the desired location and work out the offset inside (and ditch the translate_fn hook) but either way we'll need heuristics (or the user should know) to know if the image is self-relocatable or not as otherwise it just won't boot. >> Everything else can be done but on top of this. > > If you want to merge this I could send another patch on top of it later, > no worries. I realise that this a larger change that will need more > testing. I want to have some easy-to-explain way of booting BE zImage :) > >>>> + warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", >>>> + spapr->kernel_addr, loaded_addr); >>>> + spapr->kernel_addr = loaded_addr; >>>> + } >>>> + >>>> /* load initrd */ >>>> if (initrd_filename) { >>>> /* Try to locate the initrd in the gap between the kernel
Alexey Kardashevskiy <aik@ozlabs.ru> writes: > On 06/05/2022 01:50, Fabiano Rosas wrote: >> Alexey Kardashevskiy <aik@ozlabs.ru> writes: >> >>> On 5/5/22 05:16, Fabiano Rosas wrote: >>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes: >>>> >>>>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. >>>>> >>>>> QEMU loads the kernel at 0x400000 by default which works most of >>>>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie" >>>>> (position independent code). This works for a little endian zImage too. >>>>> >>>>> However a big endian zImage is compiled without -pie, is 32bit, linked to >>>>> 0x4000000 so current QEMU ends up loading it at >>>>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. >>>>> >>>>> This uses the kernel address returned from load_elf(). >>>>> If the default kernel_addr is used, there is no change in behavior (as >>>>> translate_kernel_address() takes care of this), which is: >>>>> LE/BE vmlinux and LE zImage boot, BE zImage does not. >>>>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU >>>>> prints a warning and BE zImage boots. >>>> >>>> I think we can fix this without needing a different command line for BE >>>> zImage (apart from x-vof, which is a separate matter). >>>> >>>> If you look at translate_kernel_address, it cannot really work when the >>>> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue, >>>> so if we fix that function like this... >>>> >>>> static uint64_t translate_kernel_address(void *opaque, uint64_t addr) >>>> { >>>> SpaprMachineState *spapr = opaque; >>>> >>>> return addr ? addr : spapr->kernel_addr; >>>> } >>> >>> >>> The qemu elf loader is supposed to handle relocations which should be >>> calling this hook more than once, now I wonder why it is not doing so. >> >> For relocations, it seems to only call translate_fn for s390x. > > > Agrh. You made me read this :) It is quite weird code and yeah 390-only :-/ > > >>>> ...then we could always use the ELF PhysAddr if it is different from 0 >>>> and only use the default load addr if the ELF PhysAddr is 0. If the user >>>> gives kernel_addr on the cmdline, we honor that, even if puts the kernel >>>> over the firmware (we have code to detect that). >>> >>> >>> ELF address is 0 for LE zImage only, vmlinux BE/LE uses >>> 0xc000000000000000. And we are already chopping all these tops bits off >>> in translate_kernel_address() and I do not really know why _exactly_ it >>> is 0x0fffffff and not let's say 0x7fffffff. >> >> Oh, I am not talking about the ELF entry point. And that is not what >> load_elf passes to translate_kernel_address either. It is the ELF >> PhysAddr: >> >> $ powerpc64le-buildroot-linux-gnu-readelf -We vmlinux | tail >> >> Program Headers: >> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align >> LOAD 0x010000 0xc000000000000000 0x0000000000000000 0x28d84d4 0x2a54ea8 RWE 0x10000 >> >> So it is 0x400000 for BE zImage and 0 for vmlinux. > > Ah right. Me wrong. > > btw potentially there can be more program sections. > > [fstn1-p1 ~]$ readelf -l /home/aik/p/slof/board-qemu/llfw/stage1.elf > > Elf file type is EXEC (Executable file) > Entry point 0x100 > There are 2 program headers, starting at offset 64 > > Program Headers: > Type Offset VirtAddr PhysAddr > FileSiz MemSiz Flags Align > LOAD 0x0000000000000100 0x0000000000000100 0x0000000000000100 > 0x0000000000008110 0x0000000000010290 RWE 0x4000 > LOAD 0x0000000000008210 0x0000000000010400 0x0000000000010400 > 0x0000000000000010 0x0000000000000010 RW 0x8 > > > grub might be similar. > > >> >>>> >>>>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) >>>>> exit(1); >>>>> } >>>>> >>>>> + if (spapr->kernel_addr != loaded_addr) { >>>> >>>> This could be: >>>> >>>> if (spapr->kernel_addr == KERNEL_LOAD_ADDR && >>>> spapr->kernel_addr != loaded_addr) { >>>> >>>> So the precedence would be: >>>> >>>> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage >>>> falls here; >>>> >>>> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall >>>> here; >>>> >>>> 3- kernel_addr. The user is probably hacking something, just use what >>>> they gave us. QEMU will yell if they load the kernel over the fw. >>> >>> >>> imho too complicated. >>> >>> What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is >>> KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)? >> >> Good point. It should always be 3. It is a bad user interface to have a >> command line option and arbitrarily ignore it. Be it 0x0 or 0x400000. >> >>> I am basically fixing a bug when we ignore where load_elf() loaded the >>> ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the >>> ELF was loaded where we want it to be. >> >> That bug is already accounted for, that is why we have >> translate_kernel_address: >> >> /* address_offset is hack for kernel images that are >> linked at the wrong physical address. */ >> if (translate_fn) { >> addr = translate_fn(translate_opaque, ph->p_paddr); >> >> So we're actively telling load_elf to load the kernel at the wrong place >> when we do: >> >> (ph->p_addr & 0x0fffffff) + spapr->kernel_addr >> >> It just happened to work so far because the vmlinux PhysAddr is 0. >> >> When you set kernel-addr=0 you are simply working around this other bug >> because: >> >> (ph->p_addr & 0x0fffffff) + 0 == ph->p_addr >> >> I just want to remove this indirection and use the ELF PhysAddr >> directly. > > > That's alright but not in translate_kernel_address(). May be I should > rename kernel-addr to kernel-offset (which it really is) or hack > load_elf() so it would take the desired location and work out the offset > inside (and ditch the translate_fn hook) but either way we'll need > heuristics (or the user should know) to know if the image is > self-relocatable or not as otherwise it just won't boot. > > > >>> Everything else can be done but on top of this. >> >> If you want to merge this I could send another patch on top of it later, >> no worries. I realise that this a larger change that will need more >> testing. > > > I want to have some easy-to-explain way of booting BE zImage :) Ok, I guess I can live with this kernel-addr=0 ugliness. We can deal with translate_kernel_address another time. Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
On 5/4/22 03:55, Alexey Kardashevskiy wrote: > tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. > > QEMU loads the kernel at 0x400000 by default which works most of > the time as Linux kernels are relocatable, 64bit and compiled with "-pie" > (position independent code). This works for a little endian zImage too. > > However a big endian zImage is compiled without -pie, is 32bit, linked to > 0x4000000 so current QEMU ends up loading it at > 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. > > This uses the kernel address returned from load_elf(). > If the default kernel_addr is used, there is no change in behavior (as > translate_kernel_address() takes care of this), which is: > LE/BE vmlinux and LE zImage boot, BE zImage does not. > If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU > prints a warning and BE zImage boots. > > Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as > SLOF enables MSR_SF for everything loaded by QEMU and this leads to early > crash of 32bit zImage. > > Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just work; > a LE zImage restores MSR_SF after every CI call and we are lucky enough > not to crash before the first CI call. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > --- Queued in gitlab.com/danielhb/qemu/tree/ppc-next. Thanks, Daniel > > We could probably change SLOF to always clear MSR_SF before jumping to > the kernel but this is 1) SLOF fix 2) not quite sure if it brings > lots of value. > > > I really wish I had this when tested this fix: > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220406070038.3704604-1-aik@ozlabs.ru/ > > --- > hw/ppc/spapr.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index a4372ba1891e..89f18f6564bd 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -2971,14 +2971,16 @@ static void spapr_machine_init(MachineState *machine) > } > > if (kernel_filename) { > + uint64_t loaded_addr = 0; > + > spapr->kernel_size = load_elf(kernel_filename, NULL, > translate_kernel_address, spapr, > - NULL, NULL, NULL, NULL, 1, > + NULL, &loaded_addr, NULL, NULL, 1, > PPC_ELF_MACHINE, 0, 0); > if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) { > spapr->kernel_size = load_elf(kernel_filename, NULL, > translate_kernel_address, spapr, > - NULL, NULL, NULL, NULL, 0, > + NULL, &loaded_addr, NULL, NULL, 0, > PPC_ELF_MACHINE, 0, 0); > spapr->kernel_le = spapr->kernel_size > 0; > } > @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) > exit(1); > } > > + if (spapr->kernel_addr != loaded_addr) { > + warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", > + spapr->kernel_addr, loaded_addr); > + spapr->kernel_addr = loaded_addr; > + } > + > /* load initrd */ > if (initrd_filename) { > /* Try to locate the initrd in the gap between the kernel
Alexey, I had to amend your commit due to Gitlab CI complaining about ... On 5/4/22 03:55, Alexey Kardashevskiy wrote: > tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. > > QEMU loads the kernel at 0x400000 by default which works most of > the time as Linux kernels are relocatable, 64bit and compiled with "-pie" > (position independent code). This works for a little endian zImage too. > > However a big endian zImage is compiled without -pie, is 32bit, linked to > 0x4000000 so current QEMU ends up loading it at > 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. > > This uses the kernel address returned from load_elf(). > If the default kernel_addr is used, there is no change in behavior (as > translate_kernel_address() takes care of this), which is: > LE/BE vmlinux and LE zImage boot, BE zImage does not. > If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU > prints a warning and BE zImage boots. > > Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as > SLOF enables MSR_SF for everything loaded by QEMU and this leads to early > crash of 32bit zImage. > > Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just work; > a LE zImage restores MSR_SF after every CI call and we are lucky enough > not to crash before the first CI call. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > --- > > We could probably change SLOF to always clear MSR_SF before jumping to > the kernel but this is 1) SLOF fix 2) not quite sure if it brings > lots of value. > > > I really wish I had this when tested this fix: > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220406070038.3704604-1-aik@ozlabs.ru/ > > --- > hw/ppc/spapr.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index a4372ba1891e..89f18f6564bd 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -2971,14 +2971,16 @@ static void spapr_machine_init(MachineState *machine) > } > > if (kernel_filename) { > + uint64_t loaded_addr = 0; > + > spapr->kernel_size = load_elf(kernel_filename, NULL, > translate_kernel_address, spapr, > - NULL, NULL, NULL, NULL, 1, > + NULL, &loaded_addr, NULL, NULL, 1, > PPC_ELF_MACHINE, 0, 0); > if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) { > spapr->kernel_size = load_elf(kernel_filename, NULL, > translate_kernel_address, spapr, > - NULL, NULL, NULL, NULL, 0, > + NULL, &loaded_addr, NULL, NULL, 0, > PPC_ELF_MACHINE, 0, 0); > spapr->kernel_le = spapr->kernel_size > 0; > } > @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) > exit(1); > } > > + if (spapr->kernel_addr != loaded_addr) { > + warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", > + spapr->kernel_addr, loaded_addr); ... this code. This is problematic when compiling in a 32 bit environment because the definition of long (long) unsigned differs from the usual 64 bit env we use: ../hw/ppc/spapr.c: In function 'spapr_machine_init': ../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=] 2998 | warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2999 | spapr->kernel_addr, loaded_addr); | ~~~~~~~~~~~~~~~~~~ | | | uint64_t {aka long long unsigned int} ../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=] 2998 | warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2999 | spapr->kernel_addr, loaded_addr); | ~~~~~~~~~~~ | | | uint64_t {aka long long unsigned int} cc1: all warnings being treated as errors I've fixed it by doing the following: diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 156e799ae9..8d5bdfc20f 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2995,7 +2995,7 @@ static void spapr_machine_init(MachineState *machine) } if (spapr->kernel_addr != loaded_addr) { - warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", + warn_report("spapr: kernel_addr changed from 0x%"PRIx64 + " to 0x%"PRIx64, spapr->kernel_addr, loaded_addr); spapr->kernel_addr = loaded_addr; } If you're ok with this fixup we can keep it as is. Otherwise feel free to send another version. Thanks, Daniel > + spapr->kernel_addr = loaded_addr; > + } > + > /* load initrd */ > if (initrd_filename) { > /* Try to locate the initrd in the gap between the kernel
On 5/18/22 04:58, Daniel Henrique Barboza wrote: > Alexey, > > I had to amend your commit due to Gitlab CI complaining about ... > > On 5/4/22 03:55, Alexey Kardashevskiy wrote: >> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. >> >> QEMU loads the kernel at 0x400000 by default which works most of >> the time as Linux kernels are relocatable, 64bit and compiled with "-pie" >> (position independent code). This works for a little endian zImage too. >> >> However a big endian zImage is compiled without -pie, is 32bit, linked to >> 0x4000000 so current QEMU ends up loading it at >> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. >> >> This uses the kernel address returned from load_elf(). >> If the default kernel_addr is used, there is no change in behavior (as >> translate_kernel_address() takes care of this), which is: >> LE/BE vmlinux and LE zImage boot, BE zImage does not. >> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU >> prints a warning and BE zImage boots. >> >> Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as >> SLOF enables MSR_SF for everything loaded by QEMU and this leads to early >> crash of 32bit zImage. >> >> Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just >> work; >> a LE zImage restores MSR_SF after every CI call and we are lucky enough >> not to crash before the first CI call. >> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >> --- >> >> We could probably change SLOF to always clear MSR_SF before jumping to >> the kernel but this is 1) SLOF fix 2) not quite sure if it brings >> lots of value. >> >> >> I really wish I had this when tested this fix: >> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220406070038.3704604-1-aik@ozlabs.ru/ >> >> --- >> hw/ppc/spapr.c | 12 ++++++++++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >> index a4372ba1891e..89f18f6564bd 100644 >> --- a/hw/ppc/spapr.c >> +++ b/hw/ppc/spapr.c >> @@ -2971,14 +2971,16 @@ static void spapr_machine_init(MachineState >> *machine) >> } >> if (kernel_filename) { >> + uint64_t loaded_addr = 0; >> + >> spapr->kernel_size = load_elf(kernel_filename, NULL, >> translate_kernel_address, spapr, >> - NULL, NULL, NULL, NULL, 1, >> + NULL, &loaded_addr, NULL, NULL, 1, >> PPC_ELF_MACHINE, 0, 0); >> if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) { >> spapr->kernel_size = load_elf(kernel_filename, NULL, >> translate_kernel_address, >> spapr, >> - NULL, NULL, NULL, NULL, 0, >> + NULL, &loaded_addr, NULL, >> NULL, 0, >> PPC_ELF_MACHINE, 0, 0); >> spapr->kernel_le = spapr->kernel_size > 0; >> } >> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState >> *machine) >> exit(1); >> } >> + if (spapr->kernel_addr != loaded_addr) { >> + warn_report("spapr: kernel_addr changed from 0x%lx to >> 0x%lx", >> + spapr->kernel_addr, loaded_addr); > > > ... this code. This is problematic when compiling in a 32 bit > environment because > the definition of long (long) unsigned differs from the usual 64 bit env > we use: > > > > ../hw/ppc/spapr.c: In function 'spapr_machine_init': > ../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type > 'long unsigned int', but argument 2 has type 'uint64_t' {aka 'long long > unsigned int'} [-Werror=format=] > 2998 | warn_report("spapr: kernel_addr changed from 0x%lx > to 0x%lx", > | > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > 2999 | spapr->kernel_addr, loaded_addr); > | ~~~~~~~~~~~~~~~~~~ > | | > | uint64_t {aka long long unsigned int} > ../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type > 'long unsigned int', but argument 3 has type 'uint64_t' {aka 'long long > unsigned int'} [-Werror=format=] > 2998 | warn_report("spapr: kernel_addr changed from 0x%lx > to 0x%lx", > | > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > 2999 | spapr->kernel_addr, loaded_addr); > | ~~~~~~~~~~~ > | | > | uint64_t {aka long > long unsigned int} > cc1: all warnings being treated as errors > > > I've fixed it by doing the following: > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 156e799ae9..8d5bdfc20f 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -2995,7 +2995,7 @@ static void spapr_machine_init(MachineState *machine) > } > > if (spapr->kernel_addr != loaded_addr) { > - warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", > + warn_report("spapr: kernel_addr changed from 0x%"PRIx64 > + " to 0x%"PRIx64, > spapr->kernel_addr, loaded_addr); > spapr->kernel_addr = loaded_addr; > } > > > > If you're ok with this fixup we can keep it as is. Otherwise feel free > to send > another version. I am totally fine with this change, sorry I have not compile tested it, just assumed this cannot fail :-/ > > > Thanks, > > > Daniel > > > > > >> + spapr->kernel_addr = loaded_addr; >> + } >> + >> /* load initrd */ >> if (initrd_filename) { >> /* Try to locate the initrd in the gap between the kernel
On 5/17/22 23:51, Alexey Kardashevskiy wrote: > > > On 5/18/22 04:58, Daniel Henrique Barboza wrote: >> Alexey, >> >> I had to amend your commit due to Gitlab CI complaining about ... >> >> On 5/4/22 03:55, Alexey Kardashevskiy wrote: >>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. >>> >>> QEMU loads the kernel at 0x400000 by default which works most of >>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie" >>> (position independent code). This works for a little endian zImage too. >>> >>> However a big endian zImage is compiled without -pie, is 32bit, linked to >>> 0x4000000 so current QEMU ends up loading it at >>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. >>> >>> This uses the kernel address returned from load_elf(). >>> If the default kernel_addr is used, there is no change in behavior (as >>> translate_kernel_address() takes care of this), which is: >>> LE/BE vmlinux and LE zImage boot, BE zImage does not. >>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU >>> prints a warning and BE zImage boots. >>> >>> Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as >>> SLOF enables MSR_SF for everything loaded by QEMU and this leads to early >>> crash of 32bit zImage. >>> >>> Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just work; >>> a LE zImage restores MSR_SF after every CI call and we are lucky enough >>> not to crash before the first CI call. >>> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>> --- >>> >>> We could probably change SLOF to always clear MSR_SF before jumping to >>> the kernel but this is 1) SLOF fix 2) not quite sure if it brings >>> lots of value. >>> >>> >>> I really wish I had this when tested this fix: >>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220406070038.3704604-1-aik@ozlabs.ru/ >>> >>> --- >>> hw/ppc/spapr.c | 12 ++++++++++-- >>> 1 file changed, 10 insertions(+), 2 deletions(-) >>> >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >>> index a4372ba1891e..89f18f6564bd 100644 >>> --- a/hw/ppc/spapr.c >>> +++ b/hw/ppc/spapr.c >>> @@ -2971,14 +2971,16 @@ static void spapr_machine_init(MachineState *machine) >>> } >>> if (kernel_filename) { >>> + uint64_t loaded_addr = 0; >>> + >>> spapr->kernel_size = load_elf(kernel_filename, NULL, >>> translate_kernel_address, spapr, >>> - NULL, NULL, NULL, NULL, 1, >>> + NULL, &loaded_addr, NULL, NULL, 1, >>> PPC_ELF_MACHINE, 0, 0); >>> if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) { >>> spapr->kernel_size = load_elf(kernel_filename, NULL, >>> translate_kernel_address, spapr, >>> - NULL, NULL, NULL, NULL, 0, >>> + NULL, &loaded_addr, NULL, NULL, 0, >>> PPC_ELF_MACHINE, 0, 0); >>> spapr->kernel_le = spapr->kernel_size > 0; >>> } >>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) >>> exit(1); >>> } >>> + if (spapr->kernel_addr != loaded_addr) { >>> + warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", >>> + spapr->kernel_addr, loaded_addr); >> >> >> ... this code. This is problematic when compiling in a 32 bit environment because >> the definition of long (long) unsigned differs from the usual 64 bit env we use: >> >> >> >> ../hw/ppc/spapr.c: In function 'spapr_machine_init': >> ../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=] >> 2998 | warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", >> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> 2999 | spapr->kernel_addr, loaded_addr); >> | ~~~~~~~~~~~~~~~~~~ >> | | >> | uint64_t {aka long long unsigned int} >> ../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=] >> 2998 | warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", >> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> 2999 | spapr->kernel_addr, loaded_addr); >> | ~~~~~~~~~~~ >> | | >> | uint64_t {aka long long unsigned int} >> cc1: all warnings being treated as errors >> >> >> I've fixed it by doing the following: >> >> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >> index 156e799ae9..8d5bdfc20f 100644 >> --- a/hw/ppc/spapr.c >> +++ b/hw/ppc/spapr.c >> @@ -2995,7 +2995,7 @@ static void spapr_machine_init(MachineState *machine) >> } >> >> if (spapr->kernel_addr != loaded_addr) { >> - warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", >> + warn_report("spapr: kernel_addr changed from 0x%"PRIx64 >> + " to 0x%"PRIx64, >> spapr->kernel_addr, loaded_addr); >> spapr->kernel_addr = loaded_addr; >> } >> >> >> >> If you're ok with this fixup we can keep it as is. Otherwise feel free to send >> another version. > > > I am totally fine with this change, sorry I have not compile tested it, just assumed this cannot fail :-/ Nah it's fine. You would only be able to see this error if you happen to compile it with a 32 bit environment. Not sure if this is something you use to do here and there. I sure don't. Daniel > > >> >> >> Thanks, >> >> >> Daniel >> >> >> >> >> >>> + spapr->kernel_addr = loaded_addr; >>> + } >>> + >>> /* load initrd */ >>> if (initrd_filename) { >>> /* Try to locate the initrd in the gap between the kernel >
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index a4372ba1891e..89f18f6564bd 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2971,14 +2971,16 @@ static void spapr_machine_init(MachineState *machine) } if (kernel_filename) { + uint64_t loaded_addr = 0; + spapr->kernel_size = load_elf(kernel_filename, NULL, translate_kernel_address, spapr, - NULL, NULL, NULL, NULL, 1, + NULL, &loaded_addr, NULL, NULL, 1, PPC_ELF_MACHINE, 0, 0); if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) { spapr->kernel_size = load_elf(kernel_filename, NULL, translate_kernel_address, spapr, - NULL, NULL, NULL, NULL, 0, + NULL, &loaded_addr, NULL, NULL, 0, PPC_ELF_MACHINE, 0, 0); spapr->kernel_le = spapr->kernel_size > 0; } @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) exit(1); } + if (spapr->kernel_addr != loaded_addr) { + warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", + spapr->kernel_addr, loaded_addr); + spapr->kernel_addr = loaded_addr; + } + /* load initrd */ if (initrd_filename) { /* Try to locate the initrd in the gap between the kernel
tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. QEMU loads the kernel at 0x400000 by default which works most of the time as Linux kernels are relocatable, 64bit and compiled with "-pie" (position independent code). This works for a little endian zImage too. However a big endian zImage is compiled without -pie, is 32bit, linked to 0x4000000 so current QEMU ends up loading it at 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. This uses the kernel address returned from load_elf(). If the default kernel_addr is used, there is no change in behavior (as translate_kernel_address() takes care of this), which is: LE/BE vmlinux and LE zImage boot, BE zImage does not. If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU prints a warning and BE zImage boots. Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as SLOF enables MSR_SF for everything loaded by QEMU and this leads to early crash of 32bit zImage. Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just work; a LE zImage restores MSR_SF after every CI call and we are lucky enough not to crash before the first CI call. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- We could probably change SLOF to always clear MSR_SF before jumping to the kernel but this is 1) SLOF fix 2) not quite sure if it brings lots of value. I really wish I had this when tested this fix: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220406070038.3704604-1-aik@ozlabs.ru/ --- hw/ppc/spapr.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)