ppc: Stop dumping state on all exceptions in linux-user

Message ID	1470211348.12584.68.camel@kernel.crashing.org
State	New
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> Message-ID: <1470211348.12584.68.camel@kernel.crashing.org> From: Benjamin Herrenschmidt <benh@kernel.crashing.org> To: qemu-ppc@nongnu.org Date: Wed, 03 Aug 2016 18:02:28 +1000 Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [Qemu-devel] [PATCH] ppc: Stop dumping state on all exceptions in linux-user Precedence: list Cc: Anton Blanchard <anton@au1.ibm.com>, qemu-devel@nongnu.org, david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>

Benjamin Herrenschmidt Aug. 3, 2016, 8:02 a.m. UTC

Other archs don't do it, some programs catch signals just fine
and those dumps just clutter the output. Keep the dumps for cases
that aren't supposed to happen such as unknown codes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 linux-user/main.c | 14 --------------
 1 file changed, 14 deletions(-)

Peter Maydell Aug. 3, 2016, 11:05 a.m. UTC | #1

On 3 August 2016 at 09:02, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> Other archs don't do it, some programs catch signals just fine
> and those dumps just clutter the output. Keep the dumps for cases
> that aren't supposed to happen such as unknown codes.
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  linux-user/main.c | 14 --------------
>  1 file changed, 14 deletions(-)
>
> diff --git a/linux-user/main.c b/linux-user/main.c
> index eb9975c..8fbc5a6 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -1651,9 +1651,6 @@ void cpu_loop(CPUPPCState *env)
>                        "Aborting\n");
>              break;
>          case POWERPC_EXCP_DSI:      /* Data storage exception                */
> -            EXCP_DUMP(env, "Invalid data memory access: 0x" TARGET_FMT_lx "\n",
> -                      env->spr[SPR_DAR]);
> -            /* XXX: check this. Seems bugged */

Are you removing these XXX comments because you've checked the
error code cases below? If so it would be useful to say so
in the commit message...

>              switch (env->error_code & 0xFF000000) {
>              case 0x40000000:
>              case 0x42000000:
> @@ -1684,9 +1681,6 @@ void cpu_loop(CPUPPCState *env)
>              queue_signal(env, info.si_signo, &info);
>              break;
>          case POWERPC_EXCP_ISI:      /* Instruction storage exception         */
> -            EXCP_DUMP(env, "Invalid instruction fetch: 0x\n" TARGET_FMT_lx
> -                      "\n", env->spr[SPR_SRR0]);
> -            /* XXX: check this */
>              switch (env->error_code & 0xFF000000) {
>              case 0x40000000:
>                  info.si_signo = TARGET_SIGSEGV;

thanks
-- PMM

Benjamin Herrenschmidt Aug. 3, 2016, 11:28 a.m. UTC | #2

On Wed, 2016-08-03 at 12:05 +0100, Peter Maydell wrote:
> On 3 August 2016 at 09:02, Benjamin Herrenschmidt
> > <benh@kernel.crashing.org> wrote:
> > 
> > Other archs don't do it, some programs catch signals just fine
> > and those dumps just clutter the output. Keep the dumps for cases
> > that aren't supposed to happen such as unknown codes.
> > 
> > > > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > ---
> >  linux-user/main.c | 14 --------------
> >  1 file changed, 14 deletions(-)
> > 
> > diff --git a/linux-user/main.c b/linux-user/main.c
> > index eb9975c..8fbc5a6 100644
> > --- a/linux-user/main.c
> > +++ b/linux-user/main.c
> > @@ -1651,9 +1651,6 @@ void cpu_loop(CPUPPCState *env)
> >                        "Aborting\n");
> >              break;
> >          case POWERPC_EXCP_DSI:      /* Data storage exception                */
> > -            EXCP_DUMP(env, "Invalid data memory access: 0x" TARGET_FMT_lx "\n",
> > -                      env->spr[SPR_DAR]);
> > -            /* XXX: check this. Seems bugged */
> 
> Are you removing these XXX comments because you've checked the
> error code cases below? If so it would be useful to say so
> in the commit message...

Well I fix some of it in another patch...

To the best of my understanding we only ever generate
0x40000000 and 0x42000000 via ppc_cpu_handle_mmu_fault()
in user_only_helper.c unless there's another path to DSI
in user mode that I missed.

That being said, the comment isn't useful anyway. Nobody
looks at it :-)

I can respin with the comment back in or do put some blurb
in the cset comment, let me know.

BTW. I noticed, the generic path from the original signal
coming from the host kernel through to here loses the
information that would allow us to differenciate between
a map error (no mapping) or a protection fault.

It's a bit unfortunate, not sure if we can do much about it
though without stopping to hijack the existing MMU translate
hook in the CPU class and using a user mode dediated one that
takes that sort of info as an argument.

Cheers,
Ben.

Peter Maydell Aug. 3, 2016, 11:32 a.m. UTC | #3

On 3 August 2016 at 12:28, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> To the best of my understanding we only ever generate
> 0x40000000 and 0x42000000 via ppc_cpu_handle_mmu_fault()
> in user_only_helper.c unless there's another path to DSI
> in user mode that I missed.
>
> That being said, the comment isn't useful anyway. Nobody
> looks at it :-)

Well, it means when somebody does come along to clean
up the code or just to read it they have some idea of
whether the code is believed-to-be-good or not.

> I can respin with the comment back in or do put some blurb
> in the cset comment, let me know.

I think if you're going to remove the comment you should
specifically say you're doing it because you believe the
code is actually correct.

> BTW. I noticed, the generic path from the original signal
> coming from the host kernel through to here loses the
> information that would allow us to differenciate between
> a map error (no mapping) or a protection fault.
>
> It's a bit unfortunate, not sure if we can do much about it
> though without stopping to hijack the existing MMU translate
> hook in the CPU class and using a user mode dediated one that
> takes that sort of info as an argument.

Yeah, there's an LTP test that fails because of this (we
send a SIGSEGV when we should be sending a SIGBUS). It's
a bit painful to fix though, since as you say we've
effectively thrown away some information. I'm inclined to
put this in the big pile of "bugs we could fix if we
ever implemented linux-user-with-softmmu" and otherwise
ignore it, unless you have a real-world program that
needs this and makes some kind of bodge fix worthwhile.

thanks
-- PMM

Benjamin Herrenschmidt Aug. 3, 2016, 11:39 a.m. UTC | #4

On Wed, 2016-08-03 at 12:32 +0100, Peter Maydell wrote:
> Yeah, there's an LTP test that fails because of this (we
> send a SIGSEGV when we should be sending a SIGBUS). It's
> a bit painful to fix though, since as you say we've
> effectively thrown away some information. I'm inclined to
> put this in the big pile of "bugs we could fix if we
> ever implemented linux-user-with-softmmu" and otherwise
> ignore it, unless you have a real-world program that
> needs this and makes some kind of bodge fix worthwhile.

Nope, not really...

If we care, a simpler fix would be to add a "translate_user_fault" hook
to the CPU class that takes more info about the original signal than
handle_mmu_fault does, and call it when available (with a fallback)
from user-exec.c

That does mean going through all the cpu_signal_handler() variants in
there though to make them extract more useful info.

Not sure it's worthwhile...

As far user-with-softmmu, I'm not too sure... softmmu significantly
increases the overhead of load and stores. Maybe after we add 128-bit
integers to TGC to alleviate that a bit ? :-)

Cheers,
Ben
.

Richard Henderson Aug. 6, 2016, 9:53 a.m. UTC | #5

On 08/03/2016 05:09 PM, Benjamin Herrenschmidt wrote:
> As far user-with-softmmu, I'm not too sure... softmmu significantly
> increases the overhead of load and stores. Maybe after we add 128-bit
> integers to TGC to alleviate that a bit ? :-)

It wouldn't be mandatory, but there are certain bugs we can't fix without it. 
The big issues to be fixed with softmmu are

(1) Host page size > guest page size.

E.g. there are many programs (i386, sparc, etc, all with 4k pages) that you 
can't even load, much less run, on a ppc64 host using a 64k page size.

(2) Host virtual address space bits != guest virtual address space bits

My alpha emulation has run into this.  A real Alpha guest has a 44-bit address 
space, but an x86_64 host has a 48-bit address space.  The x86_64 kernel cannot 
be persuaded to reliably map memory below (1ul << 44), so I have to pretend 
than Alpha has a 48-bit address space.  (Indeed, I set this to 63 bits, so that 
it works for even wider va, like on ppc64 and sparc64.)

More theoretically, if the guest uses high bits for some purpose (e.g. ia64 
segmentation in the top 3 bits), and the host doesn't have a full 64-bit 
virtual address space, then we cannot even map the program, since we cannot set 
bits 61-63 to non-zero values.

r~

Benjamin Herrenschmidt Aug. 7, 2016, 12:50 a.m. UTC | #6

On Sat, 2016-08-06 at 15:23 +0530, Richard Henderson wrote:
> On 08/03/2016 05:09 PM, Benjamin Herrenschmidt wrote:
> > 
> > As far user-with-softmmu, I'm not too sure... softmmu significantly
> > increases the overhead of load and stores. Maybe after we add 128-bit
> > integers to TGC to alleviate that a bit ? :-)
> 
> It wouldn't be mandatory, but there are certain bugs we can't fix without it. 
> The big issues to be fixed with softmmu are
> 
> (1) Host page size > guest page size.
> 
> E.g. there are many programs (i386, sparc, etc, all with 4k pages) that you 
> can't even load, much less run, on a ppc64 host using a 64k page size.

Can't we advertise the host page size to the guest ? Or there are too many
compiled-in assumptions ?

> > (2) Host virtual address space bits != guest virtual address space bits
> 
> My alpha emulation has run into this.  A real Alpha guest has a 44-bit address 
> space, but an x86_64 host has a 48-bit address space.  The x86_64 kernel cannot 
> be persuaded to reliably map memory below (1ul << 44), so I have to pretend 
> than Alpha has a 48-bit address space.  (Indeed, I set this to 63 bits, so that 
> it works for even wider va, like on ppc64 and sparc64.)

You can't just set a no-access VMA covering the top of the address space ? Are
alpha programs relying on the fact that they won't get addresses above 44 ?

> > More theoretically, if the guest uses high bits for some purpose (e.g. ia64 
> segmentation in the top 3 bits), and the host doesn't have a full 64-bit 
> virtual address space, then we cannot even map the program, since we cannot set 
> bits 61-63 to non-zero values.

I see. We could definitely have the option then.

Cheers,
Ben.

Richard Henderson Aug. 8, 2016, 6:59 a.m. UTC | #7

On 08/07/2016 06:20 AM, Benjamin Herrenschmidt wrote:
> On Sat, 2016-08-06 at 15:23 +0530, Richard Henderson wrote:
>> On 08/03/2016 05:09 PM, Benjamin Herrenschmidt wrote:
>>>
>>> As far user-with-softmmu, I'm not too sure... softmmu significantly
>>> increases the overhead of load and stores. Maybe after we add 128-bit
>>> integers to TGC to alleviate that a bit ? :-)
>>
>> It wouldn't be mandatory, but there are certain bugs we can't fix without it.
>> The big issues to be fixed with softmmu are
>>
>> (1) Host page size > guest page size.
>>
>> E.g. there are many programs (i386, sparc, etc, all with 4k pages) that you
>> can't even load, much less run, on a ppc64 host using a 64k page size.
>
> Can't we advertise the host page size to the guest ? Or there are too many
> compiled-in assumptions ?

Sometimes.  However:

Each guest has a set of page sizes that are legal.  The largest of these is 
baked into the static linker via MAXPAGESIZE.  If the host page size is larger 
than this guest maximum, the layout of the guest binary may be impossible to 
satisfy on the host.  There are examples of this even within linux-user-test-0.3.

For those guests whose set of page sizes contains only one size, the page size 
is often baked into executables via the constant PAGE_SIZE, rather than 
performing a runtime query on getpagesize(2).

r~

ppc: Stop dumping state on all exceptions in linux-user

Commit Message

Comments

Patch