[03/29] target-sparc: add UA2005 TTE bit #defines

Message ID	1475316333-9776-4-git-send-email-atar4qemu@gmail.com
State	New
Headers
Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>
From: Artyom Tarasenko <atar4qemu@gmail.com>
To: qemu-devel@nongnu.org
Date: Sat, 1 Oct 2016 12:05:07 +0200
Message-Id: <1475316333-9776-4-git-send-email-atar4qemu@gmail.com>
In-Reply-To: <1475316333-9776-1-git-send-email-atar4qemu@gmail.com>
References: <1475316333-9776-1-git-send-email-atar4qemu@gmail.com>
Subject: [Qemu-devel] [PATCH 03/29] target-sparc: add UA2005 TTE bit #defines
Precedence: list
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>, Artyom Tarasenko <atar4qemu@gmail.com>, Richard Henderson <rth@twiddle.net>
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>

Artyom Tarasenko Oct. 1, 2016, 10:05 a.m. UTC

Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
---
 target-sparc/cpu.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Richard Henderson Oct. 10, 2016, 9:22 p.m. UTC | #1

On 10/01/2016 05:05 AM, Artyom Tarasenko wrote:
>  #define TTE_VALID_BIT       (1ULL << 63)
>  #define TTE_NFO_BIT         (1ULL << 60)
> +#define TTE_NFO_BIT_UA2005  (1ULL << 62)
>  #define TTE_USED_BIT        (1ULL << 41)
> +#define TTE_USED_BIT_UA2005 (1ULL << 47)
>  #define TTE_LOCKED_BIT      (1ULL <<  6)
> +#define TTE_LOCKED_BIT_UA2005 (1ULL <<  61)
>  #define TTE_SIDEEFFECT_BIT  (1ULL <<  3)
> +#define TTE_SIDEEFFECT_BIT_UA2005 (1ULL <<  11)
>  #define TTE_PRIV_BIT        (1ULL <<  2)
> +#define TTE_PRIV_BIT_UA2005 (1ULL <<  8)
>  #define TTE_W_OK_BIT        (1ULL <<  1)
> +#define TTE_W_OK_BIT_UA2005 (1ULL <<  6)
>  #define TTE_GLOBAL_BIT      (1ULL <<  0)

Hmm.  Would it make more sense to reorg these as

   TTE_US1_*
   TTE_UA2005_*

with some duplication for the bits that are shared?
As is, it's pretty hard to tell which actually change...
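
A rough sketch of what such a reorganization might look like, using only the bit positions from the quoted patch. The TTE_US1_* and TTE_UA2005_* names are hypothetical, and treating the valid bit as shared is an assumption based on the patch not adding a UA2005 variant for it.

```c
#include <stdint.h>

/* UltraSPARC I (US1) TTE data bits; positions copied from the quoted patch. */
#define TTE_US1_VALID_BIT         (1ULL << 63)
#define TTE_US1_NFO_BIT           (1ULL << 60)
#define TTE_US1_USED_BIT          (1ULL << 41)
#define TTE_US1_LOCKED_BIT        (1ULL <<  6)
#define TTE_US1_SIDEEFFECT_BIT    (1ULL <<  3)
#define TTE_US1_PRIV_BIT          (1ULL <<  2)
#define TTE_US1_W_OK_BIT          (1ULL <<  1)
#define TTE_US1_GLOBAL_BIT        (1ULL <<  0)

/* UA2005 TTE data bits; positions copied from the quoted patch. */
#define TTE_UA2005_VALID_BIT      (1ULL << 63)  /* assumption: same as US1 */
#define TTE_UA2005_NFO_BIT        (1ULL << 62)
#define TTE_UA2005_LOCKED_BIT     (1ULL << 61)
#define TTE_UA2005_USED_BIT       (1ULL << 47)
#define TTE_UA2005_SIDEEFFECT_BIT (1ULL << 11)
#define TTE_UA2005_PRIV_BIT       (1ULL <<  8)
#define TTE_UA2005_W_OK_BIT       (1ULL <<  6)
```

Grouped this way, it is easier to spot that bit 6 means "locked" in the US1 layout but "writable" in the UA2005 layout.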


r~

Artyom Tarasenko Oct. 10, 2016, 9:45 p.m. UTC | #2

On Oct 10, 2016 at 23:22, "Richard Henderson" <rth@twiddle.net> wrote:
>
> On 10/01/2016 05:05 AM, Artyom Tarasenko wrote:
>>
>>  #define TTE_VALID_BIT       (1ULL << 63)
>>  #define TTE_NFO_BIT         (1ULL << 60)
>> +#define TTE_NFO_BIT_UA2005  (1ULL << 62)
>>  #define TTE_USED_BIT        (1ULL << 41)
>> +#define TTE_USED_BIT_UA2005 (1ULL << 47)
>>  #define TTE_LOCKED_BIT      (1ULL <<  6)
>> +#define TTE_LOCKED_BIT_UA2005 (1ULL <<  61)
>>  #define TTE_SIDEEFFECT_BIT  (1ULL <<  3)
>> +#define TTE_SIDEEFFECT_BIT_UA2005 (1ULL <<  11)
>>  #define TTE_PRIV_BIT        (1ULL <<  2)
>> +#define TTE_PRIV_BIT_UA2005 (1ULL <<  8)
>>  #define TTE_W_OK_BIT        (1ULL <<  1)
>> +#define TTE_W_OK_BIT_UA2005 (1ULL <<  6)
>>  #define TTE_GLOBAL_BIT      (1ULL <<  0)
>
>
> Hmm.  Would it make more sense to reorg these as
>
>   TTE_US1_*
>   TTE_UA2005_*
>
> with some duplication for the bits that are shared?
> As is, it's pretty hard to tell which actually change...

All of them :-)
I'm not sure about renaming: the US1 format is still used in T1 on the read
access.

On the other hand, it's not used in T2. And then again we don't have the T2
emulation yet.

Artyom

Richard Henderson Oct. 11, 2016, 5:50 a.m. UTC | #3

On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>> Hmm.  Would it make more sense to reorg these as
>>
>>   TTE_US1_*
>>   TTE_UA2005_*
>>
>> with some duplication for the bits that are shared?
>> As is, it's pretty hard to tell which actually change...
>
> All of them :-)
> I'm not sure about renaming: the US1 format is still used in T1 on the read
> access.
>
> On the other hand, it's not used in T2. And then again we don't have the T2
> emulation yet.

Oh my.  Different on T2 as well?

I wonder if it would make sense to have different functions with which to fill 
in the CPUClass hooks (or invent new SPARCCPUClass hooks as necessary) for the 
major entry points.

E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked, so 
that the choice of how to handle the tlb miss is chosen at startup time, and 
not during each fault.  One can arrange subroutines as necessary to share code 
between the alternate routines, such as when T1 needs to use parts of US1.

Similarly for out-of-line ASI handling, which is already beyond messy, with 
handling for all cpus thrown in the same switch statement.
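
As a minimal illustration of choosing the handler at startup rather than on each fault: the sketch below uses a made-up ExampleCPUClass with one function pointer. It is not QEMU's actual CPUClass or SPARCCPUClass API; all names are hypothetical, and only the W_OK bit positions (1 for US1, 6 for UA2005) come from the quoted patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-model hook type; not part of QEMU. */
typedef bool (*tte_writable_fn)(uint64_t tte_data);

/* W_OK is bit 1 in the US1 layout and bit 6 in the UA2005 layout. */
static bool us1_tte_writable(uint64_t tte)    { return (tte >> 1) & 1; }
static bool ua2005_tte_writable(uint64_t tte) { return (tte >> 6) & 1; }

typedef enum { EXAMPLE_MODEL_US1, EXAMPLE_MODEL_UA2005 } ExampleModel;

typedef struct ExampleCPUClass {
    tte_writable_fn tte_writable;   /* filled in once, at class init */
} ExampleCPUClass;

/* The model-specific choice is made here, at startup. */
static void example_class_init(ExampleCPUClass *cc, ExampleModel model)
{
    cc->tte_writable = (model == EXAMPLE_MODEL_UA2005)
                     ? ua2005_tte_writable
                     : us1_tte_writable;
}

/* The fault path only calls through the hook; no per-fault model test. */
static bool example_store_allowed(const ExampleCPUClass *cc, uint64_t tte)
{
    return cc->tte_writable(tte);
}
```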

r~

Artyom Tarasenko Oct. 11, 2016, 1:51 p.m. UTC | #4

On Tue, Oct 11, 2016 at 7:50 AM, Richard Henderson <rth@twiddle.net> wrote:
> On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>>>
>>> Hmm.  Would it make more sense to reorg these as
>>>
>>>   TTE_US1_*
>>>   TTE_UA2005_*
>>>
>>> with some duplication for the bits that are shared?
>>> As is, it's pretty hard to tell which actually change...
>>
>>
>> All of them :-)
>> I'm not sure about renaming: the US1 format is still used in T1 on the
>> read
>> access.
>>
>> On the other hand, it's not used in T2. And then again we don't have the
>> T2
>> emulation yet.
>
>
> Oh my.  Different on T2 as well?

T2 has more used bits, and cannot use the US1 format, I think.

> I wonder if it would make sense to have different functions with which to
> fill in the CPUClass hooks (or invent new SPARCCPUClass hooks as necessary)
> for the major entry points.
>
> E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked, so
> that the choice of how to handle the tlb miss is chosen at startup time, and
> not during each fault.  One can arrange subroutines as necessary to share
> code between the alternate routines, such as when T1 needs to use parts of
> US1.

Yes, I plan to do it once I get to T2 emulation.

> Similarly for out-of-line ASI handling, which is already beyond messy, with
> handling for all cpus thrown in the same switch statement.

Yes. I think we need to split SPARCv9 standard ASIs from CPU-specific
ones, call CPU-specific handlers first and the standard handler
afterwards.
But not in this series.
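
The ordering described here (CPU-specific handler first, standard handler as fallback) could be sketched roughly as follows. Everything in this block is hypothetical: the function names, the ASI numbers, and the returned values are invented for illustration and do not correspond to QEMU code or to real SPARC ASI assignments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up ASI numbers, for illustration only. */
enum {
    EXAMPLE_ASI_STANDARD = 1,   /* handled by the common V9 code */
    EXAMPLE_ASI_CPU_ONLY = 2,   /* handled only by one CPU model */
    EXAMPLE_ASI_UNKNOWN  = 3    /* handled by nobody */
};

/* A handler returns true if it recognized the ASI and produced a value. */
typedef bool (*asi_load_fn)(int asi, uint64_t addr, uint64_t *val);

static bool example_cpu_asi_load(int asi, uint64_t addr, uint64_t *val)
{
    (void)addr;
    if (asi == EXAMPLE_ASI_CPU_ONLY) {
        *val = 0xC0DE;          /* placeholder result */
        return true;
    }
    return false;
}

static bool example_std_asi_load(int asi, uint64_t addr, uint64_t *val)
{
    (void)addr;
    if (asi == EXAMPLE_ASI_STANDARD) {
        *val = 0x57D;           /* placeholder result */
        return true;
    }
    return false;
}

/* Try the CPU-specific handler first, then fall back to the standard one. */
static bool example_asi_load(asi_load_fn cpu_specific,
                             int asi, uint64_t addr, uint64_t *val)
{
    if (cpu_specific != NULL && cpu_specific(asi, addr, val)) {
        return true;
    }
    return example_std_asi_load(asi, addr, val);
}
```

A false return from example_asi_load would be the point where the caller raises whatever exception the architecture requires for an unhandled ASI.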

Artyom

Richard Henderson Oct. 11, 2016, 3:08 p.m. UTC | #5

On 10/11/2016 08:51 AM, Artyom Tarasenko wrote:
> On Tue, Oct 11, 2016 at 7:50 AM, Richard Henderson <rth@twiddle.net> wrote:
>> On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>>>>
>>>> Hmm.  Would it make more sense to reorg these as
>>>>
>>>>   TTE_US1_*
>>>>   TTE_UA2005_*
>>>>
>>>> with some duplication for the bits that are shared?
>>>> As is, it's pretty hard to tell which actually change...
>>>
>>>
>>> All of them :-)
>>> I'm not sure about renaming: the US1 format is still used in T1 on the
>>> read
>>> access.
>>>
>>> On the other hand, it's not used in T2. And then again we don't have the
>>> T2
>>> emulation yet.
>>
>>
>> Oh my.  Different on T2 as well?
>
> T2 has more used bits, and cannot use the US1 format, I think.
>
>> I wonder if it would make sense to have different functions with which to
>> fill in the CPUClass hooks (or invent new SPARCCPUClass hooks as necessary)
>> for the major entry points.
>>
>> E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked, so
>> that the choice of how to handle the tlb miss is chosen at startup time, and
>> not during each fault.  One can arrange subroutines as necessary to share
>> code between the alternate routines, such as when T1 needs to use parts of
>> US1.
>
> Yes, I plan to do it once I get to T2 emulation.

Ok.

>> Similarly for out-of-line ASI handling, which is already beyond messy, with
>> handling for all cpus thrown in the same switch statement.
>
> Yes. I think we need to split SPARCv9 standard ASIs from CPU-specific
> ones, call CPU-specific handlers first and the standard handler
> afterwards.
> But not in this series.

Fair enough.

What I would most like to see, for QEMU, is an artificial sun4v compatible 
machine that implements a "hardware" page table walk.  I.e. no use of 
SparcTLBEntry, but walking the page tables directly.

Because QEMU can then satisfy a page lookup internally, without having to 
longjmp out of a memory reference in progress in order to restart the cpu for 
the software TLB miss handler, the emulation runs about 30-50% faster.  At 
least that has been my experience emulating Alpha vs MIPS.
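
To make the idea concrete, here is a toy walker that resolves a lookup by reading the table directly and returning success or failure to its caller, with no longjmp involved. The single-level table layout is invented for this sketch and is not the sun4v format; only the 8 KiB page size and the valid bit at position 63 are taken from SPARC conventions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 13                 /* 8 KiB pages */
#define EX_PTE_VALID  (1ULL << 63)       /* valid bit, as in the TTE */

/* Toy single-level page table: one 64-bit entry per virtual page. */
typedef struct {
    const uint64_t *table;
    size_t entries;
} ExPageTable;

/*
 * Translate vaddr by indexing the table directly.
 * Returns true and fills *paddr on success. Returns false on a miss,
 * which is where a real implementation would hand over to the guest's
 * software miss handler.
 */
static bool ex_walk(const ExPageTable *pt, uint64_t vaddr, uint64_t *paddr)
{
    const uint64_t page_mask = (1ULL << EX_PAGE_SHIFT) - 1;
    uint64_t vpn = vaddr >> EX_PAGE_SHIFT;
    uint64_t pte;

    if (vpn >= pt->entries) {
        return false;                    /* outside the table */
    }
    pte = pt->table[vpn];
    if (!(pte & EX_PTE_VALID)) {
        return false;                    /* no valid mapping */
    }
    /* Frame address from the entry, page offset from the virtual address. */
    *paddr = (pte & ~EX_PTE_VALID & ~page_mask) | (vaddr & page_mask);
    return true;
}
```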

It would require custom roms, but those should be fairly easy to modify from 
the existing source.


r~

Artyom Tarasenko Oct. 12, 2016, 11:18 a.m. UTC | #6

On Tue, Oct 11, 2016 at 5:08 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 10/11/2016 08:51 AM, Artyom Tarasenko wrote:
>>
>> On Tue, Oct 11, 2016 at 7:50 AM, Richard Henderson <rth@twiddle.net>
>> wrote:
>>>
>>> On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>>>>>
>>>>>
>>>>> Hmm.  Would it make more sense to reorg these as
>>>>>
>>>>>   TTE_US1_*
>>>>>   TTE_UA2005_*
>>>>>
>>>>> with some duplication for the bits that are shared?
>>>>> As is, it's pretty hard to tell which actually change...
>>>>
>>>>
>>>>
>>>> All of them :-)
>>>> I'm not sure about renaming: the US1 format is still used in T1 on the
>>>> read
>>>> access.
>>>>
>>>> On the other hand, it's not used in T2. And then again we don't have the
>>>> T2
>>>> emulation yet.
>>>
>>>
>>>
>>> Oh my.  Different on T2 as well?
>>
>>
>> T2 has more used bits, and cannot use the US1 format, I think.
>>
>>> I wonder if it would make sense to have different functions with which to
>>> fill in the CPUClass hooks (or invent new SPARCCPUClass hooks as
>>> necessary)
>>> for the major entry points.
>>>
>>> E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked,
>>> so
>>> that the choice of how to handle the tlb miss is chosen at startup time,
>>> and
>>> not during each fault.  One can arrange subroutines as necessary to share
>>> code between the alternate routines, such as when T1 needs to use parts
>>> of
>>> US1.
>>
>>
>> Yes, I plan to do it once I get to T2 emulation.
>
>
> Ok.
>
>>> Similarly for out-of-line ASI handling, which is already beyond messy,
>>> with
>>> handling for all cpus thrown in the same switch statement.
>>
>>
>> Yes. I think we need to split SPARCv9 standard ASIs from CPU-specific
>> ones, call CPU-specific handlers first and the standard handler
>> afterwards.
>> But not in this series.
>
>
> Fair enough.
>
> What I would most like to see, for QEMU, is an artificial sun4v compatible
> machine that implements a "hardware" page table walk.  I.e. no use of
> SparcTLBEntry, but walking the page tables directly.
>
> Because QEMU can then satisfy a page lookup internally, without having to
> longjmp out of a memory reference in progress in order to restart the cpu
> for the software TLB miss handler, the emulation runs about 30-50% faster.
> At least that has been my experience emulating Alpha vs MIPS.
>
> It would require custom roms, but those should be fairly easy to modify from
> the existing source.
>

Maybe it's even possible without the modifications. For instance,
implement a table walk compatible with the current hypervisor, and
then just add the possibility of overlaying the hypervisor call using
some CPU feature flag.

Richard Henderson Oct. 12, 2016, 1:25 p.m. UTC | #7

On 10/12/2016 06:18 AM, Artyom Tarasenko wrote:
>> What I would most like to see, for QEMU, is an artificial sun4v compatible
>> machine that implements a "hardware" page table walk.  I.e. no use of
>> SparcTLBEntry, but walking the page tables directly.
>>
>> Because QEMU can then satisfy a page lookup internally, without having to
>> longjmp out of a memory reference in progress in order to restart the cpu
>> for the software TLB miss handler, the emulation runs about 30-50% faster.
>> At least that has been my experience emulating Alpha vs MIPS.
>>
>> It would require custom roms, but those should be fairly easy to modify from
>> the existing source.
>>
>
> Maybe it's even possible without the modifications. For instance,
> implement a table walk compatible with the current hypervisor, and
> then just add the possibility of overlaying the hypervisor call using
> some CPU feature flag.

Maybe so.  What we lack is being given direct access to the page table base. 
But we know that the CPU structure is in the hypervisor shadow register 0, and 
that offset CPU_ROOT is the page table base.

As long as we're willing to hard-code these two facts concerning any rom we 
care to load, we could in fact implement the tlb miss success path inside QEMU.
We would let the rom re-do the work for the tlb miss failure path, on the way
to raising the exception with the supervisor.
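
A sketch of how those two hard-coded facts might be used to find the page table base. The ExCPU structure, the flat guest memory model, and the value of EX_CPU_ROOT_OFFSET are all assumptions made for this example; the real offset would have to come from the rom source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value; the real CPU_ROOT offset is defined by the rom. */
#define EX_CPU_ROOT_OFFSET 0x10

typedef struct {
    uint64_t hshadow0;          /* stand-in for hypervisor shadow register 0 */
    const uint8_t *guest_mem;   /* toy model: flat guest physical memory */
    size_t guest_mem_size;
} ExCPU;

/* Big-endian 64-bit load from guest memory, with a bounds check. */
static bool ex_load64(const ExCPU *cpu, uint64_t paddr, uint64_t *val)
{
    uint64_t v = 0;
    size_t i;

    if (cpu->guest_mem_size < 8 || paddr > cpu->guest_mem_size - 8) {
        return false;
    }
    for (i = 0; i < 8; i++) {
        v = (v << 8) | cpu->guest_mem[paddr + i];
    }
    *val = v;
    return true;
}

/*
 * Fact 1: the register holds the address of the rom's per-CPU structure.
 * Fact 2: the page table base lives at a fixed offset inside it.
 * On failure the caller would simply fall back to the rom's miss handler.
 */
static bool ex_page_table_base(const ExCPU *cpu, uint64_t *root)
{
    return ex_load64(cpu, cpu->hshadow0 + EX_CPU_ROOT_OFFSET, root);
}
```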

r~
