[v3,5/5] fdt: Pass the resulting device tree to QEMU

Message ID	20171003051523.17650-6-aik@ozlabs.ru
State	Superseded
Headers	show Return-Path: <slof-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org> From: Alexey Kardashevskiy <aik@ozlabs.ru> To: slof@lists.ozlabs.org Date: Tue, 3 Oct 2017 16:15:23 +1100 Message-Id: <20171003051523.17650-6-aik@ozlabs.ru> In-Reply-To: <20171003051523.17650-1-aik@ozlabs.ru> References: <20171003051523.17650-1-aik@ozlabs.ru> Subject: [SLOF] [PATCH slof v3 5/5] fdt: Pass the resulting device tree to QEMU Precedence: list Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: slof-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "SLOF" <slof-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org>
Series	fdt: Pass the resulting device tree to QEMU \| expand [v3,0/5] fdt: Pass the resulting device tree to QEMU [v3,1/5] Revert "board-qemu: add private hcall to inform host on "phandle" update" [v3,2/5] Revert "libhvcall: drop unused KVMPPC_H_REPORT_MC_ERR and KVMPPC_H_NMI_MCE defines" [v3,3/5] Revert "rtas: Improve error handling in instantiate-rtas" [v3,4/5] Revert "Add private HCALL to inform updated RTAS base and entry" [v3,5/5] fdt: Pass the resulting device tree to QEMU

Alexey Kardashevskiy Oct. 3, 2017, 5:15 a.m. UTC

This creates flatten device tree and passes it to QEMU via a custom
hypercall right before jumping to RTAS.

On a machine with 256 CPUs and 256 virtual Intel E1000 devices the blob
is 360KB (356KB structs and 20KB of strings), building such a tree takes
~2s on a POWER8 box. A simple tree with 1 CPU and a couple of devices
takes 38ms and creates 16KB blob.

This preloads strings with 40 property names from CPU and PCI device nodes
and the strings lookup only searches within these. Without string reusing
at all, the strings blob is 200KB and rendering time is 1.7sec; with
unlimited reusing, the strings blob is 4KB and rendering time is 2.8sec.

While we are here, fix the version handling in fdt-init. It only matters
a little for the fdt-debug==1 case though.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

Changes:
v3:
* fixed stack handling after hcall returned
* fixed format versions in both rendering and parsing paths
* rebased on top of removed unused hvcalls
* renamed used variables to have fdtfl- prefixes as there are already
some for parsing the initial dt

v2:
* fixed comments from review
* added strings cache
* changed last_compat_vers from 0x17 to 0x16 as suggested by dwg

---

I tested the blob by storing it from QEMU to a file and decompiling it;
this produces error which I do not really
understand as the name of the root is an empty string (literaly:
00 00 00 01  00 00 00 00) and yet this error:

aik@fstn1-p1:~$ dtc -f -I dtb -O dts -o dbg.dts dbg.dtb
ERROR (name_properties): "name" property in / is incorrect ("/" instead of base node name)
Warning: Input tree has errors, output forced
---
 lib/libhvcall/libhvcall.h |   3 +-
 board-qemu/slof/fdt.fs    | 284 +++++++++++++++++++++++++++++++++++++++++++++-
 board-qemu/slof/rtas.fs   |   7 ++
 lib/libhvcall/hvcall.code |   5 +
 lib/libhvcall/hvcall.in   |   1 +
 5 files changed, 297 insertions(+), 3 deletions(-)

David Gibson Oct. 3, 2017, 6:28 a.m. UTC | #1

On Tue, Oct 03, 2017 at 04:15:23PM +1100, Alexey Kardashevskiy wrote:
> This creates flatten device tree and passes it to QEMU via a custom
> hypercall right before jumping to RTAS.

Uh.. "right before jumping to RTAS" isn't very clear to me.  It's
called at quiesce time, right, which we anticipate is the last thing
SLOF will do before being wiped away by the OS.  That's not really
connected to RTAS AFAICT.

> On a machine with 256 CPUs and 256 virtual Intel E1000 devices the blob
> is 360KB (356KB structs and 20KB of strings), building such a tree takes
> ~2s on a POWER8 box. A simple tree with 1 CPU and a couple of devices
> takes 38ms and creates 16KB blob.

Hrm.  360/16 * 38ms = 855ms < 1s.  Which suggests we have something
that's slower than O(n) here, which is concerning.  256 cpus and 256
devices is largish, but it's not a ludicrous size.

> This preloads strings with 40 property names from CPU and PCI device nodes
> and the strings lookup only searches within these. Without string reusing
> at all, the strings blob is 200KB and rendering time is 1.7sec; with
> unlimited reusing, the strings blob is 4KB and rendering time is
> 2.8sec.

Blech, that's a prety ugly set of tradeoffs.  When I suggested this
approach I hadn't thought it would take so long to flatten the tree in
Forth.

> 
> While we are here, fix the version handling in fdt-init. It only matters
> a little for the fdt-debug==1 case though.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> Changes:
> v3:
> * fixed stack handling after hcall returned
> * fixed format versions in both rendering and parsing paths
> * rebased on top of removed unused hvcalls
> * renamed used variables to have fdtfl- prefixes as there are already
> some for parsing the initial dt
> 
> v2:
> * fixed comments from review
> * added strings cache
> * changed last_compat_vers from 0x17 to 0x16 as suggested by dwg
> 
> ---
> 
> I tested the blob by storing it from QEMU to a file and decompiling it;
> this produces error which I do not really
> understand as the name of the root is an empty string (literaly:
> 00 00 00 01  00 00 00 00) and yet this error:
> 
> aik@fstn1-p1:~$ dtc -f -I dtb -O dts -o dbg.dts dbg.dtb
> ERROR (name_properties): "name" property in / is incorrect ("/" instead of base node name)
> Warning: Input tree has errors, output forced

The reason for this error is it's not to do with the "node name" as
given after the FDT_BEGIN_TAG, but with the "name" property.  That's
not required in FDT, but if supplied should match that node name
exactly.  IIRC, however, for human readability runtime OF treats
'name' of the root node as being '/'.

Arguably that is a bug in dtc - '/' as the name property in the root
is probably acceptable.  On the other hand, you should probably just
omit the 'name' properties when you flatten the tree.

> ---
>  lib/libhvcall/libhvcall.h |   3 +-
>  board-qemu/slof/fdt.fs    | 284 +++++++++++++++++++++++++++++++++++++++++++++-
>  board-qemu/slof/rtas.fs   |   7 ++
>  lib/libhvcall/hvcall.code |   5 +
>  lib/libhvcall/hvcall.in   |   1 +
>  5 files changed, 297 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/libhvcall/libhvcall.h b/lib/libhvcall/libhvcall.h
> index 6356a62..3fa4398 100644
> --- a/lib/libhvcall/libhvcall.h
> +++ b/lib/libhvcall/libhvcall.h
> @@ -24,7 +24,8 @@
>  #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
>  /* Client Architecture support */
>  #define KVMPPC_H_CAS            (KVMPPC_HCALL_BASE + 0x2)
> -#define KVMPPC_HCALL_MAX        KVMPPC_H_CAS
> +#define KVMPPC_H_UPDATE_DT      (KVMPPC_HCALL_BASE + 0x3)
> +#define KVMPPC_HCALL_MAX        KVMPPC_H_UPDATE_DT
>  
>  #ifndef __ASSEMBLY__
>  
> diff --git a/board-qemu/slof/fdt.fs b/board-qemu/slof/fdt.fs
> index 851645e..548cc25 100644
> --- a/board-qemu/slof/fdt.fs
> +++ b/board-qemu/slof/fdt.fs
> @@ -27,7 +27,7 @@ struct
>    4 field >fdth_boot_cpu
>    4 field >fdth_string_size
>    4 field >fdth_struct_size
> -drop
> +constant /fdth
>  
>  h# d00dfeed constant OF_DT_HEADER
>  h#        1 constant OF_DT_BEGIN_NODE
> @@ -69,7 +69,7 @@ fdt-start fdt-init
>          dup >fdth_version l@ 3 >= IF
>              ."  strings size     : 0x" dup >fdth_string_size l@ . cr
>          THEN
> -        dup >fdth_version l@ 17 >= IF
> +        dup >fdth_version l@ 11 >= IF
>              ."  struct size      : 0x" dup >fdth_struct_size l@ . cr
>          THEN
>      THEN
> @@ -439,4 +439,284 @@ r> drop
>      fdt-cas-fix?
>  ;
>  
> +VARIABLE fdtfl-struct
> +VARIABLE fdtfl-struct-here
> +VARIABLE fdtfl-strings
> +VARIABLE fdtfl-strings-cache
> +VARIABLE fdtfl-strings-here
> +VARIABLE fdtfl-strings-reused \ debug only
> +VARIABLE fdt-ms \ debug only
> +
> +: fdt-skip-string ( cur -- cur )
> +    BEGIN
> +        dup c@
> +    WHILE
> +        1+
> +    REPEAT
> +    4 + -4 and
> +;
> +
> +: zstring=  ( str len zstr -- flag )
> +    2dup + c@ 0<> IF
> +        3drop false
> +        EXIT
> +    THEN
> +    swap comp 0=
> +;
> +
> +: fdt-find-string ( name namelen -- nameoff true | false )
> +    fdtfl-strings @
> +    BEGIN
> +        dup fdtfl-strings-cache @ <
> +    WHILE
> +        3dup zstring= IF
> +            fdtfl-strings @ -
> +            -rot
> +            2drop
> +            true
> +            EXIT
> +        THEN
> +        fdt-skip-string
> +    REPEAT
> +    3drop
> +    false
> +;
> +
> +: fdt-str-allot ( len -- ) fdtfl-strings-here @ + to fdtfl-strings-here ;
> +: fdt-str-c, ( char -- ) fdtfl-strings-here @ 1 fdt-str-allot c! ;
> +: fdt-str-align  ( -- )
> +    fdtfl-strings-here @
> +    dup dup 4 #aligned swap -   ( here bytes-to-erase )
> +    dup -rot
> +    erase
> +    fdt-str-allot
> +;
> +: fdt-str-bytes, ( data len -- ) fdtfl-strings-here @ over fdt-str-allot swap move ;
> +: fdt-str-ztr, ( str len -- ) fdt-str-bytes, 0 fdt-str-c, ;
> +
> +: fdt-add-string ( name namelen -- nameoff )
> +    fdtfl-strings-here @ -rot
> +    fdt-str-ztr,
> +    fdt-str-align
> +    fdtfl-strings @ -
> +;
> +
> +: fdt-get-string ( name namelen -- nameoff )
> +    2dup fdt-find-string IF
> +        -rot 2drop
> +        fdt-debug IF
> +           1 fdtfl-strings-reused +!
> +        THEN
> +        EXIT
> +    THEN
> +    fdt-add-string
> +;
> +
> +: fdt-allot ( len -- ) fdtfl-struct-here @ + to fdtfl-struct-here ;
> +: fdt-c, ( char -- ) fdtfl-struct-here @ 1 fdt-allot c! ;
> +: fdt-align  ( -- )
> +    fdtfl-struct-here @
> +    dup dup 4 #aligned swap -   ( here bytes-to-erase )
> +    dup -rot
> +    erase
> +    fdt-allot
> +;
> +: fdt-bytes, ( data len -- ) fdtfl-struct-here @ over fdt-allot swap move ;
> +: fdt-ztr, ( str len -- ) fdt-bytes, 0 fdt-c, ;
> +: fdt-l, ( token -- ) fdtfl-struct-here @ l! /l fdt-allot ;
> +
> +: fdt-begin-node ( name namelen -- )
> +    OF_DT_BEGIN_NODE fdt-l,
> +    2dup 1 = swap c@ [char] / = and  \ is it "/"?
> +    IF
> +        2drop s" " \ dtc is still unhappy though
> +    THEN
> +    fdt-ztr,
> +    fdt-align
> +;
> +
> +: fdt-end-node ( -- ) OF_DT_END_NODE fdt-l, ;
> +
> +: fdt-prop ( prop len name namelen -- )
> +    OF_DT_PROP fdt-l,
> +
> +    \ get string offset
> +    fdt-get-string      ( prop len nameoff )
> +
> +    \ store len and nameoff
> +    over fdt-l,
> +    fdt-l,              ( prop len )
> +
> +    \ now store the bytes
> +    fdt-bytes,
> +    fdt-align
> +;
> +
> +: fdt-end ( -- ) OF_DT_END fdt-l, ;
> +
> +: fdt-properties ( phandle -- )
> +    dup encode-int s" phandle" fdt-prop
> +    >r
> +    s" "
> +    BEGIN
> +        r@ next-property
> +    WHILE
> +        2dup
> +        2dup r@ get-property
> +        not IF
> +            2swap fdt-prop
> +        THEN
> +    REPEAT
> +    r>
> +    drop
> +;
> +
> +: fdt-flatten-node ( node --  )
> +    fdt-debug 1 > IF dup node>path type cr THEN
> +    dup node>qname fdt-begin-node
> +    dup fdt-properties
> +    child
> +    BEGIN
> +    dup
> +    WHILE
> +        dup recurse
> +        peer
> +    REPEAT
> +    drop
> +    fdt-end-node
> +;
> +
> +: fdtfl-strings-preload ( -- )
> +    s" reg" fdt-add-string drop
> +    s" status" fdt-add-string drop
> +    s" 64-bit" fdt-add-string drop
> +    s" phandle" fdt-add-string drop
> +    s" ibm,vmx" fdt-add-string drop
> +    s" ibm,dfp" fdt-add-string drop
> +    s" slb-size" fdt-add-string drop
> +    s" ibm,purr" fdt-add-string drop
> +    s" vendor-id" fdt-add-string drop
> +    s" device-id" fdt-add-string drop
> +    s" min-grant" fdt-add-string drop
> +    s" class-code" fdt-add-string drop
> +    s" compatible" fdt-add-string drop
> +    s" interrupts" fdt-add-string drop
> +    s" cpu-version" fdt-add-string drop
> +    s" #size-cells" fdt-add-string drop
> +    s" ibm,req#msi" fdt-add-string drop
> +    s" revision-id" fdt-add-string drop
> +    s" device_type" fdt-add-string drop
> +    s" max-latency" fdt-add-string drop
> +    s" ibm,chip-id" fdt-add-string drop
> +    s" ibm,pft-size" fdt-add-string drop
> +    s" ibm,slb-size" fdt-add-string drop
> +    s" devsel-speed" fdt-add-string drop
> +    s" ibm,loc-code" fdt-add-string drop
> +    s" subsystem-id" fdt-add-string drop
> +    s" d-cache-size" fdt-add-string drop
> +    s" i-cache-size" fdt-add-string drop
> +    s" #address-cells" fdt-add-string drop
> +    s" clock-frequency" fdt-add-string drop
> +    s" cache-line-size" fdt-add-string drop
> +    s" ibm,pa-features" fdt-add-string drop
> +    s" ibm,my-drc-index" fdt-add-string drop
> +    s" d-cache-line-size" fdt-add-string drop
> +    s" i-cache-line-size" fdt-add-string drop
> +    s" assigned-addresses" fdt-add-string drop
> +    s" d-cache-block-size" fdt-add-string drop
> +    s" i-cache-block-size" fdt-add-string drop
> +    s" timebase-frequency" fdt-add-string drop
> +    s" subsystem-vendor-id" fdt-add-string drop
> +    s" ibm,segment-page-sizes" fdt-add-string drop
> +    s" ibm,ppc-interrupt-server#s" fdt-add-string drop
> +    s" ibm,processor-segment-sizes" fdt-add-string drop
> +    s" ibm,ppc-interrupt-gserver#s" fdt-add-string drop
> +;
> +
> +: fdt-append-blob ( bytes cur blob -- cur )
> +    3dup -rot swap move
> +    drop +
> +;
> +
> +: fdt-flatten-tree ( root -- tree )
> +    200000 alloc-mem dup fdtfl-struct-here ! fdtfl-struct !
> +    200000 alloc-mem dup fdtfl-strings-here ! fdtfl-strings !
> +
> +    fdt-debug IF
> +        0 fdtfl-strings-reused !
> +        milliseconds fdt-ms !
> +    THEN
> +
> +    \ Preload strings cache
> +    fdtfl-strings-preload
> +    fdtfl-strings-here @ fdtfl-strings-cache !
> +    \ Render the blobs
> +    fdt-flatten-node
> +    fdt-end
> +
> +    \ Calculate strings and struct sizes
> +    fdtfl-struct-here @ fdtfl-struct @ -
> +    fdtfl-strings-here @ fdtfl-strings @ - ( struct-len strings-len )
> +
> +    2dup + /fdth +
> +    10 + \ Reserve 16 bytes and an empty reserved block
> +
> +    fdt-debug IF
> +        3dup
> +        ." FDT flat size=" .d cr
> +        ." Strings size=" .d cr
> +        ." Struct size=" .d cr
> +        ." Reused strings=" fdtfl-strings-reused @ .d cr
> +        milliseconds fdt-ms @ -
> +        ." Took " .d ." ms" cr
> +    THEN
> +
> +    \ Allocate flatten DT blob
> +    dup alloc-mem                   ( struct-len strings-len total-len fdt )
> +    >r                              ( struct-len strings-len total-len r: fdt )
> +
> +    \ Write header
> +    OF_DT_HEADER        r@ >fdth_magic l!
> +    dup                 r@ >fdth_tsize l!
> +    /fdth 10 + 2 pick + r@ >fdth_struct_off l!
> +    /fdth 10 +          r@ >fdth_string_off l!
> +    /fdth               r@ >fdth_rsvmap_off l!
> +    11                  r@ >fdth_version l!
> +    10                  r@ >fdth_compat_vers l!
> +    0                   r@ >fdth_boot_cpu l!
> +    over                r@ >fdth_string_size l!
> +    2 pick              r@ >fdth_struct_size l!
> +                                    ( struct-len strings-len total-len r: fdt )
> +
> +    drop                            ( struct-len strings-len r: fdt )
> +    r@ /fdth +                      ( struct-len strings-len cur r: fdt )
> +
> +    \ Write the reserved entry
> +    0 over !
> +    cell+
> +    0 over !
> +    cell+                           ( struct-len strings-len cur r: fdt )
> +
> +    \ Write strings and struct blobs
> +    fdtfl-strings @ fdt-append-blob
> +    fdtfl-struct @ fdt-append-blob
> +    drop
> +
> +    \ Free temporary blobs
> +    fdtfl-struct @ 200000 free-mem
> +    fdtfl-strings @ 200000 free-mem
> +
> +    \ Return fdt
> +    r>
> +;
> +
> +: fdt-flatten-tree-free ( tree )
> +    dup >fdth_tsize l@ free-mem
> +;
> +
> +: fdt ( -- )
> +    " /" find-node
> +    fdt-flatten-tree
> +;
> +
>  s" /" find-node fdt-fix-phandles
> diff --git a/board-qemu/slof/rtas.fs b/board-qemu/slof/rtas.fs
> index b17157e..219bcda 100644
> --- a/board-qemu/slof/rtas.fs
> +++ b/board-qemu/slof/rtas.fs
> @@ -98,6 +98,13 @@ find-qemu-rtas
>  ;
>  
>  : rtas-quiesce ( -- )
> +    " /" find-node
> +    fdt-flatten-tree
> +    dup hv-update-dt ?dup IF
> +        \ Ignore hcall not implemented error, print error otherwise
> +        dup -2 <> IF ." HV-UPDATE-DT error: " . cr ELSE drop THEN
> +    THEN
> +    fdt-flatten-tree-free
>      " quiesce" rtas-get-token rtas-cb rtas>token l!
>      0 rtas-cb rtas>nargs l!
>      0 rtas-cb rtas>nret l!
> diff --git a/lib/libhvcall/hvcall.code b/lib/libhvcall/hvcall.code
> index 744469f..5918c90 100644
> --- a/lib/libhvcall/hvcall.code
> +++ b/lib/libhvcall/hvcall.code
> @@ -123,3 +123,8 @@ PRIM(check_X2d_and_X2d_patch_X2d_sc1)
>  
>  	patch_broken_sc1((void*)start, (void*)end, (void*)patch_ins);
>  MIRP
> +
> +PRIM(hv_X2d_update_X2d_dt)
> +	unsigned long dt = TOS.u;
> +	TOS.u = hv_generic(KVMPPC_H_UPDATE_DT, dt);
> +MIRP
> diff --git a/lib/libhvcall/hvcall.in b/lib/libhvcall/hvcall.in
> index e99d6d1..9193162 100644
> --- a/lib/libhvcall/hvcall.in
> +++ b/lib/libhvcall/hvcall.in
> @@ -30,4 +30,5 @@ cod(RX!)
>  
>  cod(hv-logical-memop)
>  cod(hv-cas)
> +cod(hv-update-dt)
>  cod(get-print-version)

Greg Kurz Oct. 3, 2017, 8:48 a.m. UTC | #2

On Tue,  3 Oct 2017 16:15:23 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> This creates flatten device tree and passes it to QEMU via a custom
> hypercall right before jumping to RTAS.
> 
> On a machine with 256 CPUs and 256 virtual Intel E1000 devices the blob
> is 360KB (356KB structs and 20KB of strings), building such a tree takes
> ~2s on a POWER8 box. A simple tree with 1 CPU and a couple of devices
> takes 38ms and creates 16KB blob.
> 
> This preloads strings with 40 property names from CPU and PCI device nodes
> and the strings lookup only searches within these. Without string reusing
> at all, the strings blob is 200KB and rendering time is 1.7sec; with
> unlimited reusing, the strings blob is 4KB and rendering time is 2.8sec.
> 
> While we are here, fix the version handling in fdt-init. It only matters
> a little for the fdt-debug==1 case though.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> Changes:
> v3:
> * fixed stack handling after hcall returned
> * fixed format versions in both rendering and parsing paths
> * rebased on top of removed unused hvcalls
> * renamed used variables to have fdtfl- prefixes as there are already
> some for parsing the initial dt
> 

Yeah and this was confusing CAS because ibm,client-architecture-support
uses fdt-struct, but wasn't accessing the right one... it took me some
time to understand what was happening :-\

Maybe we should have word in fdt.fs to avoid using fdt-struct in some
other file ?

> v2:
> * fixed comments from review
> * added strings cache
> * changed last_compat_vers from 0x17 to 0x16 as suggested by dwg
> 
> ---
> 
> I tested the blob by storing it from QEMU to a file and decompiling it;
> this produces error which I do not really
> understand as the name of the root is an empty string (literaly:
> 00 00 00 01  00 00 00 00) and yet this error:
> 
> aik@fstn1-p1:~$ dtc -f -I dtb -O dts -o dbg.dts dbg.dtb
> ERROR (name_properties): "name" property in / is incorrect ("/" instead of base node name)
> Warning: Input tree has errors, output forced
> ---
>  lib/libhvcall/libhvcall.h |   3 +-
>  board-qemu/slof/fdt.fs    | 284 +++++++++++++++++++++++++++++++++++++++++++++-
>  board-qemu/slof/rtas.fs   |   7 ++
>  lib/libhvcall/hvcall.code |   5 +
>  lib/libhvcall/hvcall.in   |   1 +
>  5 files changed, 297 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/libhvcall/libhvcall.h b/lib/libhvcall/libhvcall.h
> index 6356a62..3fa4398 100644
> --- a/lib/libhvcall/libhvcall.h
> +++ b/lib/libhvcall/libhvcall.h
> @@ -24,7 +24,8 @@
>  #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
>  /* Client Architecture support */
>  #define KVMPPC_H_CAS            (KVMPPC_HCALL_BASE + 0x2)
> -#define KVMPPC_HCALL_MAX        KVMPPC_H_CAS
> +#define KVMPPC_H_UPDATE_DT      (KVMPPC_HCALL_BASE + 0x3)
> +#define KVMPPC_HCALL_MAX        KVMPPC_H_UPDATE_DT
>  
>  #ifndef __ASSEMBLY__
>  
> diff --git a/board-qemu/slof/fdt.fs b/board-qemu/slof/fdt.fs
> index 851645e..548cc25 100644
> --- a/board-qemu/slof/fdt.fs
> +++ b/board-qemu/slof/fdt.fs
> @@ -27,7 +27,7 @@ struct
>    4 field >fdth_boot_cpu
>    4 field >fdth_string_size
>    4 field >fdth_struct_size
> -drop
> +constant /fdth
>  
>  h# d00dfeed constant OF_DT_HEADER
>  h#        1 constant OF_DT_BEGIN_NODE
> @@ -69,7 +69,7 @@ fdt-start fdt-init
>          dup >fdth_version l@ 3 >= IF
>              ."  strings size     : 0x" dup >fdth_string_size l@ . cr
>          THEN
> -        dup >fdth_version l@ 17 >= IF
> +        dup >fdth_version l@ 11 >= IF
>              ."  struct size      : 0x" dup >fdth_struct_size l@ . cr
>          THEN
>      THEN
> @@ -439,4 +439,284 @@ r> drop
>      fdt-cas-fix?
>  ;
>  
> +VARIABLE fdtfl-struct
> +VARIABLE fdtfl-struct-here
> +VARIABLE fdtfl-strings
> +VARIABLE fdtfl-strings-cache
> +VARIABLE fdtfl-strings-here
> +VARIABLE fdtfl-strings-reused \ debug only
> +VARIABLE fdt-ms \ debug only
> +
> +: fdt-skip-string ( cur -- cur )
> +    BEGIN
> +        dup c@
> +    WHILE
> +        1+
> +    REPEAT
> +    4 + -4 and
> +;
> +
> +: zstring=  ( str len zstr -- flag )
> +    2dup + c@ 0<> IF
> +        3drop false
> +        EXIT
> +    THEN
> +    swap comp 0=
> +;
> +
> +: fdt-find-string ( name namelen -- nameoff true | false )
> +    fdtfl-strings @
> +    BEGIN
> +        dup fdtfl-strings-cache @ <
> +    WHILE
> +        3dup zstring= IF
> +            fdtfl-strings @ -
> +            -rot
> +            2drop
> +            true
> +            EXIT
> +        THEN
> +        fdt-skip-string
> +    REPEAT
> +    3drop
> +    false
> +;
> +
> +: fdt-str-allot ( len -- ) fdtfl-strings-here @ + to fdtfl-strings-here ;
> +: fdt-str-c, ( char -- ) fdtfl-strings-here @ 1 fdt-str-allot c! ;
> +: fdt-str-align  ( -- )
> +    fdtfl-strings-here @
> +    dup dup 4 #aligned swap -   ( here bytes-to-erase )
> +    dup -rot
> +    erase
> +    fdt-str-allot
> +;
> +: fdt-str-bytes, ( data len -- ) fdtfl-strings-here @ over fdt-str-allot swap move ;
> +: fdt-str-ztr, ( str len -- ) fdt-str-bytes, 0 fdt-str-c, ;
> +
> +: fdt-add-string ( name namelen -- nameoff )
> +    fdtfl-strings-here @ -rot
> +    fdt-str-ztr,
> +    fdt-str-align
> +    fdtfl-strings @ -
> +;
> +
> +: fdt-get-string ( name namelen -- nameoff )
> +    2dup fdt-find-string IF
> +        -rot 2drop
> +        fdt-debug IF
> +           1 fdtfl-strings-reused +!
> +        THEN
> +        EXIT
> +    THEN
> +    fdt-add-string
> +;
> +
> +: fdt-allot ( len -- ) fdtfl-struct-here @ + to fdtfl-struct-here ;
> +: fdt-c, ( char -- ) fdtfl-struct-here @ 1 fdt-allot c! ;
> +: fdt-align  ( -- )
> +    fdtfl-struct-here @
> +    dup dup 4 #aligned swap -   ( here bytes-to-erase )
> +    dup -rot
> +    erase
> +    fdt-allot
> +;
> +: fdt-bytes, ( data len -- ) fdtfl-struct-here @ over fdt-allot swap move ;
> +: fdt-ztr, ( str len -- ) fdt-bytes, 0 fdt-c, ;
> +: fdt-l, ( token -- ) fdtfl-struct-here @ l! /l fdt-allot ;
> +
> +: fdt-begin-node ( name namelen -- )
> +    OF_DT_BEGIN_NODE fdt-l,
> +    2dup 1 = swap c@ [char] / = and  \ is it "/"?
> +    IF
> +        2drop s" " \ dtc is still unhappy though
> +    THEN
> +    fdt-ztr,
> +    fdt-align
> +;
> +
> +: fdt-end-node ( -- ) OF_DT_END_NODE fdt-l, ;
> +
> +: fdt-prop ( prop len name namelen -- )
> +    OF_DT_PROP fdt-l,
> +
> +    \ get string offset
> +    fdt-get-string      ( prop len nameoff )
> +
> +    \ store len and nameoff
> +    over fdt-l,
> +    fdt-l,              ( prop len )
> +
> +    \ now store the bytes
> +    fdt-bytes,
> +    fdt-align
> +;
> +
> +: fdt-end ( -- ) OF_DT_END fdt-l, ;
> +
> +: fdt-properties ( phandle -- )
> +    dup encode-int s" phandle" fdt-prop
> +    >r
> +    s" "
> +    BEGIN
> +        r@ next-property
> +    WHILE
> +        2dup
> +        2dup r@ get-property
> +        not IF
> +            2swap fdt-prop
> +        THEN
> +    REPEAT
> +    r>
> +    drop
> +;
> +
> +: fdt-flatten-node ( node --  )
> +    fdt-debug 1 > IF dup node>path type cr THEN
> +    dup node>qname fdt-begin-node
> +    dup fdt-properties
> +    child
> +    BEGIN
> +    dup
> +    WHILE
> +        dup recurse
> +        peer
> +    REPEAT
> +    drop
> +    fdt-end-node
> +;
> +
> +: fdtfl-strings-preload ( -- )
> +    s" reg" fdt-add-string drop
> +    s" status" fdt-add-string drop
> +    s" 64-bit" fdt-add-string drop
> +    s" phandle" fdt-add-string drop
> +    s" ibm,vmx" fdt-add-string drop
> +    s" ibm,dfp" fdt-add-string drop
> +    s" slb-size" fdt-add-string drop
> +    s" ibm,purr" fdt-add-string drop
> +    s" vendor-id" fdt-add-string drop
> +    s" device-id" fdt-add-string drop
> +    s" min-grant" fdt-add-string drop
> +    s" class-code" fdt-add-string drop
> +    s" compatible" fdt-add-string drop
> +    s" interrupts" fdt-add-string drop
> +    s" cpu-version" fdt-add-string drop
> +    s" #size-cells" fdt-add-string drop
> +    s" ibm,req#msi" fdt-add-string drop
> +    s" revision-id" fdt-add-string drop
> +    s" device_type" fdt-add-string drop
> +    s" max-latency" fdt-add-string drop
> +    s" ibm,chip-id" fdt-add-string drop
> +    s" ibm,pft-size" fdt-add-string drop
> +    s" ibm,slb-size" fdt-add-string drop
> +    s" devsel-speed" fdt-add-string drop
> +    s" ibm,loc-code" fdt-add-string drop
> +    s" subsystem-id" fdt-add-string drop
> +    s" d-cache-size" fdt-add-string drop
> +    s" i-cache-size" fdt-add-string drop
> +    s" #address-cells" fdt-add-string drop
> +    s" clock-frequency" fdt-add-string drop
> +    s" cache-line-size" fdt-add-string drop
> +    s" ibm,pa-features" fdt-add-string drop
> +    s" ibm,my-drc-index" fdt-add-string drop
> +    s" d-cache-line-size" fdt-add-string drop
> +    s" i-cache-line-size" fdt-add-string drop
> +    s" assigned-addresses" fdt-add-string drop
> +    s" d-cache-block-size" fdt-add-string drop
> +    s" i-cache-block-size" fdt-add-string drop
> +    s" timebase-frequency" fdt-add-string drop
> +    s" subsystem-vendor-id" fdt-add-string drop
> +    s" ibm,segment-page-sizes" fdt-add-string drop
> +    s" ibm,ppc-interrupt-server#s" fdt-add-string drop
> +    s" ibm,processor-segment-sizes" fdt-add-string drop
> +    s" ibm,ppc-interrupt-gserver#s" fdt-add-string drop
> +;
> +
> +: fdt-append-blob ( bytes cur blob -- cur )
> +    3dup -rot swap move
> +    drop +
> +;
> +
> +: fdt-flatten-tree ( root -- tree )
> +    200000 alloc-mem dup fdtfl-struct-here ! fdtfl-struct !
> +    200000 alloc-mem dup fdtfl-strings-here ! fdtfl-strings !
> +
> +    fdt-debug IF
> +        0 fdtfl-strings-reused !
> +        milliseconds fdt-ms !
> +    THEN
> +
> +    \ Preload strings cache
> +    fdtfl-strings-preload
> +    fdtfl-strings-here @ fdtfl-strings-cache !
> +    \ Render the blobs
> +    fdt-flatten-node
> +    fdt-end
> +
> +    \ Calculate strings and struct sizes
> +    fdtfl-struct-here @ fdtfl-struct @ -
> +    fdtfl-strings-here @ fdtfl-strings @ - ( struct-len strings-len )
> +
> +    2dup + /fdth +
> +    10 + \ Reserve 16 bytes and an empty reserved block
> +
> +    fdt-debug IF
> +        3dup
> +        ." FDT flat size=" .d cr
> +        ." Strings size=" .d cr
> +        ." Struct size=" .d cr
> +        ." Reused strings=" fdtfl-strings-reused @ .d cr
> +        milliseconds fdt-ms @ -
> +        ." Took " .d ." ms" cr
> +    THEN
> +
> +    \ Allocate flatten DT blob
> +    dup alloc-mem                   ( struct-len strings-len total-len fdt )
> +    >r                              ( struct-len strings-len total-len r: fdt )
> +
> +    \ Write header
> +    OF_DT_HEADER        r@ >fdth_magic l!
> +    dup                 r@ >fdth_tsize l!
> +    /fdth 10 + 2 pick + r@ >fdth_struct_off l!
> +    /fdth 10 +          r@ >fdth_string_off l!
> +    /fdth               r@ >fdth_rsvmap_off l!
> +    11                  r@ >fdth_version l!
> +    10                  r@ >fdth_compat_vers l!
> +    0                   r@ >fdth_boot_cpu l!
> +    over                r@ >fdth_string_size l!
> +    2 pick              r@ >fdth_struct_size l!
> +                                    ( struct-len strings-len total-len r: fdt )
> +
> +    drop                            ( struct-len strings-len r: fdt )
> +    r@ /fdth +                      ( struct-len strings-len cur r: fdt )
> +
> +    \ Write the reserved entry
> +    0 over !
> +    cell+
> +    0 over !
> +    cell+                           ( struct-len strings-len cur r: fdt )
> +
> +    \ Write strings and struct blobs
> +    fdtfl-strings @ fdt-append-blob
> +    fdtfl-struct @ fdt-append-blob
> +    drop
> +
> +    \ Free temporary blobs
> +    fdtfl-struct @ 200000 free-mem
> +    fdtfl-strings @ 200000 free-mem
> +
> +    \ Return fdt
> +    r>
> +;
> +
> +: fdt-flatten-tree-free ( tree )
> +    dup >fdth_tsize l@ free-mem
> +;
> +
> +: fdt ( -- )
> +    " /" find-node
> +    fdt-flatten-tree
> +;
> +
>  s" /" find-node fdt-fix-phandles
> diff --git a/board-qemu/slof/rtas.fs b/board-qemu/slof/rtas.fs
> index b17157e..219bcda 100644
> --- a/board-qemu/slof/rtas.fs
> +++ b/board-qemu/slof/rtas.fs
> @@ -98,6 +98,13 @@ find-qemu-rtas
>  ;
>  
>  : rtas-quiesce ( -- )
> +    " /" find-node
> +    fdt-flatten-tree
> +    dup hv-update-dt ?dup IF
> +        \ Ignore hcall not implemented error, print error otherwise
> +        dup -2 <> IF ." HV-UPDATE-DT error: " . cr ELSE drop THEN
> +    THEN
> +    fdt-flatten-tree-free
>      " quiesce" rtas-get-token rtas-cb rtas>token l!
>      0 rtas-cb rtas>nargs l!
>      0 rtas-cb rtas>nret l!
> diff --git a/lib/libhvcall/hvcall.code b/lib/libhvcall/hvcall.code
> index 744469f..5918c90 100644
> --- a/lib/libhvcall/hvcall.code
> +++ b/lib/libhvcall/hvcall.code
> @@ -123,3 +123,8 @@ PRIM(check_X2d_and_X2d_patch_X2d_sc1)
>  
>  	patch_broken_sc1((void*)start, (void*)end, (void*)patch_ins);
>  MIRP
> +
> +PRIM(hv_X2d_update_X2d_dt)
> +	unsigned long dt = TOS.u;
> +	TOS.u = hv_generic(KVMPPC_H_UPDATE_DT, dt);
> +MIRP
> diff --git a/lib/libhvcall/hvcall.in b/lib/libhvcall/hvcall.in
> index e99d6d1..9193162 100644
> --- a/lib/libhvcall/hvcall.in
> +++ b/lib/libhvcall/hvcall.in
> @@ -30,4 +30,5 @@ cod(RX!)
>  
>  cod(hv-logical-memop)
>  cod(hv-cas)
> +cod(hv-update-dt)
>  cod(get-print-version)

Alexey Kardashevskiy Oct. 3, 2017, 8:53 a.m. UTC | #3

On 03/10/17 19:48, Greg Kurz wrote:
> On Tue,  3 Oct 2017 16:15:23 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> This creates flatten device tree and passes it to QEMU via a custom
>> hypercall right before jumping to RTAS.
>>
>> On a machine with 256 CPUs and 256 virtual Intel E1000 devices the blob
>> is 360KB (356KB structs and 20KB of strings), building such a tree takes
>> ~2s on a POWER8 box. A simple tree with 1 CPU and a couple of devices
>> takes 38ms and creates 16KB blob.
>>
>> This preloads strings with 40 property names from CPU and PCI device nodes
>> and the strings lookup only searches within these. Without string reusing
>> at all, the strings blob is 200KB and rendering time is 1.7sec; with
>> unlimited reusing, the strings blob is 4KB and rendering time is 2.8sec.
>>
>> While we are here, fix the version handling in fdt-init. It only matters
>> a little for the fdt-debug==1 case though.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> Changes:
>> v3:
>> * fixed stack handling after hcall returned
>> * fixed format versions in both rendering and parsing paths
>> * rebased on top of removed unused hvcalls
>> * renamed used variables to have fdtfl- prefixes as there are already
>> some for parsing the initial dt
>>
> 
> Yeah and this was confusing CAS because ibm,client-architecture-support
> uses fdt-struct, but wasn't accessing the right one... it took me some
> time to understand what was happening :-\
> 
> Maybe we should have word in fdt.fs to avoid using fdt-struct in some
> other file ?

This would not help - these variables are global.

Alexey Kardashevskiy Oct. 3, 2017, 9:09 a.m. UTC | #4

On 03/10/17 17:28, David Gibson wrote:
> On Tue, Oct 03, 2017 at 04:15:23PM +1100, Alexey Kardashevskiy wrote:
>> This creates flatten device tree and passes it to QEMU via a custom
>> hypercall right before jumping to RTAS.
> 
> Uh.. "right before jumping to RTAS" isn't very clear to me.  It's
> called at quiesce time, right, which we anticipate is the last thing
> SLOF will do before being wiped away by the OS.  That's not really
> connected to RTAS AFAICT.
> 

I did not think much on this paragraph, sorry :)


>> On a machine with 256 CPUs and 256 virtual Intel E1000 devices the blob
>> is 360KB (356KB structs and 20KB of strings), building such a tree takes
>> ~2s on a POWER8 box. A simple tree with 1 CPU and a couple of devices
>> takes 38ms and creates 16KB blob.
> 
> Hrm.  360/16 * 38ms = 855ms < 1s.

My simple case does not do PCI, with a PCI bridge and a E1000 it is 45ms.
But I see your point. It is j

>  Which suggests we have something
> that's slower than O(n) here, which is concerning.  256 cpus and 256
> devices is largish, but it's not a ludicrous size.

With 255 PCI devices, there are quite many bridges with property names
which I do not preload, that could explain.


>> This preloads strings with 40 property names from CPU and PCI device nodes
>> and the strings lookup only searches within these. Without string reusing
>> at all, the strings blob is 200KB and rendering time is 1.7sec; with
>> unlimited reusing, the strings blob is 4KB and rendering time is
>> 2.8sec.
> 
> Blech, that's a prety ugly set of tradeoffs.  When I suggested this
> approach I hadn't thought it would take so long to flatten the tree in
> Forth.

Fetching the tree to the guest kernel takes close to 10 seconds though,
using the client interface. Just to compare.

I can look at what Segher suggested when mentioned a word list, not sure
what this is and how it can help but worth to try I guess.


>>
>> While we are here, fix the version handling in fdt-init. It only matters
>> a little for the fdt-debug==1 case though.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> Changes:
>> v3:
>> * fixed stack handling after hcall returned
>> * fixed format versions in both rendering and parsing paths
>> * rebased on top of removed unused hvcalls
>> * renamed used variables to have fdtfl- prefixes as there are already
>> some for parsing the initial dt
>>
>> v2:
>> * fixed comments from review
>> * added strings cache
>> * changed last_compat_vers from 0x17 to 0x16 as suggested by dwg
>>
>> ---
>>
>> I tested the blob by storing it from QEMU to a file and decompiling it;
>> this produces error which I do not really
>> understand as the name of the root is an empty string (literaly:
>> 00 00 00 01  00 00 00 00) and yet this error:
>>
>> aik@fstn1-p1:~$ dtc -f -I dtb -O dts -o dbg.dts dbg.dtb
>> ERROR (name_properties): "name" property in / is incorrect ("/" instead of base node name)
>> Warning: Input tree has errors, output forced
> 
> The reason for this error is it's not to do with the "node name" as
> given after the FDT_BEGIN_TAG, but with the "name" property.  That's
> not required in FDT, but if supplied should match that node name
> exactly.  IIRC, however, for human readability runtime OF treats
> 'name' of the root node as being '/'.
> 
> Arguably that is a bug in dtc - '/' as the name property in the root
> is probably acceptable.  On the other hand, you should probably just
> omit the 'name' properties when you flatten the tree.

Ok, will do this.

Greg Kurz Oct. 3, 2017, 9:22 a.m. UTC | #5

On Tue, 3 Oct 2017 19:53:23 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 03/10/17 19:48, Greg Kurz wrote:
> > On Tue,  3 Oct 2017 16:15:23 +1100
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> >> This creates flatten device tree and passes it to QEMU via a custom
> >> hypercall right before jumping to RTAS.
> >>
> >> On a machine with 256 CPUs and 256 virtual Intel E1000 devices the blob
> >> is 360KB (356KB structs and 20KB of strings), building such a tree takes
> >> ~2s on a POWER8 box. A simple tree with 1 CPU and a couple of devices
> >> takes 38ms and creates 16KB blob.
> >>
> >> This preloads strings with 40 property names from CPU and PCI device nodes
> >> and the strings lookup only searches within these. Without string reusing
> >> at all, the strings blob is 200KB and rendering time is 1.7sec; with
> >> unlimited reusing, the strings blob is 4KB and rendering time is 2.8sec.
> >>
> >> While we are here, fix the version handling in fdt-init. It only matters
> >> a little for the fdt-debug==1 case though.
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>
> >> Changes:
> >> v3:
> >> * fixed stack handling after hcall returned
> >> * fixed format versions in both rendering and parsing paths
> >> * rebased on top of removed unused hvcalls
> >> * renamed used variables to have fdtfl- prefixes as there are already
> >> some for parsing the initial dt
> >>  
> > 
> > Yeah and this was confusing CAS because ibm,client-architecture-support
> > uses fdt-struct, but wasn't accessing the right one... it took me some
> > time to understand what was happening :-\
> > 
> > Maybe we should have word in fdt.fs to avoid using fdt-struct in some
> > other file ?  
> 
> This would not help - these variables are global.
> 

They do have global scope indeed, but it doesn't mean they necessarily have
to be used as such... 


>

Alexey Kardashevskiy Oct. 3, 2017, 12:13 p.m. UTC | #6

On 03/10/17 16:15, Alexey Kardashevskiy wrote:
> This creates flatten device tree and passes it to QEMU via a custom
> hypercall right before jumping to RTAS.
> 
> On a machine with 256 CPUs and 256 virtual Intel E1000 devices the blob
> is 360KB (356KB structs and 20KB of strings), building such a tree takes
> ~2s on a POWER8 box. A simple tree with 1 CPU and a couple of devices
> takes 38ms and creates 16KB blob.
> 
> This preloads strings with 40 property names from CPU and PCI device nodes
> and the strings lookup only searches within these. Without string reusing
> at all, the strings blob is 200KB and rendering time is 1.7sec; with
> unlimited reusing, the strings blob is 4KB and rendering time is 2.8sec.
> 
> While we are here, fix the version handling in fdt-init. It only matters
> a little for the fdt-debug==1 case though.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> Changes:
> v3:
> * fixed stack handling after hcall returned
> * fixed format versions in both rendering and parsing paths
> * rebased on top of removed unused hvcalls
> * renamed used variables to have fdtfl- prefixes as there are already
> some for parsing the initial dt
> 
> v2:
> * fixed comments from review
> * added strings cache
> * changed last_compat_vers from 0x17 to 0x16 as suggested by dwg
> 
> ---
> 
> I tested the blob by storing it from QEMU to a file and decompiling it;
> this produces error which I do not really
> understand as the name of the root is an empty string (literaly:
> 00 00 00 01  00 00 00 00) and yet this error:
> 
> aik@fstn1-p1:~$ dtc -f -I dtb -O dts -o dbg.dts dbg.dtb
> ERROR (name_properties): "name" property in / is incorrect ("/" instead of base node name)
> Warning: Input tree has errors, output forced
> ---
>  lib/libhvcall/libhvcall.h |   3 +-
>  board-qemu/slof/fdt.fs    | 284 +++++++++++++++++++++++++++++++++++++++++++++-
>  board-qemu/slof/rtas.fs   |   7 ++
>  lib/libhvcall/hvcall.code |   5 +
>  lib/libhvcall/hvcall.in   |   1 +
>  5 files changed, 297 insertions(+), 3 deletions(-)
> 

[...]

> diff --git a/board-qemu/slof/fdt.fs b/board-qemu/slof/fdt.fs
> index 851645e..548cc25 100644
> --- a/board-qemu/slof/fdt.fs
> +++ b/board-qemu/slof/fdt.fs
> @@ -27,7 +27,7 @@ struct

[...]

> +: fdt-properties ( phandle -- )
> +    dup encode-int s" phandle" fdt-prop
> +    >r
> +    s" "
> +    BEGIN
> +        r@ next-property
> +    WHILE
> +        2dup
> +        2dup r@ get-property
> +        not IF
> +            2swap fdt-prop
> +        THEN
> +    REPEAT
> +    r>
> +    drop
> +;


Doing as below still works but brings the time for the huge guest (256
CPUs, 256 E1000) down to 366ms as it does not search for properties every
time but just iterates through them, I picked this from ".properties". So
we are back in business :) But I'd really love someone to explain me how
that works as I am unable to parse this kind of internals myself... Segher,
please.


: fdt-properties ( phandle -- )
    dup encode-int s" phandle" fdt-prop

    node>properties @ cell+ @
    BEGIN
        dup
    WHILE
        dup

        link>
        dup >name name>string 2>r
        execute
        2r>
        fdt-prop

        @
    REPEAT

    drop
;

David Gibson Oct. 4, 2017, 1:02 a.m. UTC | #7

On Tue, Oct 03, 2017 at 11:13:51PM +1100, Alexey Kardashevskiy wrote:
> On 03/10/17 16:15, Alexey Kardashevskiy wrote:
> > This creates flatten device tree and passes it to QEMU via a custom
> > hypercall right before jumping to RTAS.
> > 
> > On a machine with 256 CPUs and 256 virtual Intel E1000 devices the blob
> > is 360KB (356KB structs and 20KB of strings), building such a tree takes
> > ~2s on a POWER8 box. A simple tree with 1 CPU and a couple of devices
> > takes 38ms and creates 16KB blob.
> > 
> > This preloads strings with 40 property names from CPU and PCI device nodes
> > and the strings lookup only searches within these. Without string reusing
> > at all, the strings blob is 200KB and rendering time is 1.7sec; with
> > unlimited reusing, the strings blob is 4KB and rendering time is 2.8sec.
> > 
> > While we are here, fix the version handling in fdt-init. It only matters
> > a little for the fdt-debug==1 case though.
> > 
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> > 
> > Changes:
> > v3:
> > * fixed stack handling after hcall returned
> > * fixed format versions in both rendering and parsing paths
> > * rebased on top of removed unused hvcalls
> > * renamed used variables to have fdtfl- prefixes as there are already
> > some for parsing the initial dt
> > 
> > v2:
> > * fixed comments from review
> > * added strings cache
> > * changed last_compat_vers from 0x17 to 0x16 as suggested by dwg
> > 
> > ---
> > 
> > I tested the blob by storing it from QEMU to a file and decompiling it;
> > this produces error which I do not really
> > understand as the name of the root is an empty string (literaly:
> > 00 00 00 01  00 00 00 00) and yet this error:
> > 
> > aik@fstn1-p1:~$ dtc -f -I dtb -O dts -o dbg.dts dbg.dtb
> > ERROR (name_properties): "name" property in / is incorrect ("/" instead of base node name)
> > Warning: Input tree has errors, output forced
> > ---
> >  lib/libhvcall/libhvcall.h |   3 +-
> >  board-qemu/slof/fdt.fs    | 284 +++++++++++++++++++++++++++++++++++++++++++++-
> >  board-qemu/slof/rtas.fs   |   7 ++
> >  lib/libhvcall/hvcall.code |   5 +
> >  lib/libhvcall/hvcall.in   |   1 +
> >  5 files changed, 297 insertions(+), 3 deletions(-)
> > 
> 
> [...]
> 
> > diff --git a/board-qemu/slof/fdt.fs b/board-qemu/slof/fdt.fs
> > index 851645e..548cc25 100644
> > --- a/board-qemu/slof/fdt.fs
> > +++ b/board-qemu/slof/fdt.fs
> > @@ -27,7 +27,7 @@ struct
> 
> [...]
> 
> > +: fdt-properties ( phandle -- )
> > +    dup encode-int s" phandle" fdt-prop
> > +    >r
> > +    s" "
> > +    BEGIN
> > +        r@ next-property
> > +    WHILE
> > +        2dup
> > +        2dup r@ get-property
> > +        not IF
> > +            2swap fdt-prop
> > +        THEN
> > +    REPEAT
> > +    r>
> > +    drop
> > +;
> 
> 
> Doing as below still works but brings the time for the huge guest (256
> CPUs, 256 E1000) down to 366ms as it does not search for properties every

Ah, much better.  That's a cost I can live with.

> time but just iterates through them, I picked this from ".properties". So
> we are back in business :) But I'd really love someone to explain me how
> that works as I am unable to parse this kind of internals myself... Segher,
> please.
> 
> 
> : fdt-properties ( phandle -- )
>     dup encode-int s" phandle" fdt-prop
> 
>     node>properties @ cell+ @
>     BEGIN
>         dup
>     WHILE
>         dup
> 
>         link>
>         dup >name name>string 2>r
>         execute
>         2r>
>         fdt-prop
> 
>         @
>     REPEAT
> 
>     drop
> ;
> 
>

Segher Boessenkool Oct. 4, 2017, 2:36 p.m. UTC | #8

On Tue, Oct 03, 2017 at 05:28:43PM +1100, David Gibson wrote:
> On Tue, Oct 03, 2017 at 04:15:23PM +1100, Alexey Kardashevskiy wrote:
> > I tested the blob by storing it from QEMU to a file and decompiling it;
> > this produces error which I do not really
> > understand as the name of the root is an empty string (literaly:
> > 00 00 00 01  00 00 00 00) and yet this error:
> > 
> > aik@fstn1-p1:~$ dtc -f -I dtb -O dts -o dbg.dts dbg.dtb
> > ERROR (name_properties): "name" property in / is incorrect ("/" instead of base node name)
> > Warning: Input tree has errors, output forced
> 
> The reason for this error is it's not to do with the "node name" as
> given after the FDT_BEGIN_TAG, but with the "name" property.  That's
> not required in FDT, but if supplied should match that node name
> exactly.  IIRC, however, for human readability runtime OF treats
> 'name' of the root node as being '/'.

SLOF usually sets the "name" property in the root node to "/".  The
standard actually requires "Name of system’s manufacturer and model
number, e.g., “ABC,mat750”", which is also not an empty string (nor
an absent property).

> Arguably that is a bug in dtc - '/' as the name property in the root
> is probably acceptable.  On the other hand, you should probably just
> omit the 'name' properties when you flatten the tree.

Yup, it is one of the differences between FDT and OF device trees.


Segher

Segher Boessenkool Oct. 4, 2017, 4:40 p.m. UTC | #9

On Tue, Oct 03, 2017 at 11:13:51PM +1100, Alexey Kardashevskiy wrote:
> > +: fdt-properties ( phandle -- )
> > +    dup encode-int s" phandle" fdt-prop
> > +    >r
> > +    s" "
> > +    BEGIN
> > +        r@ next-property
> > +    WHILE
> > +        2dup
> > +        2dup r@ get-property
> > +        not IF
> > +            2swap fdt-prop
> > +        THEN
> > +    REPEAT
> > +    r>
> > +    drop
> > +;
> 
> 
> Doing as below still works but brings the time for the huge guest (256
> CPUs, 256 E1000) down to 366ms as it does not search for properties every
> time but just iterates through them, I picked this from ".properties". So
> we are back in business :) But I'd really love someone to explain me how
> that works as I am unable to parse this kind of internals myself... Segher,
> please.

I'll try.

: link>name  cell+ ;
: name>string  char+ count ;

A wordlist is a linked list, offset 0 of each record is a pointer to
the next.  At offset 1 cell is the per-name data: a flags byte, a byte
that is the number of chars in the name, and the actual chars.

Well.  The wordlists are themselves a linked list; offset 0 holds the
address of the next wordlist, and offset 1 cell holds the address of
the first name record.  Like:

STRUCT
  cell FIELD wid>next
  cell FIELD wid>names \ the head of the list of name records
END-STRUCT

STRUCT
  cell FIELD name>next
  char FIELD name>flags
  char FIELD name>count
  0    FIELD name>chars
END-STRUCT


The default (find) implementation (from engine.in):

: ((find))  ( str len head -- 0 | link )
   BEGIN dup WHILE
      >r 2dup r@ link>name name>string string=ci IF \ if found the name:
         2drop r> EXIT THEN \ return a pointer to this name record
      r> @ \ follow the linked list
   REPEAT
   3drop false ; \ nope, not found


> : fdt-properties ( phandle -- )
>     dup encode-int s" phandle" fdt-prop
> 
>     node>properties @ cell+ @

"properties" in a node struct is a wordlist, which is just a pointer
to the first name record in that wordlist.

>     BEGIN
>         dup
>     WHILE
>         dup
> 
>         link>

: link>  link>name name> ;
where name> moves to after the name (cell-aligned):
: name>  char+ dup c@ 1+ chars+ aligned ;
(skip the flags byte, get the char count byte, skip that many bytes
(and the count byte itself), align).

>         dup >name name>string 2>r
>         execute
>         2r>
>         fdt-prop

>name tries to find the name, given an xt.  You do not need it here.

> 
>         @
>     REPEAT
> 
>     drop
> ;

So maybe more like

: fdt-property ( link -- )
   dup link> execute rot link>name name>string fdt-prop ;
: fdt-properties ( phandle -- )
   dup encode-int s" phandle" fdt-prop
   node>properties @ cell+ @ BEGIN dup WHILE dup fdt-property @ REPEAT drop ;

(and maybe better names?  Something that makes clear it is copying
properties to the flat tree).


Segher

Alexey Kardashevskiy Oct. 5, 2017, 3:26 a.m. UTC | #10

On 05/10/17 03:40, Segher Boessenkool wrote:
> On Tue, Oct 03, 2017 at 11:13:51PM +1100, Alexey Kardashevskiy wrote:
>>> +: fdt-properties ( phandle -- )
>>> +    dup encode-int s" phandle" fdt-prop
>>> +    >r
>>> +    s" "
>>> +    BEGIN
>>> +        r@ next-property
>>> +    WHILE
>>> +        2dup
>>> +        2dup r@ get-property
>>> +        not IF
>>> +            2swap fdt-prop
>>> +        THEN
>>> +    REPEAT
>>> +    r>
>>> +    drop
>>> +;
>>
>>
>> Doing as below still works but brings the time for the huge guest (256
>> CPUs, 256 E1000) down to 366ms as it does not search for properties every
>> time but just iterates through them, I picked this from ".properties". So
>> we are back in business :) But I'd really love someone to explain me how
>> that works as I am unable to parse this kind of internals myself... Segher,
>> please.
> 
> I'll try.
> 
> : link>name  cell+ ;
> : name>string  char+ count ;
> 
> A wordlist is a linked list, offset 0 of each record is a pointer to
> the next.  At offset 1 cell is the per-name data: a flags byte, a byte
> that is the number of chars in the name, and the actual chars.
> 
> Well.  The wordlists are themselves a linked list; offset 0 holds the
> address of the next wordlist, and offset 1 cell holds the address of
> the first name record.  Like:
> 
> STRUCT
>   cell FIELD wid>next
>   cell FIELD wid>names \ the head of the list of name records
> END-STRUCT
> 
> STRUCT
>   cell FIELD name>next
>   char FIELD name>flags
>   char FIELD name>count
>   0    FIELD name>chars
> END-STRUCT


These do not exist in the existing SLOF code, you just typed them here, right?

I want to put these into slof/fs/node.fs right after the node struct
definition with some comments about what points to what.


> 
> 
> The default (find) implementation (from engine.in):
> 
> : ((find))  ( str len head -- 0 | link )
>    BEGIN dup WHILE
>       >r 2dup r@ link>name name>string string=ci IF \ if found the name:
>          2drop r> EXIT THEN \ return a pointer to this name record
>       r> @ \ follow the linked list
>    REPEAT
>    3drop false ; \ nope, not found
> 
> 
>> : fdt-properties ( phandle -- )
>>     dup encode-int s" phandle" fdt-prop
>>
>>     node>properties @ cell+ @
> 
> "properties" in a node struct is a wordlist, which is just a pointer
> to the first name record in that wordlist.

			( phandle )
node>properties @	( properties-wordlist )
cell+ @			( properties-names-next )

And properties-names-next points to the first 'name>next' thingy. Ok.

> 
>>     BEGIN
>>         dup
>>     WHILE
>>         dup
>>
>>         link>
> 
> : link>  link>name name> ;

Ahhh. I was searching for 'LINK_X3E' but it is:
col(LINK> LINK>NAME NAME>)

Ok. Another confusion was that 'NAME' here does not point to a string but
to 'name>flags'. Ok.

btw what does 'lfa' stand for in stack comments?


So, after 'link>', the stack is:
			( properties-names-next end-of-name )


And after that end-of-name there is/are what? One 'xt' there returns
data+len, where does that token come from? I assume next one is 'xt' which
prints from 'ls', right? All I am getting it all wrong and there is just
one xt?


> where name> moves to after the name (cell-aligned):
>
> : name>  char+ dup c@ 1+ chars+ aligned ;
> (skip the flags byte, get the char count byte, skip that many bytes
> (and the count byte itself), align).
> 
>>         dup >name name>string 2>r
>>         execute
>>         2r>
>>         fdt-prop
> 
>> name tries to find the name, given an xt.  You do not need it here.


Well, with this new knowledge - sounds like the name is right there between
 properties-names-next and what 'link>' returns so I do not need extra work
to extract it.

'>name' simply goes backwards to the beginning of the name, right (if I am
reading the beginning of slof/fs/base.fs correctly)?

btw '.property' still does search, what is that - room for a little
improvement or I am missing something again?


> 
>>
>>         @
>>     REPEAT
>>
>>     drop
>> ;
> 
> So maybe more like
> 
> : fdt-property ( link -- )
>    dup link> execute rot link>name name>string fdt-prop ;

Without stack comments, I'll forget this in 2 months maximum.

> : fdt-properties ( phandle -- )
>    dup encode-int s" phandle" fdt-prop
>    node>properties @ cell+ @ BEGIN dup WHILE dup fdt-property @ REPEAT drop ;


With this formatting and missing stack comments I will stop understanding
it in a week :(


> (and maybe better names?  Something that makes clear it is copying
> properties to the flat tree).

It does not work like this, if you do not like the name - do not just say
it, suggest a better one ;)



and thanks for the lesson!

Alexey Kardashevskiy Oct. 5, 2017, 7:39 a.m. UTC | #11

On 05/10/17 14:26, Alexey Kardashevskiy wrote:
> On 05/10/17 03:40, Segher Boessenkool wrote:
>> On Tue, Oct 03, 2017 at 11:13:51PM +1100, Alexey Kardashevskiy wrote:
>>>> +: fdt-properties ( phandle -- )
>>>> +    dup encode-int s" phandle" fdt-prop
>>>> +    >r
>>>> +    s" "
>>>> +    BEGIN
>>>> +        r@ next-property
>>>> +    WHILE
>>>> +        2dup
>>>> +        2dup r@ get-property
>>>> +        not IF
>>>> +            2swap fdt-prop
>>>> +        THEN
>>>> +    REPEAT
>>>> +    r>
>>>> +    drop
>>>> +;
>>>
>>>
>>> Doing as below still works but brings the time for the huge guest (256
>>> CPUs, 256 E1000) down to 366ms as it does not search for properties every
>>> time but just iterates through them, I picked this from ".properties". So
>>> we are back in business :) But I'd really love someone to explain me how
>>> that works as I am unable to parse this kind of internals myself... Segher,
>>> please.
>>
>> I'll try.
>>
>> : link>name  cell+ ;
>> : name>string  char+ count ;
>>
>> A wordlist is a linked list, offset 0 of each record is a pointer to
>> the next.  At offset 1 cell is the per-name data: a flags byte, a byte
>> that is the number of chars in the name, and the actual chars.
>>
>> Well.  The wordlists are themselves a linked list; offset 0 holds the
>> address of the next wordlist, and offset 1 cell holds the address of
>> the first name record.  Like:
>>
>> STRUCT
>>   cell FIELD wid>next
>>   cell FIELD wid>names \ the head of the list of name records
>> END-STRUCT
>>
>> STRUCT
>>   cell FIELD name>next
>>   char FIELD name>flags
>>   char FIELD name>count
>>   0    FIELD name>chars
>> END-STRUCT
> 
> 
> These do not exist in the existing SLOF code, you just typed them here, right?
> 
> I want to put these into slof/fs/node.fs right after the node struct
> definition with some comments about what points to what.
> 
> 
>>
>>
>> The default (find) implementation (from engine.in):
>>
>> : ((find))  ( str len head -- 0 | link )
>>    BEGIN dup WHILE
>>       >r 2dup r@ link>name name>string string=ci IF \ if found the name:
>>          2drop r> EXIT THEN \ return a pointer to this name record
>>       r> @ \ follow the linked list
>>    REPEAT
>>    3drop false ; \ nope, not found
>>
>>
>>> : fdt-properties ( phandle -- )
>>>     dup encode-int s" phandle" fdt-prop
>>>
>>>     node>properties @ cell+ @
>>
>> "properties" in a node struct is a wordlist, which is just a pointer
>> to the first name record in that wordlist.
> 
> 			( phandle )
> node>properties @	( properties-wordlist )
> cell+ @			( properties-names-next )
> 
> And properties-names-next points to the first 'name>next' thingy. Ok.
> 
>>
>>>     BEGIN
>>>         dup
>>>     WHILE
>>>         dup
>>>
>>>         link>
>>
>> : link>  link>name name> ;
> 
> Ahhh. I was searching for 'LINK_X3E' but it is:
> col(LINK> LINK>NAME NAME>)
> 
> Ok. Another confusion was that 'NAME' here does not point to a string but
> to 'name>flags'. Ok.
> 
> btw what does 'lfa' stand for in stack comments?
> 
> 
> So, after 'link>', the stack is:
> 			( properties-names-next end-of-name )
> 
> 
> And after that end-of-name there is/are what? One 'xt' there returns
> data+len, where does that token come from? I assume next one is 'xt' which
> prints from 'ls', right? All I am getting it all wrong and there is just
> one xt?
> 
> 
>> where name> moves to after the name (cell-aligned):
>>
>> : name>  char+ dup c@ 1+ chars+ aligned ;
>> (skip the flags byte, get the char count byte, skip that many bytes
>> (and the count byte itself), align).
>>
>>>         dup >name name>string 2>r
>>>         execute
>>>         2r>
>>>         fdt-prop
>>
>>> name tries to find the name, given an xt.  You do not need it here.
> 
> 
> Well, with this new knowledge - sounds like the name is right there between
>  properties-names-next and what 'link>' returns so I do not need extra work
> to extract it.


btw 336ms now (was 351ms), nice. See "[PATCH slof v4 2/3] fdt: Pass the
resulting device tree to QEMU" commit log for all numbers.

Segher Boessenkool Oct. 5, 2017, 12:21 p.m. UTC | #12

On Thu, Oct 05, 2017 at 06:39:03PM +1100, Alexey Kardashevskiy wrote:
> btw 336ms now (was 351ms), nice. See "[PATCH slof v4 2/3] fdt: Pass the
> resulting device tree to QEMU" commit log for all numbers.

Nice indeed :-)

If you want further speedups, a juicy target is improving how the
wordlists are searched, since that will help everything.

Segher

Alexey Kardashevskiy Oct. 6, 2017, 1:40 a.m. UTC | #13

On 05/10/17 23:21, Segher Boessenkool wrote:
> On Thu, Oct 05, 2017 at 06:39:03PM +1100, Alexey Kardashevskiy wrote:
>> btw 336ms now (was 351ms), nice. See "[PATCH slof v4 2/3] fdt: Pass the
>> resulting device tree to QEMU" commit log for all numbers.
> 
> Nice indeed :-)
> 
> If you want further speedups, a juicy target is improving how the
> wordlists are searched, since that will help everything.

I was pretty happy with 1.7sec really :) With fdt, I would stop now and
probably come back later to look at wordlists when/if you answer my
questions 3 mails above in this thread.

And is there much to improve just search? I'd think the linked list needs
to be replaced with something sorted so it is a different data structure.

Segher Boessenkool Oct. 6, 2017, 5:11 p.m. UTC | #14

Hi Alexey,

On Fri, Oct 06, 2017 at 12:40:49PM +1100, Alexey Kardashevskiy wrote:
> On 05/10/17 23:21, Segher Boessenkool wrote:
> > On Thu, Oct 05, 2017 at 06:39:03PM +1100, Alexey Kardashevskiy wrote:
> >> btw 336ms now (was 351ms), nice. See "[PATCH slof v4 2/3] fdt: Pass the
> >> resulting device tree to QEMU" commit log for all numbers.
> > 
> > Nice indeed :-)
> > 
> > If you want further speedups, a juicy target is improving how the
> > wordlists are searched, since that will help everything.
> 
> I was pretty happy with 1.7sec really :) With fdt, I would stop now and
> probably come back later to look at wordlists when/if you answer my
> questions 3 mails above in this thread.

Good plan :-)  (I hope I can find that mail, if not, just ping me?)

> And is there much to improve just search? I'd think the linked list needs
> to be replaced with something sorted so it is a different data structure.

You still need something like a linked list because there can be two
words in the same wordlist with the same name, and you can still find
both in some ways, but the normal ways always should find the newer.

The current scheme to accelerate lookup is very simplistic: it takes a
hash of the word name, and has a cache indexed by that.  The cache has
just one way (no chaining, no nothing), so two conflicting names will
keep fighting for the slot.  If something is not found in the cache,
the slow way (walked the linked lists) is used.

The whole cache is blown away whenever the search order is changed,
etc., too.

As you see it could all be a lot faster (at the cost of complexity).
It always was fast enough ;-)

Some ways things could be improved:
* Keep caches per word list, not just one globally.
* Actually reorder the word lists when a word is found, i.e. pull a
found word closer to the word list head; but you also need to maintain
the original order some way, for words like MARKER or FORGET.

Segher

Alexey Kardashevskiy Oct. 9, 2017, 2:06 a.m. UTC | #15

This is the mail I wanted you to comment on. Thanks!


On 05/10/17 14:26, Alexey Kardashevskiy wrote:
> On 05/10/17 03:40, Segher Boessenkool wrote:
>> On Tue, Oct 03, 2017 at 11:13:51PM +1100, Alexey Kardashevskiy wrote:
>>>> +: fdt-properties ( phandle -- )
>>>> +    dup encode-int s" phandle" fdt-prop
>>>> +    >r
>>>> +    s" "
>>>> +    BEGIN
>>>> +        r@ next-property
>>>> +    WHILE
>>>> +        2dup
>>>> +        2dup r@ get-property
>>>> +        not IF
>>>> +            2swap fdt-prop
>>>> +        THEN
>>>> +    REPEAT
>>>> +    r>
>>>> +    drop
>>>> +;
>>>
>>>
>>> Doing as below still works but brings the time for the huge guest (256
>>> CPUs, 256 E1000) down to 366ms as it does not search for properties every
>>> time but just iterates through them, I picked this from ".properties". So
>>> we are back in business :) But I'd really love someone to explain me how
>>> that works as I am unable to parse this kind of internals myself... Segher,
>>> please.
>>
>> I'll try.
>>
>> : link>name  cell+ ;
>> : name>string  char+ count ;
>>
>> A wordlist is a linked list, offset 0 of each record is a pointer to
>> the next.  At offset 1 cell is the per-name data: a flags byte, a byte
>> that is the number of chars in the name, and the actual chars.
>>
>> Well.  The wordlists are themselves a linked list; offset 0 holds the
>> address of the next wordlist, and offset 1 cell holds the address of
>> the first name record.  Like:
>>
>> STRUCT
>>   cell FIELD wid>next
>>   cell FIELD wid>names \ the head of the list of name records
>> END-STRUCT
>>
>> STRUCT
>>   cell FIELD name>next
>>   char FIELD name>flags
>>   char FIELD name>count
>>   0    FIELD name>chars
>> END-STRUCT
> 
> 
> These do not exist in the existing SLOF code, you just typed them here, right?
> 
> I want to put these into slof/fs/node.fs right after the node struct
> definition with some comments about what points to what.
> 
> 
>>
>>
>> The default (find) implementation (from engine.in):
>>
>> : ((find))  ( str len head -- 0 | link )
>>    BEGIN dup WHILE
>>       >r 2dup r@ link>name name>string string=ci IF \ if found the name:
>>          2drop r> EXIT THEN \ return a pointer to this name record
>>       r> @ \ follow the linked list
>>    REPEAT
>>    3drop false ; \ nope, not found
>>
>>
>>> : fdt-properties ( phandle -- )
>>>     dup encode-int s" phandle" fdt-prop
>>>
>>>     node>properties @ cell+ @
>>
>> "properties" in a node struct is a wordlist, which is just a pointer
>> to the first name record in that wordlist.
> 
> 			( phandle )
> node>properties @	( properties-wordlist )
> cell+ @			( properties-names-next )
> 
> And properties-names-next points to the first 'name>next' thingy. Ok.
> 
>>
>>>     BEGIN
>>>         dup
>>>     WHILE
>>>         dup
>>>
>>>         link>
>>
>> : link>  link>name name> ;
> 
> Ahhh. I was searching for 'LINK_X3E' but it is:
> col(LINK> LINK>NAME NAME>)
> 
> Ok. Another confusion was that 'NAME' here does not point to a string but
> to 'name>flags'. Ok.
> 
> btw what does 'lfa' stand for in stack comments?
> 
> 
> So, after 'link>', the stack is:
> 			( properties-names-next end-of-name )
> 
> 
> And after that end-of-name there is/are what? One 'xt' there returns
> data+len, where does that token come from? I assume next one is 'xt' which
> prints from 'ls', right? All I am getting it all wrong and there is just
> one xt?
> 
> 
>> where name> moves to after the name (cell-aligned):
>>
>> : name>  char+ dup c@ 1+ chars+ aligned ;
>> (skip the flags byte, get the char count byte, skip that many bytes
>> (and the count byte itself), align).
>>
>>>         dup >name name>string 2>r
>>>         execute
>>>         2r>
>>>         fdt-prop
>>
>>> name tries to find the name, given an xt.  You do not need it here.
> 
> 
> Well, with this new knowledge - sounds like the name is right there between
>  properties-names-next and what 'link>' returns so I do not need extra work
> to extract it.
> 
> '>name' simply goes backwards to the beginning of the name, right (if I am
> reading the beginning of slof/fs/base.fs correctly)?
> 
> btw '.property' still does search, what is that - room for a little
> improvement or I am missing something again?
> 
> 
>>
>>>
>>>         @
>>>     REPEAT
>>>
>>>     drop
>>> ;
>>
>> So maybe more like
>>
>> : fdt-property ( link -- )
>>    dup link> execute rot link>name name>string fdt-prop ;
> 
> Without stack comments, I'll forget this in 2 months maximum.
> 
>> : fdt-properties ( phandle -- )
>>    dup encode-int s" phandle" fdt-prop
>>    node>properties @ cell+ @ BEGIN dup WHILE dup fdt-property @ REPEAT drop ;
> 
> 
> With this formatting and missing stack comments I will stop understanding
> it in a week :(
> 
> 
>> (and maybe better names?  Something that makes clear it is copying
>> properties to the flat tree).
> 
> It does not work like this, if you do not like the name - do not just say
> it, suggest a better one ;)
> 
> 
> 
> and thanks for the lesson!
> 
>

Segher Boessenkool Oct. 9, 2017, 5:28 p.m. UTC | #16

On Mon, Oct 09, 2017 at 01:06:08PM +1100, Alexey Kardashevskiy wrote:
> This is the mail I wanted you to comment on. Thanks!

Ah ok, thanks.

> >> Well.  The wordlists are themselves a linked list; offset 0 holds the
> >> address of the next wordlist, and offset 1 cell holds the address of
> >> the first name record.  Like:
> >>
> >> STRUCT
> >>   cell FIELD wid>next
> >>   cell FIELD wid>names \ the head of the list of name records
> >> END-STRUCT
> >>
> >> STRUCT
> >>   cell FIELD name>next
> >>   char FIELD name>flags
> >>   char FIELD name>count
> >>   0    FIELD name>chars
> >> END-STRUCT
> > 
> > 
> > These do not exist in the existing SLOF code, you just typed them here, right?

Yes, this is just a description of what the core code (engine.in) does.
You do not actually want these words, certainly not in node.fs .

For example, "whatever>next" is just a no-op: writing it as no code at
all is faster, this is quite hot code.

> >> : link>  link>name name> ;
> > 
> > Ahhh. I was searching for 'LINK_X3E' but it is:
> > col(LINK> LINK>NAME NAME>)
> > 
> > Ok. Another confusion was that 'NAME' here does not point to a string but
> > to 'name>flags'. Ok.

Yeah, engine.in is mostly like Forth syntax (ignoring all the immediate
fields, and the words headers ("col" etc.)

> > btw what does 'lfa' stand for in stack comments?

lfa is "link field address".  nfa is "name field address".  xt is
"execution token".  SLOF uses a very traditional dictionary structure.
Each word is laid out like this:

lfa:	cell, a pointer to the previous word in this list.
nfa:	at least one cell; a flags field, and the name of the word as
	a counted string, and padding to the next word boundary.
xt:	the "body" of the word, starting with the code field, and then
	for e.g. colon words a list of (mostly) xts, etc.

> > So, after 'link>', the stack is:
> > 			( properties-names-next end-of-name )

: link> ( lfa -- xt )

> > '>name' simply goes backwards to the beginning of the name, right (if I am
> > reading the beginning of slof/fs/base.fs correctly)?

Yes, but it guesses: it searches back to a byte that could be the
correct length byte.  It is also slow.  You do not want to use it
(unless you have to, to get a name when only knowing the xt: it's
really only for debugging things).

> > btw '.property' still does search, what is that - room for a little
> > improvement or I am missing something again?

Probably, yes.  Seems it could just use   link>name   instead of
link> >name   .
Well, it is   link> dup >name name>string   so it could do
link>name dup name> swap name>string   or something like that.

> >> : fdt-property ( link -- )
> >>    dup link> execute rot link>name name>string fdt-prop ;
> > 
> > Without stack comments, I'll forget this in 2 months maximum.

(There is a stack comment :-P )

I don't write "what is on the stack" after every word, certainly not
in short phrases, where it should be obvious.

I suspect your problem reading this is that you do not know the stack
signatures of link> etc. by heart.

> >> : fdt-properties ( phandle -- )
> >>    dup encode-int s" phandle" fdt-prop
> >>    node>properties @ cell+ @ BEGIN dup WHILE dup fdt-property @ REPEAT drop ;
> > 
> > 
> > With this formatting and missing stack comments I will stop understanding
> > it in a week :(

A bit better factored then?

: (fdt-properties) ( lfa -- )
   BEGIN dup WHILE dup fdt-property @ REPEAT drop ;
: fdt-properties ( phandle -- )
   dup encode-int s" phandle" fdt-prop
   node>properties @ cell+ @ (fdt-properties) ;

> >> (and maybe better names?  Something that makes clear it is copying
> >> properties to the flat tree).
> > 
> > It does not work like this, if you do not like the name - do not just say
> > it, suggest a better one ;)

It certainly does work like that!  Coming up with good names is the
hardest part of programming (and a very important part).

Maybe just names like "make-fdt-properties"?

Cheers, HTH,

Segher

Alexey Kardashevskiy Oct. 10, 2017, 2:23 a.m. UTC | #17

On 10/10/17 04:28, Segher Boessenkool wrote:
> On Mon, Oct 09, 2017 at 01:06:08PM +1100, Alexey Kardashevskiy wrote:
>> This is the mail I wanted you to comment on. Thanks!
> 
> Ah ok, thanks.
> 
>>>> Well.  The wordlists are themselves a linked list; offset 0 holds the
>>>> address of the next wordlist, and offset 1 cell holds the address of
>>>> the first name record.  Like:
>>>>
>>>> STRUCT
>>>>   cell FIELD wid>next
>>>>   cell FIELD wid>names \ the head of the list of name records
>>>> END-STRUCT
>>>>
>>>> STRUCT
>>>>   cell FIELD name>next
>>>>   char FIELD name>flags
>>>>   char FIELD name>count
>>>>   0    FIELD name>chars
>>>> END-STRUCT
>>>
>>>
>>> These do not exist in the existing SLOF code, you just typed them here, right?
> 
> Yes, this is just a description of what the core code (engine.in) does.
> You do not actually want these words, certainly not in node.fs .

Even declaring these is bad? Elsewhere you suggested:

: fdt-properties ( phandle -- )
   dup encode-int s" phandle" fdt-prop
   node>properties @ cell+ @ BEGIN dup WHILE dup fdt-property @ REPEAT drop ;

How bad is replacing an absolutely magic 'cell+' with a kinda documented
'wid>names'?


> 
> For example, "whatever>next" is just a no-op: writing it as no code at
> all is faster, this is quite hot code.

Sure, wid>next and name>next are useless in the code but they still serve a
documentation purpose, and other struct fields are no no-op.


> 
>>>> : link>  link>name name> ;
>>>
>>> Ahhh. I was searching for 'LINK_X3E' but it is:
>>> col(LINK> LINK>NAME NAME>)
>>>
>>> Ok. Another confusion was that 'NAME' here does not point to a string but
>>> to 'name>flags'. Ok.
> 
> Yeah, engine.in is mostly like Forth syntax (ignoring all the immediate
> fields, and the words headers ("col" etc.)
> 
>>> btw what does 'lfa' stand for in stack comments?
> 
> lfa is "link field address".  nfa is "name field address".  xt is
> "execution token".  SLOF uses a very traditional dictionary structure.
> Each word is laid out like this:
> 
> lfa:	cell, a pointer to the previous word in this list.
> nfa:	at least one cell; a flags field, and the name of the word as
> 	a counted string, and padding to the next word boundary.

So "nfa" is name>next + name>flags + name>count + name>chars + padding. Ok.


> xt:	the "body" of the word, starting with the code field, and then
> 	for e.g. colon words a list of (mostly) xts, etc.


So it is a wordlist where the very first word has the node's name and its
xt returns the raw date (pointer + len), and other words do some other
stuff (.print-xxxx?), is that correct? Where is that very first word
compiled to the xt which returns the raw data?




> 
>>> So, after 'link>', the stack is:
>>> 			( properties-names-next end-of-name )
> 
> : link> ( lfa -- xt )
> 
>>> '>name' simply goes backwards to the beginning of the name, right (if I am
>>> reading the beginning of slof/fs/base.fs correctly)?
> 
> Yes, but it guesses: it searches back to a byte that could be the
> correct length byte.  It is also slow.  You do not want to use it
> (unless you have to, to get a name when only knowing the xt: it's
> really only for debugging things).
> 
>>> btw '.property' still does search, what is that - room for a little
>>> improvement or I am missing something again?
> 
> Probably, yes.  Seems it could just use   link>name   instead of
> link> >name   .
> Well, it is   link> dup >name name>string   so it could do
> link>name dup name> swap name>string   or something like that.

Aha! :)


> 
>>>> : fdt-property ( link -- )
>>>>    dup link> execute rot link>name name>string fdt-prop ;
>>>
>>> Without stack comments, I'll forget this in 2 months maximum.
> 
> (There is a stack comment :-P )
> 
> I don't write "what is on the stack" after every word, certainly not
> in short phrases, where it should be obvious.
> 
> I suspect your problem reading this is that you do not know the stack
> signatures of link> etc. by heart.


Exactly :(


>>>> : fdt-properties ( phandle -- )
>>>>    dup encode-int s" phandle" fdt-prop
>>>>    node>properties @ cell+ @ BEGIN dup WHILE dup fdt-property @ REPEAT drop ;
>>>
>>>
>>> With this formatting and missing stack comments I will stop understanding
>>> it in a week :(
> 
> A bit better factored then?
> 
> : (fdt-properties) ( lfa -- )
>    BEGIN dup WHILE dup fdt-property @ REPEAT drop ;
> : fdt-properties ( phandle -- )
>    dup encode-int s" phandle" fdt-prop
>    node>properties @ cell+ @ (fdt-properties) ;

May be... I'll do this but I am still struggling with a single line loops.
Why exactly is this style bad?

: fdt-add-properties ( phandle -- )
    dup encode-int s" phandle" fdt-prop

    node>properties @ wid>names @
    BEGIN
        dup
    WHILE
        dup fdt-add-property
        @
    REPEAT
    drop
;


> 
>>>> (and maybe better names?  Something that makes clear it is copying
>>>> properties to the flat tree).
>>>
>>> It does not work like this, if you do not like the name - do not just say
>>> it, suggest a better one ;)
> 
> It certainly does work like that!  Coming up with good names is the
> hardest part of programming (and a very important part).

You have a better idea what is a good forth name ;)

> Maybe just names like "make-fdt-properties"?


I have now fdt-add-properties and fdt-add-property, too bad?


> 
> Cheers, HTH,
> 
> 
> Segher
>

Alexey Kardashevskiy Oct. 13, 2017, 8:26 a.m. UTC | #18

On 04/10/17 12:02, David Gibson wrote:
> On Tue, Oct 03, 2017 at 11:13:51PM +1100, Alexey Kardashevskiy wrote:
>> On 03/10/17 16:15, Alexey Kardashevskiy wrote:
>>> This creates flatten device tree and passes it to QEMU via a custom
>>> hypercall right before jumping to RTAS.
>>>
>>> On a machine with 256 CPUs and 256 virtual Intel E1000 devices the blob
>>> is 360KB (356KB structs and 20KB of strings), building such a tree takes
>>> ~2s on a POWER8 box. A simple tree with 1 CPU and a couple of devices
>>> takes 38ms and creates 16KB blob.
>>>
>>> This preloads strings with 40 property names from CPU and PCI device nodes
>>> and the strings lookup only searches within these. Without string reusing
>>> at all, the strings blob is 200KB and rendering time is 1.7sec; with
>>> unlimited reusing, the strings blob is 4KB and rendering time is 2.8sec.
>>>
>>> While we are here, fix the version handling in fdt-init. It only matters
>>> a little for the fdt-debug==1 case though.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---

[...]

Adding Ben to cc:.

>>
>> Doing as below still works but brings the time for the huge guest (256
>> CPUs, 256 E1000) down to 366ms as it does not search for properties every
> 
> Ah, much better.  That's a cost I can live with.


I did some more debugging. There is a noticeable delay when booting such a
big guest, I cannot really measure prom_init() as the "jiffies" symbol is
not available there but a stopwatch shows 8.5sec in the guest's
flatten_device_tree() which fetches the entire device tree via the client
interface, node-by-node, property-by-property. With two small patches
(below, for the guest kernel and for SLOF) the delay is not noticeable any
more.

The questions are:

1. are we interested in such optimization?

2. do I have to add lunux,phandle as well, or just "phandle" is enough (it
is now for my tests)

3. what should allocate the fdt - the guest or slof? In my draft - it is
the guest as it simply reserves 1MB for this and that's it

4. reserved map - SLOF puts a zero and my draft adds the real map to the
end - is that hacky or ok?







diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 613f79f03877..6102eb682c71 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2466,6 +2466,19 @@ static void __init flatten_device_tree(void)
                prom_panic("Can't allocate initial device-tree chunk\n");
        mem_end = mem_start + room;

+       if (!call_prom_ret("fdt-fetch", 2, 1, NULL, mem_start, room)) {
+               hdr = (void *) mem_start;
+               dt_header_start = (unsigned long)hdr;
+               hdr->boot_cpuid_phys = cpu_to_be32(prom.cpu);
+               hdr->off_mem_rsvmap = hdr->totalsize;
+               rsvmap = (void*)hdr + be32_to_cpu(hdr->totalsize);
+               hdr->totalsize = be32_to_cpu(hdr->totalsize) +
sizeof(mem_reserve_map);
+               hdr->totalsize = cpu_to_be32(hdr->totalsize);
+               memcpy(rsvmap, mem_reserve_map, sizeof(mem_reserve_map));
+               mem_reserve_cnt = MEM_RESERVE_MAP_SIZE;
+               return;
+       }
+
        /* Get root of tree */
        root = call_prom("peer", 1, 1, (phandle)0);
        if (root == (phandle)0)




diff --git a/slof/fs/client.fs b/slof/fs/client.fs
index 7d537a6..67976fb 100644
--- a/slof/fs/client.fs
+++ b/slof/fs/client.fs
@@ -308,4 +308,18 @@ ALSO client-voc DEFINITIONS
 : set-callback ( newfunc -- oldfunc )
   client-callback @ swap client-callback ! ;

+\ Curtom method to get FDT blob
+: fdt-fetch ( buf len -- 0 )
+    " /" find-node fdt-flatten-tree ( buf len dtb )
+    dup >r
+    >fdth_tsize l@              ( buf len size r: dtb )
+    min                         ( buf minsize r: dtb )
+    r@ -rot
+    move
+
+    r> fdt-flatten-tree-free
+
+    0
+;
+

Segher Boessenkool Oct. 13, 2017, 12:45 p.m. UTC | #19

On Fri, Oct 13, 2017 at 07:26:02PM +1100, Alexey Kardashevskiy wrote:
> diff --git a/slof/fs/client.fs b/slof/fs/client.fs
> index 7d537a6..67976fb 100644
> --- a/slof/fs/client.fs
> +++ b/slof/fs/client.fs
> @@ -308,4 +308,18 @@ ALSO client-voc DEFINITIONS
>  : set-callback ( newfunc -- oldfunc )
>    client-callback @ swap client-callback ! ;
> 
> +\ Curtom method to get FDT blob

s/r/s/

> +: fdt-fetch ( buf len -- 0 )
> +    " /" find-node fdt-flatten-tree ( buf len dtb )

" /" find-node   is   device-tree @

> +    dup >r
> +    >fdth_tsize l@              ( buf len size r: dtb )
> +    min                         ( buf minsize r: dtb )
> +    r@ -rot
> +    move
> +
> +    r> fdt-flatten-tree-free
> +
> +    0
> +;

That "min" isn't correct I think -- if the DTB doesn't fit, you want to
complain loudly and/or return an error to the caller.


Segher

Alexey Kardashevskiy Oct. 14, 2017, 12:24 a.m. UTC | #20

On 13/10/17 23:45, Segher Boessenkool wrote:
> On Fri, Oct 13, 2017 at 07:26:02PM +1100, Alexey Kardashevskiy wrote:
>> diff --git a/slof/fs/client.fs b/slof/fs/client.fs
>> index 7d537a6..67976fb 100644
>> --- a/slof/fs/client.fs
>> +++ b/slof/fs/client.fs
>> @@ -308,4 +308,18 @@ ALSO client-voc DEFINITIONS
>>  : set-callback ( newfunc -- oldfunc )
>>    client-callback @ swap client-callback ! ;
>>
>> +\ Curtom method to get FDT blob
> 
> s/r/s/
> 
>> +: fdt-fetch ( buf len -- 0 )
>> +    " /" find-node fdt-flatten-tree ( buf len dtb )
> 
> " /" find-node   is   device-tree @

Hm. Lovely :)


>> +    dup >r
>> +    >fdth_tsize l@              ( buf len size r: dtb )
>> +    min                         ( buf minsize r: dtb )
>> +    r@ -rot
>> +    move
>> +
>> +    r> fdt-flatten-tree-free
>> +
>> +    0
>> +;
> 
> That "min" isn't correct I think -- if the DTB doesn't fit, you want to
> complain loudly and/or return an error to the caller.

Well, this is not a real patch, this is an RFC to hear about the idea in
general. I'll add a complain here if we decide it is worth doing.

Segher Boessenkool Oct. 14, 2017, 12:34 a.m. UTC | #21

Hi Alexey,

On Sat, Oct 14, 2017 at 11:24:05AM +1100, Alexey Kardashevskiy wrote:
> >> +    dup >r
> >> +    >fdth_tsize l@              ( buf len size r: dtb )
> >> +    min                         ( buf minsize r: dtb )
> >> +    r@ -rot
> >> +    move
> >> +
> >> +    r> fdt-flatten-tree-free
> >> +
> >> +    0
> >> +;
> > 
> > That "min" isn't correct I think -- if the DTB doesn't fit, you want to
> > complain loudly and/or return an error to the caller.
> 
> Well, this is not a real patch, this is an RFC to hear about the idea in
> general. I'll add a complain here if we decide it is worth doing.

Returning an error is more important (and probably enough, the kernel is
in control at this point).


Segher

Segher Boessenkool Oct. 15, 2017, 12:19 p.m. UTC | #22

On Tue, Oct 10, 2017 at 01:23:15PM +1100, Alexey Kardashevskiy wrote:
> On 10/10/17 04:28, Segher Boessenkool wrote:
> >>>> STRUCT
> >>>>   cell FIELD wid>next
> >>>>   cell FIELD wid>names \ the head of the list of name records
> >>>> END-STRUCT
> >>>>
> >>>> STRUCT
> >>>>   cell FIELD name>next
> >>>>   char FIELD name>flags
> >>>>   char FIELD name>count
> >>>>   0    FIELD name>chars
> >>>> END-STRUCT
> >>>
> >>>
> >>> These do not exist in the existing SLOF code, you just typed them here, right?
> > 
> > Yes, this is just a description of what the core code (engine.in) does.
> > You do not actually want these words, certainly not in node.fs .
> 
> Even declaring these is bad?

The name lookup is *the* slowest part of the engine, you do not want to
make it slower for no good reason.

The word fields are not a fixed-length struct, so you cannot describe
that with a STRUCT (and there already are NAME>LINK LINK> NAME> words,
and no code outside of the engine has any business accessing flags,
count, chars directly).

It's fine to document this of course.

If other code wants to do unusual things to wordlists we really should
abstract that to some other words, not "manually" manipulate memory
(which you will still do with the "struct" definitions).  Yup I'm to
blame for the first occurrence of this here ;-)

> Elsewhere you suggested:
>
> : fdt-properties ( phandle -- )
>    dup encode-int s" phandle" fdt-prop
>    node>properties @ cell+ @ BEGIN dup WHILE dup fdt-property @ REPEAT drop ;
> 
> How bad is replacing an absolutely magic 'cell+' with a kinda documented
> 'wid>names'?

It's not really good either ;-)

: for-all-words ( wid xt -- ) \ xt has sig ( lfa -- )
   dup >r cell+ @ BEGIN dup WHILE dup  r@ execute  @ REPEAT  r> drop ;
: fdt-copy-properties ( phandle -- )
   dup encode-int s" phandle" fdt-prop
   node>properties @ ['] fdt-copy-property for-all-words ;

Maybe it's nicer to use the nfa... dunno.

> > lfa is "link field address".  nfa is "name field address".  xt is
> > "execution token".  SLOF uses a very traditional dictionary structure.
> > Each word is laid out like this:
> > 
> > lfa:	cell, a pointer to the previous word in this list.
> > nfa:	at least one cell; a flags field, and the name of the word as
> > 	a counted string, and padding to the next word boundary.
> 
> So "nfa" is name>next + name>flags + name>count + name>chars + padding. Ok.

\ link>name ( lfa -- nfa )
\ name> ( nfa -- xt )
\ link> ( lfa -- xt )

The name field is the flags, the count, the chars, and padding to cell size.
The nfa is the address of the name field, so the address of the flags.

> > xt:	the "body" of the word, starting with the code field, and then
> > 	for e.g. colon words a list of (mostly) xts, etc.
> 
> 
> So it is a wordlist

Nope, a wordlist is something else.  I shouldn't have said list: it's
just a bunch of xt's one after another in memory.  Like an array.

> where the very first word has the node's name and its
> xt returns the raw date (pointer + len), and other words do some other
> stuff (.print-xxxx?), is that correct? Where is that very first word
> compiled to the xt which returns the raw data?

A colon def like:
: over  2dup drop ;  \ a silly way to implement over of course :-)
is laid out in memory like (assuming cell size 8):

link                    \ to the previous word in this word list
00 04 6f 76 65 72 00 00 \ flags=0, "over" as counted string, padding
' DOCOL                 \ the address of docol; "nest"
' 2dup
' drop
' EXIT                  \ or SEMIS or whatever is the end of a colon
                        \ word these days; "unnest"

> > A bit better factored then?
> > 
> > : (fdt-properties) ( lfa -- )
> >    BEGIN dup WHILE dup fdt-property @ REPEAT drop ;
> > : fdt-properties ( phandle -- )
> >    dup encode-int s" phandle" fdt-prop
> >    node>properties @ cell+ @ (fdt-properties) ;
> 
> May be... I'll do this but I am still struggling with a single line loops.
> Why exactly is this style bad?
> 
> : fdt-add-properties ( phandle -- )
>     dup encode-int s" phandle" fdt-prop
> 
>     node>properties @ wid>names @
>     BEGIN
>         dup
>     WHILE
>         dup fdt-add-property
>         @
>     REPEAT
>     drop
> ;

Because that won't even fit more than two definitions on a screen.  Rigid
identation practices like that also make it impossible to use spacing to
indicate where phrases end.  And, importantly, the only reason you would
want indents is if you are nesting too deep (i.e. at all).

Most definitions should fit on a single line!

> >>>> (and maybe better names?  Something that makes clear it is copying
> >>>> properties to the flat tree).
> >>>
> >>> It does not work like this, if you do not like the name - do not just say
> >>> it, suggest a better one ;)
> > 
> > It certainly does work like that!  Coming up with good names is the
> > hardest part of programming (and a very important part).
> 
> You have a better idea what is a good forth name ;)

If a name makes clear what the word does, it is a good name, especially
if it is a nice short name as well.  If it is hard to make a nice short
name that describes exactly what the word does, chances are the word
does too much.

> I have now fdt-add-properties and fdt-add-property, too bad?

I think "copy" is better than "add"?


Segher

[v3,5/5] fdt: Pass the resulting device tree to QEMU

Commit Message

Comments

Patch