Message ID | 20240202051548.877087-1-kconsul@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
Series | [v2] slof/engine.in: refine +COMP and -COMP by not using COMPILE | expand |
Hi Segher/Thomas, On 2024-02-02 00:15:48, Kautuk Consul wrote: > Use the standard "DOTICK <word> COMPILE," mechanism in +COMP and -COMP > as is being used by the rest of the compiler. > Also use "SEMICOLON" instead of "EXIT" to compile into HERE in -COMP > as that is more informative as it mirrors the way the col() macro defines > a colon definition. > > Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com> > --- > slof/engine.in | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/slof/engine.in b/slof/engine.in > index 59e82f1..fa4d82e 100644 > --- a/slof/engine.in > +++ b/slof/engine.in > @@ -422,8 +422,8 @@ imm(.( LIT(')') PARSE TYPE) > col(COMPILE R> CELL+ DUP @ COMPILE, >R) > > var(THERE 0) > -col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) > -col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE) > +col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE DOTICK DOCOL COMPILE,) > +col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT DOTICK SEMICOLON COMPILE, THERE @ DOTO HERE COMP-BUFFER EXECUTE) What do you think about the above changes ? Are there any more changes I could do to this patch ? Or if you want to reject can you tell me why exactly ? > > // Structure words. > col(RESOLVE-ORIG HERE OVER CELL+ - SWAP !) > -- > 2.31.1 >
Hi Segher/Thomas, On 2024-02-02 00:15:48, Kautuk Consul wrote: > Use the standard "DOTICK <word> COMPILE," mechanism in +COMP and -COMP > as is being used by the rest of the compiler. > Also use "SEMICOLON" instead of "EXIT" to compile into HERE in -COMP > as that is more informative as it mirrors the way the col() macro defines > a colon definition. > > Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com> > --- > slof/engine.in | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/slof/engine.in b/slof/engine.in > index 59e82f1..fa4d82e 100644 > --- a/slof/engine.in > +++ b/slof/engine.in > @@ -422,8 +422,8 @@ imm(.( LIT(')') PARSE TYPE) > col(COMPILE R> CELL+ DUP @ COMPILE, >R) > > var(THERE 0) > -col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) > -col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE) > +col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE DOTICK DOCOL COMPILE,) > +col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT DOTICK SEMICOLON COMPILE, THERE @ DOTO HERE COMP-BUFFER EXECUTE) > Did you get time to review this simple patch ? Is there something wrong in it or the description ? > // Structure words. > col(RESOLVE-ORIG HERE OVER CELL+ - SWAP !) > -- > 2.31.1 >
On 22/02/2024 08.38, Kautuk Consul wrote: > Hi Segher/Thomas, > > On 2024-02-02 00:15:48, Kautuk Consul wrote: >> Use the standard "DOTICK <word> COMPILE," mechanism in +COMP and -COMP >> as is being used by the rest of the compiler. >> Also use "SEMICOLON" instead of "EXIT" to compile into HERE in -COMP >> as that is more informative as it mirrors the way the col() macro defines >> a colon definition. >> >> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com> >> --- >> slof/engine.in | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/slof/engine.in b/slof/engine.in >> index 59e82f1..fa4d82e 100644 >> --- a/slof/engine.in >> +++ b/slof/engine.in >> @@ -422,8 +422,8 @@ imm(.( LIT(')') PARSE TYPE) >> col(COMPILE R> CELL+ DUP @ COMPILE, >R) >> >> var(THERE 0) >> -col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) >> -col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE) >> +col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE DOTICK DOCOL COMPILE,) >> +col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT DOTICK SEMICOLON COMPILE, THERE @ DOTO HERE COMP-BUFFER EXECUTE) >> > Did you get time to review this simple patch ? > Is there something wrong in it or the description ? Hi Kautuk, could you maybe do some performance checks to see whether this make a difference (e.g. by running the command in a tight loop many times)? You can use "tb@" to get the current value of the timebase counter, so reading that before and after the loop should provide you with a way of measuring the required time. Thomas
Hi Thomas, > > Hi Kautuk, > > could you maybe do some performance checks to see whether this make a > difference (e.g. by running the command in a tight loop many times)? > You can use "tb@" to get the current value of the timebase counter, so > reading that before and after the loop should provide you with a way of > measuring the required time. > > Thomas > This patch is to improve compilation timings of the IF/AHEAD/THEN/CASE/ENDCASE/BEGIN/AGAIN/UNTIL/DO/?DO/LOOP/+LOOP/ Forth words that are NOT within any Forth procedure. And it does this in the same way for all of these Forth words because all of these words simply utilize the +COMP and -COMP words. I created a patch on top of this patch file that introduces the older implementation of IF and THEN and I called them IF2 and THEN2 as follows: col(+COMP-BEFORE STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) col(-COMP-BEFORE -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE SEMICOLON THERE @ DOTO HERE COMP-BUFFER EXECUTE) imm(IF2 +COMP-BEFORE DOTICK DO0BRANCH COMPILE, HERE 0 COMPILE,) imm(THEN2 ?COMP RESOLVE-ORIG -COMP-BEFORE) The IF2 and THEN2 use -COMP-BEFORE and +COMP-BEFORE in order to have the changes before I applied my "[PATCH v2] slof/engine.in: refine +COMP and -COMP by not using" patch file. Now that I have both implementation, I used the timebase in order to test what is the difference in timebase before and after invocation of numerous IF-THEN and IF2-THEN2 Forth words. I made the following changes to ./board-qemu/slof/OF.fs: diff --git a/board-qemu/slof/OF.fs b/board-qemu/slof/OF.fs index df33c80..56805fc 100644 --- a/board-qemu/slof/OF.fs +++ b/board-qemu/slof/OF.fs @@ -22,6 +22,7 @@ hex #include "base.fs" + \ Set default load-base to 0x4000 4000 to default-load-base @@ -329,6 +330,151 @@ check-boot-from-ram 8ff cp +." BEFORE-PATCH: BEFORE TB is: " tb@ . +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +cr ." BEFORE-PATCH: AFTER TB is: " tb@ . cr + +." AFTER-PATCH: BEFORE TB is: " tb@ . +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +cr ." AFTER-PATCH: AFTER TB is: " tb@ . cr + +." AFTER-PATCH: BEFORE TB is: " tb@ . +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +cr ." AFTER-PATCH: AFTER TB is: " tb@ . cr With the above changes in slof/engine.in and board-qemu/slof/OF.fs I complied SLOF and got the following output on running a guest: [root@r223l performance_work]# virsh start vm4 --console Domain 'vm4' started Connected to domain 'vm4' Escape character is ^] (Ctrl + ]) Populating /vdevice methods Populating /vdevice/vty@30000000 Populating /vdevice/nvram@71000000 Populating /pci@800000020000000 00 0800 (D) : 1b36 000d serial bus [ usb-xhci ] 00 1000 (D) : 1af4 1003 virtio [ serial ] 00 1800 (D) : 1af4 1001 virtio [ block ] 00 2000 (D) : 1af4 1002 legacy-device* 00 2800 (D) : 1234 1111 qemu vga No NVRAM common partition, re-initializing... Installing QEMU fb Scanning USB XHCI: Initializing USB Keyboard No console specified using screen & keyboard BEFORE-PATCH: BEFORE TB is: 9de978a1 BEFORE-PATCH: AFTER TB is: 9e78efba AFTER-PATCH: BEFORE TB is: 9ebb67aa AFTER-PATCH: AFTER TB is: 9f2247cc AFTER-PATCH: BEFORE TB is: 9f64b9fd AFTER-PATCH: AFTER TB is: 9fc33e6c Welcome to Open Firmware Copyright (c) 2004, 2017 IBM Corporation All rights reserved. This program and the accompanying materials are made available under the terms of the BSD License available at http://www.opensource.org/licenses/bsd-license.php Trying to load: from: /pci@800000020000000/scsi@3 ... Successfully loaded [root@r223l performance_work]# echo $((0x9e78efba-0x9de978a1)) 9402137 [root@r223l performance_work]# echo $((0x9f2247cc-0x9ebb67aa)) 6742050 [root@r223l performance_work]# echo $((0x9fc33e6c-0x9f64b9fd)) 6194287 [root@r223l performance_work]# echo "scale=4;(9402137-6742050)/512" | bc 5195.4824 [root@r223l performance_work]# echo "scale=4;(9402137-6194287)/512" | bc 6265.3320 [root@r223l performance_work]# As per the calculations in the output of the BEFORE-PATCH and AFTER-PATCH logs I find that there is a very noticeable and consistent improvement in multiple runs in terms of microseconds. (My POWER9 bare-metal has 512 MHz timebase-frequency so thats why I am dividing by 512). Note: The above figures include the execution speed of IF-THEN and IF2-THEN2 after compilation. But since the actual execution speeds of the IF-THEN and the IF2-THEN2 after their compilation should be the same, this should get adjusted in my subtraction in the above 2 bc commands. >
Just to clarify one last point: On 2024-02-22 17:16:33, Kautuk Consul wrote: > Hi Thomas, > > > > > Hi Kautuk, > > > > could you maybe do some performance checks to see whether this make a > > difference (e.g. by running the command in a tight loop many times)? Running a single loop many times will not expose much because that loop (which is NOT within a Forth colon subroutine) will compile only once. In my performance benchmarking with tb@ I have put 45 IF-THEN and IF2-THEN2 control statements that will each compile once and reveal the difference in compilation speeds. > > You can use "tb@" to get the current value of the timebase counter, so > > reading that before and after the loop should provide you with a way of > > measuring the required time. > > > > Thomas > >
On Mon, Feb 12, 2024 at 02:58:46PM +0530, Kautuk Consul wrote: > Hi Segher/Thomas, > > On 2024-02-02 00:15:48, Kautuk Consul wrote: > > Use the standard "DOTICK <word> COMPILE," mechanism in +COMP and -COMP > > as is being used by the rest of the compiler. > > Also use "SEMICOLON" instead of "EXIT" to compile into HERE in -COMP > > as that is more informative as it mirrors the way the col() macro defines > > a colon definition. > > > > Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com> > > --- > > slof/engine.in | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/slof/engine.in b/slof/engine.in > > index 59e82f1..fa4d82e 100644 > > --- a/slof/engine.in > > +++ b/slof/engine.in > > @@ -422,8 +422,8 @@ imm(.( LIT(')') PARSE TYPE) > > col(COMPILE R> CELL+ DUP @ COMPILE, >R) > > > > var(THERE 0) > > -col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) > > -col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE) > > +col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE DOTICK DOCOL COMPILE,) > > +col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT DOTICK SEMICOLON COMPILE, THERE @ DOTO HERE COMP-BUFFER EXECUTE) > > What do you think about the above changes ? > Are there any more changes I could do to this patch ? > Or if you want to reject can you tell me why exactly ? I think both changes are bad. They reduce abstraction, for no reason at all. If you think the compiler should inline more, or do better optimisations even, work on *that*, don't do one unimportant case of it manually. I never made the indirect threading engine in Paflof faster, because it was plenty fast already. In SLOF, almost everything is compiled at runtime, and if it is important to speed that up there are some well- known usual caching tricks to make things *factors* faster. The main focus points for SLOF were to have an engine that is easily adapted for different purposes (and it was! Ask me about it :-) ), and to have things using it as debuggable as possible (you really need some hardware debugging thing to make it real easy; I had one back then. You need to be able to look at all memory state after a stop (a crash, perhaps), and seeing all CPU registers is useful as well. If you want to improve engine.in, get rid of it completely? Make the whol thing cross-compile perhaps. Everything from source code. The engine.in thing is essentially an already compiled thing (but not relocated yet, not fixed to some address), which is still in mostly obvious 1-1 correspondence to it source code, which can be easily "uncompiled" as well. Like: col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE) : +comp ( -- ) state @ 1 state +! IF exit THEN here there ! comp-buffer to here compile docol ; : -comp ( -- ) -1 state +! state @ IF exit THEN compile exit there @ to here comp-buffer execute ; "['] semicolon compile," is not something a user would ever write. A user would write "compile exit". It is standard Forth, it works anywhere. It is much more idiomatic.. Segher
On Fri, Feb 23, 2024 at 11:29:23AM +0530, Kautuk Consul wrote: > > > difference (e.g. by running the command in a tight loop many times)? > Running a single loop many times will not expose much because that loop > (which is NOT within a Forth colon subroutine) will compile only once. > In my performance benchmarking with tb@ I have put 45 IF-THEN and > IF2-THEN2 control statements that will each compile once and reveal the > difference in compilation speeds. All of this is only for things compiled in interpretation mode anyway. Even how you get the source code in (read it from a slow flash rom in the best case!) dominates performance. You do not write things in Forth because it is perfect speed. Write things directly in machine code if you want that, or in another high- level language that emphasises optimal execution speed. The good things about Forth are rapid prototyping, immediate testing of all code you write, simple compact code, that kind of goodness. Ideal for (system) firmware! Segher
On 2024-02-23 15:04:56, Segher Boessenkool wrote: > On Fri, Feb 23, 2024 at 11:29:23AM +0530, Kautuk Consul wrote: > > > > difference (e.g. by running the command in a tight loop many times)? > > Running a single loop many times will not expose much because that loop > > (which is NOT within a Forth colon subroutine) will compile only once. > > In my performance benchmarking with tb@ I have put 45 IF-THEN and > > IF2-THEN2 control statements that will each compile once and reveal the > > difference in compilation speeds. > > All of this is only for things compiled in interpretation mode anyway. > Even how you get the source code in (read it from a slow flash rom in > the best case!) dominates performance. > > You do not write things in Forth because it is perfect speed. Write > things directly in machine code if you want that, or in another high- > level language that emphasises optimal execution speed. The good things > about Forth are rapid prototyping, immediate testing of all code you > write, simple compact code, that kind of goodness. Ideal for (system) > firmware! > > > Segher Yes, but SLOF will be there on the product we sell to our customers. Considering that there is a noticeable improvement in performance I just thought maybe IBM management would be interested in it.
Hi, On 2024-02-23 14:57:23, Segher Boessenkool wrote: > > I think both changes are bad. They reduce abstraction, for no reason at > all. > > If you think the compiler should inline more, or do better optimisations > even, work on *that*, don't do one unimportant case of it manually. > > I never made the indirect threading engine in Paflof faster, because it > was plenty fast already. In SLOF, almost everything is compiled at > runtime, and if it is important to speed that up there are some well- > known usual caching tricks to make things *factors* faster. The main > focus points for SLOF were to have an engine that is easily adapted for > different purposes (and it was! Ask me about it :-) ), and to have > things using it as debuggable as possible (you really need some hardware > debugging thing to make it real easy; I had one back then. You need to > be able to look at all memory state after a stop (a crash, perhaps), and > seeing all CPU registers is useful as well. > > If you want to improve engine.in, get rid of it completely? Make the > whol thing cross-compile perhaps. Everything from source code. The > engine.in thing is essentially an already compiled thing (but not > relocated yet, not fixed to some address), which is still in mostly > obvious 1-1 correspondence to it source code, which can be easily > "uncompiled" as well. Like: :-). Getting rid of it completely and making the whole thing cross-compile would require more time that I'm not so sure that I or even my manager would be able to spare in our project. > > col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) > col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE) > > : +comp ( -- ) > state @ 1 state +! IF exit THEN > here there ! > comp-buffer to here > compile docol ; > : -comp ( -- ) > -1 state +! > state @ IF exit THEN > compile exit > there @ to here > comp-buffer execute ; > > "['] semicolon compile," is not something a user would ever write. A > user would write "compile exit". It is standard Forth, it works > anywhere. It is much more idiomatic.. Okay, I can accept the fact that maybe we should use EXIT instead of SEMICOLON. But at least can we remove the invocation of the "COMPILE" keyword in +COMP and -COMP ? The rest of the compiler in slof/engine.in uses the standard "DOTICK <word> COMPILE," format so why cannot we use this for -COMP as well as +COMP ? > > > Segher
Hi Thomas, On 2024-02-26 10:02:55, Kautuk Consul wrote: > On 2024-02-23 15:04:56, Segher Boessenkool wrote: > > On Fri, Feb 23, 2024 at 11:29:23AM +0530, Kautuk Consul wrote: > > > > > difference (e.g. by running the command in a tight loop many times)? > > > Running a single loop many times will not expose much because that loop > > > (which is NOT within a Forth colon subroutine) will compile only once. > > > In my performance benchmarking with tb@ I have put 45 IF-THEN and > > > IF2-THEN2 control statements that will each compile once and reveal the > > > difference in compilation speeds. > > > > All of this is only for things compiled in interpretation mode anyway. > > Even how you get the source code in (read it from a slow flash rom in > > the best case!) dominates performance. > > > > You do not write things in Forth because it is perfect speed. Write > > things directly in machine code if you want that, or in another high- > > level language that emphasises optimal execution speed. The good things > > about Forth are rapid prototyping, immediate testing of all code you > > write, simple compact code, that kind of goodness. Ideal for (system) > > firmware! > > > > > > Segher > > Yes, but SLOF will be there on the product we sell to our customers. > Considering that there is a noticeable improvement in performance I just > thought maybe IBM management would be interested in it. On this note, what did you also try to understand the performance implications of my patch ? What improvements did you observer on your set up?
On 2024-02-26 10:13:52, Kautuk Consul wrote: > Hi Thomas, > > On 2024-02-26 10:02:55, Kautuk Consul wrote: > > On 2024-02-23 15:04:56, Segher Boessenkool wrote: > > > On Fri, Feb 23, 2024 at 11:29:23AM +0530, Kautuk Consul wrote: > > > > > > difference (e.g. by running the command in a tight loop many times)? > > > > Running a single loop many times will not expose much because that loop > > > > (which is NOT within a Forth colon subroutine) will compile only once. > > > > In my performance benchmarking with tb@ I have put 45 IF-THEN and > > > > IF2-THEN2 control statements that will each compile once and reveal the > > > > difference in compilation speeds. > > > > > > All of this is only for things compiled in interpretation mode anyway. > > > Even how you get the source code in (read it from a slow flash rom in > > > the best case!) dominates performance. > > > > > > You do not write things in Forth because it is perfect speed. Write > > > things directly in machine code if you want that, or in another high- > > > level language that emphasises optimal execution speed. The good things > > > about Forth are rapid prototyping, immediate testing of all code you > > > write, simple compact code, that kind of goodness. Ideal for (system) > > > firmware! > > > > > > > > > Segher > > > > Yes, but SLOF will be there on the product we sell to our customers. > > Considering that there is a noticeable improvement in performance I just > > thought maybe IBM management would be interested in it. > > On this note, what did you also try to understand the performance > implications of my patch ? What improvements did you observer on your > set up? One strange thing I noticed is that when I copy paste the same IF-THEN or IF2-THEN2 lines below the ones that I already have in my OF.fs, the timebase seems to keep reducing for some strange reason. The number of IF-THEN and IF2-THEN2 statements are the exact same.
One change: On 2024-02-26 12:17:54, Kautuk Consul wrote: > One strange thing I noticed is that when I copy paste the same IF-THEN > or IF2-THEN2 lines below the ones that I already have in my OF.fs, the > timebase seems to keep reducing for some strange reason. The number of > IF-THEN and IF2-THEN2 statements are the exact same. It seems it is not a good idea to put the before patch and after patch statements together as the timebase difference keeps decreasing. So then I ran the VM once with the "BEFORE-PATCH" and "AFTER-PATCH" changes and I find that for 45 lines of IF-THEN and IF2-THEN2 the performance improvement is aroung 35 microseconds only. Per IF-THEN pair then the improvement is only in nanoseconds.
Hi Segher, > > If you want to improve engine.in, get rid of it completely? Make the > > whol thing cross-compile perhaps. Everything from source code. The > > engine.in thing is essentially an already compiled thing (but not > > relocated yet, not fixed to some address), which is still in mostly > > obvious 1-1 correspondence to it source code, which can be easily > > "uncompiled" as well. Like: > > :-). Getting rid of it completely and making the whole thing > cross-compile would require more time that I'm not so sure that I or > even my manager would be able to spare in our project. > > > > > col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) > > col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE) > > > > : +comp ( -- ) > > state @ 1 state +! IF exit THEN > > here there ! > > comp-buffer to here > > compile docol ; > > : -comp ( -- ) > > -1 state +! > > state @ IF exit THEN > > compile exit > > there @ to here > > comp-buffer execute ; > > > > "['] semicolon compile," is not something a user would ever write. A > > user would write "compile exit". It is standard Forth, it works > > anywhere. It is much more idiomatic.. > > Okay, I can accept the fact that maybe we should use EXIT instead of > SEMICOLON. But at least can we remove the invocation of the "COMPILE" > keyword in +COMP and -COMP ? The rest of the compiler in slof/engine.in > uses the standard "DOTICK <word> COMPILE," format so why cannot we use > this for -COMP as well as +COMP ? > Do you agree with the above reasoning as well as the fact that I think we would all here (in the KVM team) appreciate even this small improvement in performance ? Can I send a v3 patch with the "DOTICK EXIT COMPILE," "DOTICK DOCOL COMPILE," changes ?
Hi Segher, > > > > If you want to improve engine.in, get rid of it completely? Make the > > > whol thing cross-compile perhaps. Everything from source code. The > > > engine.in thing is essentially an already compiled thing (but not > > > relocated yet, not fixed to some address), which is still in mostly > > > obvious 1-1 correspondence to it source code, which can be easily > > > "uncompiled" as well. Like: > > > > :-). Getting rid of it completely and making the whole thing > > cross-compile would require more time that I'm not so sure that I or > > even my manager would be able to spare in our project. > > > > > > > > col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) > > > col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE) > > > > > > : +comp ( -- ) > > > state @ 1 state +! IF exit THEN > > > here there ! > > > comp-buffer to here > > > compile docol ; > > > : -comp ( -- ) > > > -1 state +! > > > state @ IF exit THEN > > > compile exit > > > there @ to here > > > comp-buffer execute ; > > > > > > "['] semicolon compile," is not something a user would ever write. A > > > user would write "compile exit". It is standard Forth, it works > > > anywhere. It is much more idiomatic.. > > > > Okay, I can accept the fact that maybe we should use EXIT instead of > > SEMICOLON. But at least can we remove the invocation of the "COMPILE" > > keyword in +COMP and -COMP ? The rest of the compiler in slof/engine.in > > uses the standard "DOTICK <word> COMPILE," format so why cannot we use > > this for -COMP as well as +COMP ? > > Do you all agree with the above reasoning as well as the fact that I think we would all here (in the KVM team) appreciate even this small improvement in performance ? Can I send a v3 patch with the "DOTICK EXIT COMPILE," "DOTICK DOCOL COMPILE," changes ?
Hi Segher/Alexey/Thomas, > > > If you want to improve engine.in, get rid of it completely? Make the > > > whol thing cross-compile perhaps. Everything from source code. The > > > engine.in thing is essentially an already compiled thing (but not > > > relocated yet, not fixed to some address), which is still in mostly > > > obvious 1-1 correspondence to it source code, which can be easily > > > "uncompiled" as well. Like: > > > > :-). Getting rid of it completely and making the whole thing > > cross-compile would require more time that I'm not so sure that I or > > even my manager would be able to spare in our project. > > > > > > > > col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) > > > col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE) > > > > > > : +comp ( -- ) > > > state @ 1 state +! IF exit THEN > > > here there ! > > > comp-buffer to here > > > compile docol ; > > > : -comp ( -- ) > > > -1 state +! > > > state @ IF exit THEN > > > compile exit > > > there @ to here > > > comp-buffer execute ; > > > > > > "['] semicolon compile," is not something a user would ever write. A > > > user would write "compile exit". It is standard Forth, it works > > > anywhere. It is much more idiomatic.. > > > > Okay, I can accept the fact that maybe we should use EXIT instead of > > SEMICOLON. But at least can we remove the invocation of the "COMPILE" > > keyword in +COMP and -COMP ? The rest of the compiler in slof/engine.in > > uses the standard "DOTICK <word> COMPILE," format so why cannot we use > > this for -COMP as well as +COMP ? > > Do you all agree with the above reasoning as well as the fact that I think we would all here (in the KVM team) appreciate even this small improvement in performance ? Can I send a v3 patch with the "DOTICK EXIT COMPILE," "DOTICK DOCOL COMPILE," changes ? Or should I just abandon this patch ? My point is that when we aren't doing anything unorthodox in/with the slof/engine.in code then why not go in for a useful optimization in SLOF as this is part of the KVM Over PowerVM product ?
On Wed, 22 May 2024, at 19:04, Kautuk Consul wrote: > Hi Segher/Alexey/Thomas, > > > > > If you want to improve engine.in, get rid of it completely? Make the > > > > whol thing cross-compile perhaps. Everything from source code. The > > > > engine.in thing is essentially an already compiled thing (but not > > > > relocated yet, not fixed to some address), which is still in mostly > > > > obvious 1-1 correspondence to it source code, which can be easily > > > > "uncompiled" as well. Like: > > > > > > :-). Getting rid of it completely and making the whole thing > > > cross-compile would require more time that I'm not so sure that I or > > > even my manager would be able to spare in our project. > > > > > > > > > > > col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) > > > > col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE) > > > > > > > > : +comp ( -- ) > > > > state @ 1 state +! IF exit THEN > > > > here there ! > > > > comp-buffer to here > > > > compile docol ; > > > > : -comp ( -- ) > > > > -1 state +! > > > > state @ IF exit THEN > > > > compile exit > > > > there @ to here > > > > comp-buffer execute ; > > > > > > > > "['] semicolon compile," is not something a user would ever write. A > > > > user would write "compile exit". It is standard Forth, it works > > > > anywhere. It is much more idiomatic.. > > > > > > Okay, I can accept the fact that maybe we should use EXIT instead of > > > SEMICOLON. But at least can we remove the invocation of the "COMPILE" > > > keyword in +COMP and -COMP ? The rest of the compiler in slof/engine.in > > > uses the standard "DOTICK <word> COMPILE," format so why cannot we use > > > this for -COMP as well as +COMP ? > > > > Do you all agree with the above reasoning as well as the fact that I think > we would all here (in the KVM team) appreciate even this small > improvement in performance ? > Can I send a v3 patch with the "DOTICK EXIT COMPILE," "DOTICK DOCOL COMPILE," changes ? > > Or should I just abandon this patch ? My point is that when we aren't > doing anything unorthodox in/with the slof/engine.in code then why not > go in for a useful optimization in SLOF as this is part of the KVM Over > PowerVM product ? Is this optimization even measurable? :) There is one thing which SLOF could do much faster (seconds or tens of seconds) - it is PCI scan where SLOF walks through all busses and slots even though it is all in the device tree already. But this "dotick" business does not scream for optimization imho. Thanks,
Hi, > > Is this optimization even measurable? :) > > There is one thing which SLOF could do much faster (seconds or tens of seconds) - it is PCI scan where SLOF walks through all busses and slots even though it is all in the device tree already. But this "dotick" business does not scream for optimization imho. Thanks, Yeah. In nanoseconds per control statement pair as per the benchmarking in the emails before. For 45 IF-THEN and IF2-THEN2 statements I achieved an improvement of around 35 microseconds on an average. Nothing much but I just thought it is an interesting change considering most of the Forth compiler is fine. I was just reading through the Forth compiler in slof/engine.in and saw this and decided to post it out of interest. But I will abandon this patch now as you feel that this is not enough to warrant any change in slof/engine.in. Thanks anyway!
diff --git a/slof/engine.in b/slof/engine.in index 59e82f1..fa4d82e 100644 --- a/slof/engine.in +++ b/slof/engine.in @@ -422,8 +422,8 @@ imm(.( LIT(')') PARSE TYPE) col(COMPILE R> CELL+ DUP @ COMPILE, >R) var(THERE 0) -col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL) -col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE) +col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE DOTICK DOCOL COMPILE,) +col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT DOTICK SEMICOLON COMPILE, THERE @ DOTO HERE COMP-BUFFER EXECUTE) // Structure words. col(RESOLVE-ORIG HERE OVER CELL+ - SWAP !)
Use the standard "DOTICK <word> COMPILE," mechanism in +COMP and -COMP as is being used by the rest of the compiler. Also use "SEMICOLON" instead of "EXIT" to compile into HERE in -COMP as that is more informative as it mirrors the way the col() macro defines a colon definition. Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com> --- slof/engine.in | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)