diff mbox series

[v3,01/20] accel/tcg: restrict assert on icount_enabled to qemu-system

Message ID 20240807000652.1417776-2-debug@rivosinc.com
State New
Headers show
Series riscv support for control flow integrity extensions | expand

Commit Message

Deepak Gupta Aug. 7, 2024, 12:06 a.m. UTC
commit 16ad9788 [1] restricted icount to qemu-system only. Although
assert in `cpu_loop_exec_tb` is on `icount_enabled()` which is 0 when
its qemu-user and debug build starts asserting.
Move assert for qemu-system.

[1] - https://lists.gnu.org/archive/html/qemu-riscv/2024-01/msg00608.html

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 accel/tcg/cpu-exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Richard Henderson Aug. 7, 2024, 12:48 a.m. UTC | #1
On 8/7/24 10:06, Deepak Gupta wrote:
> commit 16ad9788 [1] restricted icount to qemu-system only. Although
> assert in `cpu_loop_exec_tb` is on `icount_enabled()` which is 0 when
> its qemu-user and debug build starts asserting.
> Move assert for qemu-system.
> 
> [1] - https://lists.gnu.org/archive/html/qemu-riscv/2024-01/msg00608.html
> 
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>   accel/tcg/cpu-exec.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 245fd6327d..8cc2a6104f 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -927,9 +927,9 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
>           return;
>       }
>   
> +#ifndef CONFIG_USER_ONLY
>       /* Instruction counter expired.  */
>       assert(icount_enabled());
> -#ifndef CONFIG_USER_ONLY
>       /* Ensure global icount has gone forward */
>       icount_update(cpu);
>       /* Refill decrementer and continue execution.  */

No, this is a real bug.

Just above we handled

   (1) exit for tcg (non-)chaining (!= TB_EXIT_REQUESTED),
   (2) exit for exception/interrupt (cpu_loop_exit_requested).

The only thing that is left is exit for icount expired.
And for that we *must* have icount enabled.

How did you encounter this?


r~
Deepak Gupta Aug. 7, 2024, 6:45 p.m. UTC | #2
On Wed, Aug 07, 2024 at 10:48:56AM +1000, Richard Henderson wrote:
>On 8/7/24 10:06, Deepak Gupta wrote:
>>commit 16ad9788 [1] restricted icount to qemu-system only. Although
>>assert in `cpu_loop_exec_tb` is on `icount_enabled()` which is 0 when
>>its qemu-user and debug build starts asserting.
>>Move assert for qemu-system.
>>
>>[1] - https://lists.gnu.org/archive/html/qemu-riscv/2024-01/msg00608.html
>>
>>Signed-off-by: Deepak Gupta <debug@rivosinc.com>
>>---
>>  accel/tcg/cpu-exec.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>>index 245fd6327d..8cc2a6104f 100644
>>--- a/accel/tcg/cpu-exec.c
>>+++ b/accel/tcg/cpu-exec.c
>>@@ -927,9 +927,9 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
>>          return;
>>      }
>>+#ifndef CONFIG_USER_ONLY
>>      /* Instruction counter expired.  */
>>      assert(icount_enabled());
>>-#ifndef CONFIG_USER_ONLY
>>      /* Ensure global icount has gone forward */
>>      icount_update(cpu);
>>      /* Refill decrementer and continue execution.  */
>
>No, this is a real bug.
>
>Just above we handled
>
>  (1) exit for tcg (non-)chaining (!= TB_EXIT_REQUESTED),
>  (2) exit for exception/interrupt (cpu_loop_exit_requested).
>
>The only thing that is left is exit for icount expired.
>And for that we *must* have icount enabled.
>
>How did you encounter this?

hmm I was experimenting with reducing TB flags (i.e. not using two different TB flags
for zicfilp). But I swear, I didn't see it go away (for qemu-user) when I had switched to
two TB flags.

Now that you've pointed it out specifically, I tried again.
Its not asserting at this place anymore for qemu-ser

I'll remove this patch in next version. And if I encounter this again, will dig a little bit
more deep into it and try to get repro steps.

>
>
>r~
Deepak Gupta Aug. 12, 2024, 5:41 p.m. UTC | #3
On Wed, Aug 07, 2024 at 10:48:56AM +1000, Richard Henderson wrote:
>On 8/7/24 10:06, Deepak Gupta wrote:
>>commit 16ad9788 [1] restricted icount to qemu-system only. Although
>>assert in `cpu_loop_exec_tb` is on `icount_enabled()` which is 0 when
>>its qemu-user and debug build starts asserting.
>>Move assert for qemu-system.
>>
>>[1] - https://lists.gnu.org/archive/html/qemu-riscv/2024-01/msg00608.html
>>
>>Signed-off-by: Deepak Gupta <debug@rivosinc.com>
>>---
>>  accel/tcg/cpu-exec.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>>index 245fd6327d..8cc2a6104f 100644
>>--- a/accel/tcg/cpu-exec.c
>>+++ b/accel/tcg/cpu-exec.c
>>@@ -927,9 +927,9 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
>>          return;
>>      }
>>+#ifndef CONFIG_USER_ONLY
>>      /* Instruction counter expired.  */
>>      assert(icount_enabled());
>>-#ifndef CONFIG_USER_ONLY
>>      /* Ensure global icount has gone forward */
>>      icount_update(cpu);
>>      /* Refill decrementer and continue execution.  */
>
>No, this is a real bug.
>
>Just above we handled
>
>  (1) exit for tcg (non-)chaining (!= TB_EXIT_REQUESTED),
>  (2) exit for exception/interrupt (cpu_loop_exit_requested).
>
>The only thing that is left is exit for icount expired.
>And for that we *must* have icount enabled.
>
>How did you encounter this?

I spent last week incorporating your suggestions. And during the flux of it, I started
seeing this issue again. As soon as I switch to branch from where I sent the patches out,
this icount assert issue disappears. So something definitley is triggering the issue.
It happens only when zicfilp and zicfiss are enabled.

I am still trying to root cause and in a fog right now. Although I am not very well versed
with tcg internals. So any pointer here which could help me debug it faster would be well
appreciated. Thanks.


>
>
>r~
diff mbox series

Patch

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 245fd6327d..8cc2a6104f 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -927,9 +927,9 @@  static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
         return;
     }
 
+#ifndef CONFIG_USER_ONLY
     /* Instruction counter expired.  */
     assert(icount_enabled());
-#ifndef CONFIG_USER_ONLY
     /* Ensure global icount has gone forward */
     icount_update(cpu);
     /* Refill decrementer and continue execution.  */