Message ID | 20150618081640.GK931@aurel32.net |
---|---|
State | New |
Headers | show |
> From: Aurelien Jarno [mailto:aurelien@aurel32.net] > On 2015-06-18 10:12, Pavel Dovgaluk wrote: > > > From: Aurelien Jarno [mailto:aurelien@aurel32.net] > > > On 2015-06-17 15:41, Pavel Dovgalyuk wrote: > > > > In icount mode every translation block looks as follows: > > > > > > > > if icount < n then exit > > > > icount -= n > > > > instr1 > > > > instr2 > > > > ... > > > > instrn > > > > exit > > > > > > > > When one of these instructions initiates an exception, icount should be > > > > restored and adjusted number of instructions should be subtracted from icount > > > > instead of initial n. > > > > > > > > tlb_fill function passes retaddr to raise_exception, which allows restoring > > > > current instructions in TB and correct icount calculation. > > > > > > > > When exception triggered with other function (e.g. by embedding call to > > > > exception raising helper into TB), then PC is not passed as retaddr and > > > > correct icount is not recovered. In such cases icount will be decreased > > > > by the value equal to the size of TB. > > > > > > Looking at how icount work, I see it's basically a variable in the CPU > > > state (icount_decr.u16.low), which is already accessed from the TB. > > > Couldn't we adjust it using additional code before generating an > > > exception, when in icount mode. > > > > > > For example for MIPS, we can add some code before generate_exception > > > which use the value from s->gen_opc_icount[j] to adjust > > > the variable icount_decr.u16.low. > > > > It is possible, but it will incur additional overhead, because we will > > have to update icount every time the exception might be generated. > > We'll have to update icount value before and after every helper call, > > that can cause an exception: > > > > icount -= n > > ... > > instr_k > > icount += n - k > > helper > > icount -= n - k > > ... > > > > And this overhead will slowdown the code even if no exception occur. > > That's where I might disagree. Retranslation seems a very good idea on > the paper, but in practice it doesn't seems to always bring the > performance improvement it should. In addition it seems to be highly > dependent on the target. Just to give some numbers, on MIPS (as your > patch originally concerns this architecture), 40% of code generation is > actually due to retranslation. The problem is that over the time we have > improved a lot the code generation (liveness analysis, better register > allocation, constant propagation, ...) and thus we have increased the > code generation time. While it clearly has some benefits when this code > is actually executed, it's not the case when the code is simply > retranslated. In short we spend more time to find the CPU state > corresponding to an exception than before. > ... > > All of that to say that I am worried for the performances to see more > paths through the retranslation code, especially on MIPS as it seems to > be costly. That said I haven't really look in details at other targets, > nor hosts. I fixed syscalls, exceptions that occur without any conditions, and removed redundant calls to save_cpu_state. Then I measured the performance without enabling icount. And Linux boots even faster than with original version. I'll submit this version for review soon. > Now to come back about your patches, we might want to simply fix icount > first, even if it has some performance impact, and deal with the > retranslation issue separately, as it concerns more than just icount. Pavel Dovgalyuk
diff --git a/target-mips/translate.c b/target-mips/translate.c index 1d128ee..5238d71 100644 --- a/target-mips/translate.c +++ b/target-mips/translate.c @@ -19435,6 +19435,7 @@ gen_intermediate_code_internal(MIPSCPU *cpu, TranslationBlock *tb, LOG_DISAS("\ntb %p idx %d hflags %04x\n", tb, ctx.mem_idx, ctx.hflags); gen_tb_start(tb); while (ctx.bstate == BS_NONE) { + save_cpu_state(&ctx, 1); if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) { QTAILQ_FOREACH(bp, &cs->breakpoints, entry) { if (bp->pc == ctx.pc) { diff --git a/translate-all.c b/translate-all.c index b6b0e1c..3d4c017 100644 --- a/translate-all.c +++ b/translate-all.c @@ -212,6 +212,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb, int64_t ti; #endif + return -1; + #ifdef CONFIG_PROFILER ti = profile_getclock(); #endif