Message ID | 20231019104648.389942-1-pbonzini@redhat.com |
---|---|
Series | target/i386: decoder changes for 8.2 |

On 10/19/23 12:46, Paolo Bonzini wrote:
> This includes:
>
> - implementing SHA and CMPccXADD instruction extensions
>
> - introducing a new mechanism for flags writeback that avoids a
>   tricky failure
>
> - converting the more orthogonal parts of the one-byte opcode
>   map, as well as the CMOVcc and SETcc instructions.
>
> Tested by booting several 32-bit and 64-bit guests.
>
> The new decoder produces roughly 2% more ops, but after optimization there
> are just 0.5% more and almost all of them come from cmp instructions.
> For some reason that I have not investigated, these end up with an extra
> mov even after optimization:
>
>                                 sub_i64 tmp0,rax,$0x33
>     mov_i64 cc_src,$0x33        mov_i64 cc_dst,tmp0
>     sub_i64 cc_dst,rax,$0x33    mov_i64 cc_src,$0x33
>     discard cc_src2             discard cc_src2
>     discard cc_op               discard cc_op
>
> It could be easily fixed by not reusing gen_SUB for cmp instructions,
> or by debugging what goes on in the optimizer. However, it does not
> result in larger assembly.

Oops, I missed Richard's newer reviews. Will send v3 sometime next week.

Paolo

On 10/19/23 03:46, Paolo Bonzini wrote:
> This includes:
>
> - implementing SHA and CMPccXADD instruction extensions
>
> - introducing a new mechanism for flags writeback that avoids a
>   tricky failure
>
> - converting the more orthogonal parts of the one-byte opcode
>   map, as well as the CMOVcc and SETcc instructions.
>
> Tested by booting several 32-bit and 64-bit guests.
>
> The new decoder produces roughly 2% more ops, but after optimization there
> are just 0.5% more and almost all of them come from cmp instructions.
> For some reason that I have not investigated, these end up with an extra
> mov even after optimization:
>
>                                 sub_i64 tmp0,rax,$0x33
>     mov_i64 cc_src,$0x33        mov_i64 cc_dst,tmp0
>     sub_i64 cc_dst,rax,$0x33    mov_i64 cc_src,$0x33
>     discard cc_src2             discard cc_src2
>     discard cc_op               discard cc_op
>
> It could be easily fixed by not reusing gen_SUB for cmp instructions,
> or by debugging what goes on in the optimizer. However, it does not
> result in larger assembly.

This is expected behaviour out of the tcg optimizer. We don't forward-propagate
outputs at that point. But during register allocation of the "mov cc_dst,tmp0"
opcode, we will see that tmp0 is dead and re-assign the register from tmp0 to
cc_dst without emitting a host instruction.

r~
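
The register re-assignment described in the reply above can be illustrated with a toy model. This is only a sketch and not QEMU's actual register allocator: the function names `last_uses` and `lower`, the host register names `r0`..`r3`, and the host instruction text are all hypothetical, and the `discard` ops from the dump are left out. It shows that when the source of a `mov` is dead after that `mov`, the destination can simply take over the source's register, so the extra op costs no host instruction.

```python
# Toy model only: every name here is hypothetical, this is not QEMU code.
# Ops are tuples of (opcode, dst, *srcs); operands starting with "$" are immediates.

def last_uses(ops):
    """Map each source operand to the index of the last op that reads it."""
    last = {}
    for i, (_opc, _dst, *srcs) in enumerate(ops):
        for s in srcs:
            last[s] = i
    return last

def lower(ops, live_out, regs=("r0", "r1", "r2", "r3")):
    """Return toy host instructions for ops, eliding movs from dead temporaries."""
    free = list(regs)
    reg_of = {}
    host = []
    last = last_uses(ops)

    def reg_for(name):
        # Give the value a host register on first use (loads are not modelled).
        if name not in reg_of:
            reg_of[name] = free.pop(0)
        return reg_of[name]

    for i, (opc, dst, *srcs) in enumerate(ops):
        if opc == "mov" and not srcs[0].startswith("$"):
            src = srcs[0]
            dead = last[src] == i and src not in live_out
            if dead and src in reg_of:
                # Source dies here: hand its register to dst, emit nothing.
                old = reg_of.pop(dst, None)
                if old is not None:
                    free.append(old)
                reg_of[dst] = reg_of.pop(src)
                continue
        args = [s if s.startswith("$") else reg_for(s) for s in srcs]
        host.append(f"{opc} {reg_for(dst)}, " + ", ".join(args))
    return host

# The two op sequences from the cover letter, without the discard ops.
old_ops = [("mov", "cc_src", "$0x33"),
           ("sub", "cc_dst", "rax", "$0x33")]
new_ops = [("sub", "tmp0", "rax", "$0x33"),
           ("mov", "cc_dst", "tmp0"),
           ("mov", "cc_src", "$0x33")]

live_out = {"rax", "cc_src", "cc_dst"}  # globals stay live, tmp0 does not
old_host = lower(old_ops, live_out)
new_host = lower(new_ops, live_out)
print(old_host)
print(new_host)
```

In this model both sequences lower to two host instructions: the `mov cc_dst,tmp0` op produces no output because `cc_dst` inherits the register that held `tmp0`. That matches the observation that the extra op does not result in larger assembly.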