[v2,00/19] target/i386: decoder changes for 8.2


Message ID

20231019104648.389942-1-pbonzini@redhat.com

Headers

From: Paolo Bonzini <pbonzini@redhat.com>
To: qemu-devel@nongnu.org
Subject: [PATCH v2 00/19] target/i386: decoder changes for 8.2
Date: Thu, 19 Oct 2023 12:46:29 +0200
Message-ID: <20231019104648.389942-1-pbonzini@redhat.com>

Series

target/i386: decoder changes for 8.2

Message

Paolo Bonzini Oct. 19, 2023, 10:46 a.m. UTC

This includes:

- implementing SHA and CMPccXADD instruction extensions

- introducing a new mechanism for flags writeback that avoids a
  tricky failure

- converting the more orthogonal parts of the one-byte opcode
  map, as well as the CMOVcc and SETcc instructions.

Tested by booting several 32-bit and 64-bit guests.

The new decoder produces roughly 2% more ops, but after optimization there
are just 0.5% more and almost all of them come from cmp instructions.
For some reason that I have not investigated, these end up with an extra
mov even after optimization:

                                sub_i64 tmp0,rax,$0x33
 mov_i64 cc_src,$0x33           mov_i64 cc_dst,tmp0
 sub_i64 cc_dst,rax,$0x33       mov_i64 cc_src,$0x33
 discard cc_src2                discard cc_src2
 discard cc_op                  discard cc_op

It could be easily fixed by not reusing gen_SUB for cmp instructions,
or by debugging what goes on in the optimizer.  However, it does not
result in larger assembly.
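
As an illustration only (not code from the series), the two op sequences above can be modelled in plain C to check that they leave the same values in cc_dst and cc_src. The LazyFlags struct and all function names here are hypothetical, and the flag recovery is a simplified model of a subtraction-style lazy flags computation, not QEMU's actual helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved flags inputs (cc_dst/cc_src in the dump). */
typedef struct {
    uint64_t cc_dst;  /* result of the subtraction */
    uint64_t cc_src;  /* second operand of the subtraction */
} LazyFlags;

/* Left-hand column: the subtraction writes cc_dst directly. */
static LazyFlags cmp_direct(uint64_t rax, uint64_t imm)
{
    LazyFlags f;
    f.cc_src = imm;        /* mov_i64 cc_src,$imm     */
    f.cc_dst = rax - imm;  /* sub_i64 cc_dst,rax,$imm */
    return f;
}

/* Right-hand column: the result passes through a temporary first. */
static LazyFlags cmp_via_temp(uint64_t rax, uint64_t imm)
{
    LazyFlags f;
    uint64_t tmp0 = rax - imm;  /* sub_i64 tmp0,rax,$imm */
    f.cc_dst = tmp0;            /* mov_i64 cc_dst,tmp0   */
    f.cc_src = imm;             /* mov_i64 cc_src,$imm   */
    return f;
}

/* Zero flag recovered later from the saved result. */
static bool lazy_zf(LazyFlags f)
{
    return f.cc_dst == 0;
}

/* Carry (borrow) flag recovered later from the saved values. */
static bool lazy_cf(LazyFlags f)
{
    uint64_t src1 = f.cc_dst + f.cc_src;  /* reconstruct the first operand */
    return src1 < f.cc_src;               /* unsigned borrow */
}

/* Both sequences must leave identical state and therefore identical flags. */
static void check_sequences_agree(uint64_t rax, uint64_t imm)
{
    LazyFlags a = cmp_direct(rax, imm);
    LazyFlags b = cmp_via_temp(rax, imm);
    assert(a.cc_dst == b.cc_dst && a.cc_src == b.cc_src);
    assert(lazy_zf(a) == lazy_zf(b) && lazy_cf(a) == lazy_cf(b));
}
```

Calling check_sequences_agree(0x33, 0x33) from a test program passes, so in this model the extra mov only changes where the result is computed, not what ends up in the flags state.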

Paolo

v1->v2: call set_cc_op from the delayed flags writeback
	preparation for CC_OP_DYNAMIC
	fix INC/DEC to use delayed flags writeback
	remove cc_srcT from delayed flags writeback
	annotate places that call set_cc_op() from emit functions
	rewrite IMUL expansion to avoid nowb and to commonize flags handling
	introduce tcg_gen_negsetcondi*

Paolo Bonzini (19):
  target/i386: group common checks in the decoding phase
  target/i386: validate VEX.W for AVX instructions
  target/i386: implement SHA instructions
  tests/tcg/i386: initialize more registers in test-avx
  tests/tcg/i386: test-avx: add test cases for SHA new instructions
  target/i386: accept full MemOp in gen_ext_tl
  target/i386: introduce flags writeback mechanism
  target/i386: implement CMPccXADD
  target/i386: do not clobber A0 in POP translation
  target/i386: reintroduce debugging mechanism
  target/i386: move 00-5F opcodes to new decoder
  target/i386: adjust decoding of J operand
  target/i386: split eflags computation out of gen_compute_eflags
  tcg: add negsetcondi
  target/i386: move 60-BF opcodes to new decoder
  target/i386: move operand load and writeback out of gen_cmovcc1
  target/i386: move remaining conditional operations to new decoder
  target/i386: remove now converted opcodes from old decoder
  target/i386: remove gen_op

 include/tcg/tcg-op-common.h          |    4 +
 include/tcg/tcg-op.h                 |    2 +
 target/i386/cpu.c                    |    4 +-
 target/i386/cpu.h                    |    1 +
 target/i386/ops_sse.h                |  128 ++++
 target/i386/tcg/decode-new.c.inc     |  616 ++++++++++++++--
 target/i386/tcg/decode-new.h         |   43 +-
 target/i386/tcg/emit.c.inc           |  745 ++++++++++++++++++-
 target/i386/tcg/ops_sse_header.h.inc |   14 +
 target/i386/tcg/translate.c          | 1001 +++-----------------------
 tcg/tcg-op.c                         |   12 +
 tests/tcg/i386/Makefile.target       |    2 +-
 tests/tcg/i386/test-avx.c            |    8 +
 tests/tcg/i386/test-avx.py           |    3 +-
 tests/tcg/i386/test-flags.c          |   37 +
 15 files changed, 1644 insertions(+), 976 deletions(-)
 create mode 100644 tests/tcg/i386/test-flags.c

Comments

Paolo Bonzini Oct. 19, 2023, 11:39 a.m. UTC | #1

Oops, I missed Richard's newer reviews.  Will send v3 sometime next week.

Paolo

Richard Henderson Oct. 19, 2023, 3:44 p.m. UTC | #2

On 10/19/23 03:46, Paolo Bonzini wrote:
> The new decoder produces roughly 2% more ops, but after optimization there
> are just 0.5% more and almost all of them come from cmp instructions.
> For some reason that I have not investigated, these end up with an extra
> mov even after optimization:
> 
>                                  sub_i64 tmp0,rax,$0x33
>   mov_i64 cc_src,$0x33           mov_i64 cc_dst,tmp0
>   sub_i64 cc_dst,rax,$0x33       mov_i64 cc_src,$0x33
>   discard cc_src2                discard cc_src2
>   discard cc_op                  discard cc_op
> 
> It could be easily fixed by not reusing gen_SUB for cmp instructions,
> or by debugging what goes on in the optimizer.  However, it does not
> result in larger assembly.

This is expected behaviour from the TCG optimizer: it does not forward-propagate outputs
at that point.  However, during register allocation of the "mov cc_dst,tmp0" opcode, we
will see that tmp0 is dead and re-assign the register from tmp0 to cc_dst without emitting
a host instruction.
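
A toy model of that behaviour, assuming a much simpler allocator than the real one: every name below (ToyAlloc, toy_define, toy_mov) is hypothetical and only sketches the idea that a mov from a dead temporary can be resolved by handing over its host register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical temporaries, named after the ones in the dump. */
enum { TMP0, CC_DST, NUM_TEMPS };
enum { NO_REG = -1 };

typedef struct {
    int reg[NUM_TEMPS];  /* host register holding each temp, or NO_REG */
    int next_free_reg;   /* naive allocator: registers are never reused */
    int host_insns;      /* number of host instructions emitted so far */
} ToyAlloc;

static void toy_init(ToyAlloc *a)
{
    for (int i = 0; i < NUM_TEMPS; i++) {
        a->reg[i] = NO_REG;
    }
    a->next_free_reg = 0;
    a->host_insns = 0;
}

/* An op that computes temp t: takes a fresh register, emits one instruction. */
static void toy_define(ToyAlloc *a, int t)
{
    a->reg[t] = a->next_free_reg++;
    a->host_insns++;
}

/*
 * "mov dst,src": if src is dead after this op, dst simply takes over the
 * register of src and nothing is emitted; otherwise a host move is needed.
 */
static void toy_mov(ToyAlloc *a, int dst, int src, bool src_is_dead)
{
    assert(a->reg[src] != NO_REG);
    if (src_is_dead) {
        a->reg[dst] = a->reg[src];
        a->reg[src] = NO_REG;
    } else {
        a->reg[dst] = a->next_free_reg++;
        a->host_insns++;
    }
}
```

In this model, toy_define(&a, TMP0) followed by toy_mov(&a, CC_DST, TMP0, true) leaves host_insns at 1, which is the sense in which the extra mov op does not produce larger assembly.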


r~