From patchwork Wed Feb 5 21:27:54 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 317278
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])
 (using TLSv1 with cipher AES256-SHA (256/256 bits))
 (No client certificate requested)
 by ozlabs.org (Postfix) with ESMTPS id 6DF012C00A0
 for ; Thu, 6 Feb 2014 08:28:25 +1100 (EST)
Received: from localhost ([::1]:33346 helo=lists.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1WBA19-0003wC-Ag
 for incoming@patchwork.ozlabs.org; Wed, 05 Feb 2014 16:28:23 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:45333)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1WBA0o-0003w1-NK
 for qemu-devel@nongnu.org; Wed, 05 Feb 2014 16:28:08 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1WBA0i-0001cb-KK
 for qemu-devel@nongnu.org; Wed, 05 Feb 2014 16:28:02 -0500
Received: from mx1.redhat.com ([209.132.183.28]:22685)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1WBA0i-0001cN-9G
 for qemu-devel@nongnu.org; Wed, 05 Feb 2014 16:27:56 -0500
Received: from int-mx02.intmail.prod.int.phx2.redhat.com
 (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])
 by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s15LRsLa025382
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Wed, 5 Feb 2014 16:27:55 -0500
Received: from [10.3.113.35] (ovpn-113-35.phx2.redhat.com [10.3.113.35])
 by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP
 id s15LRswC003252; Wed, 5 Feb 2014 16:27:54 -0500
Message-ID: <1391635674.15608.13.camel@ul30vt.home>
From: Alex Williamson
To: Maik Broemme
Date: Wed, 05 Feb 2014 14:27:54 -0700
In-Reply-To: <20140205211012.GJ9486@parallels.com>
References: <20140205185945.GA996@parallels.com>
 <1391631994.15608.7.camel@ul30vt.home>
 <20140205211012.GJ9486@parallels.com>
Mime-Version: 1.0
X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-Received-From: 209.132.183.28
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Multi GPU passthrough via VFIO
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

On Wed, 2014-02-05 at 22:10 +0100, Maik Broemme wrote:
> Hi Alex,
>
> Alex Williamson wrote:
> > On Wed, 2014-02-05 at 19:59 +0100, Maik Broemme wrote:
> > > Hi,
> > >
> > > currently VFIO with multi GPU passthrough is working partially and
> > > hopefully somebody has a hint about the problem. I'm doing passthrough
> > > of an AMD Radeon R9 290X and AMD Radeon 7870 GHz Edition to a single VM.
> > >
> > > If the VM is running Linux this works quite well with radeon or fglrx
> > > driver. Please see 'dmesg' log attached, when using the radeon driver.
> > > If needed I can also post one with fglrx driver.
> > >
> > > If I do the exact same passthrough to a Windows VM and use latest AMD
> > > Catalyst 14.1 (2/1/2014) or AMD Catalyst 13.12 (12/18/2013) I can get
> > > only the first device working (AMD R9 290X) with 'x-vga=on'. I don't
> > > enable 'x-vga=on' on second device as this should never work. :)
> >
> > Why not? The guest is able to change the VGA enable bit in the emulated
> > bridge registers and access VGA space of each device, just like happens
> > on bare metal. You'll only get one device initialized from seabios, but
> > that's the same as would happen on bare metal as well.
> >
>
> Well it was just my guess as it would behave like most physical boxes
> in this case. :)
>
> > > I see
> > > BIOS boot screen and everything works fine except for the second GPU.
> > > The windows device manager just show me "Code 12" for the second GPU
> > > and its HD Audio device. Code 12 means: "This device cannot find enough
> > > free resources that it can use".
> >
> > I've seen the same using Nvidia GRID GPUs (w/o x-vga=on), but only with
> > the Q35 chipset model, Linux works, Windows reports Code 12. I have no
> > idea why as all the PCI resources appear to be properly sized and
> > mapped. FWIW, 2 GRID GPUs assigned to a guest do work with the 440FX
> > chipset model. Beyond 2 we run out of MMIO resources below 4G and
> > something bad happens.
> >
>
> Interesting. I will try 440FX a bit later and see if this works. What I
> can also do is to post system resource conflicts from Windows, the AMD
> Catalyst Center has it integrated. Maybe this will help?

If you actually see conflicts, then yes. The Code 12 I've seen I was
never able to identify a conflict. The trouble with 440FX is that
you'll need to use pci-bridges to isolate VGA space of each GPU.
Otherwise one card would need to be disabled to ensure the VGA accesses
go to the other.

> > > QEMU is called in both cases via the following. I just replace the
> > > '-drive' accordingly.
> > >
> > > /usr/bin/taskset -c 0,1,2,3 /usr/bin/qemu-system-x86_64 \
> > >     -machine q35,accel=kvm \
> > >     -enable-kvm \
> > >     -nodefaults \
> > >     -nographic \
> > >     -vga none \
> > >     -boot order=nc \
> > >     -cpu host \
> > >     -smp cores=4,threads=1,sockets=1 \
> > >     -m 8192 \
> > >     -rtc base=localtime \
> > >     -k de \
> > >     -drive file=/srv/kvm/linux-drive0.img,id=drive0,if=none,cache=none,aio=threads \
> > >     -mon chardev=monitor0 \
> > >     -chardev socket,id=monitor0,path=/tmp/linux.monitor,nowait,server \
> > >     -netdev tap,id=net0,vhost=on,helper=/usr/lib/qemu/qemu-bridge-helper \
> > >     -device virtio-net-pci,netdev=net0,mac=00:00:00:02:01:04 \
> > >     -device virtio-blk-pci,drive=drive0,ioeventfd=on \
> > >     -device ioh3420,bus=pcie.0,id=pcie0,port=1,chassis=1,multifunction=on \
> > >     -device ioh3420,bus=pcie.0,id=pcie1,port=2,chassis=2,multifunction=on \
> > >     -device vfio-pci,host=01:00.0,addr=00.0,bus=pcie0,multifunction=on,x-vga=on \
> > >     -device vfio-pci,host=01:00.1,addr=00.1,bus=pcie0 \
> > >     -device vfio-pci,host=02:00.0,addr=00.0,bus=pcie1,multifunction=on \
> > >     -device vfio-pci,host=02:00.1,addr=00.1,bus=pcie1 \
> > >     -no-reboot
> > >
> > > My setup is the following:
> > >
> > > Kernel: linux-3.13.1
> > > Seabios: seabios-git-rel.1.7.4.r51.g151d034 (5/2/2014)
> > > QEMU: qemu-git-2.0.r30666.g31db5b3 (5/2/2014)
> > >
> > > Below is the 'lspci' output and I'm using the AMD Radeon HD 5430 as device
> > > for my local X server:
> > >
> > > 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (external gfx0 port B) (rev 02)
> > > 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD990 I/O Memory Management Unit (IOMMU)
> > > 00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port B)
> > > 00:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port D)
> > > 00:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port H)
> > > 00:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (external gfx1 port B)
> > > 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)
> > > 00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > 00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > 00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > 00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 42)
> > > 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)
> > > 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
> > > 00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge (rev 40)
> > > 00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
> > > 00:15.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0)
> > > 00:15.1 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB700/SB800/SB900 PCI to PCI bridge (PCIE port 1)
> > > 00:15.2 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB900 PCI to PCI bridge (PCIE port 2)
> > > 00:15.3 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB900 PCI to PCI bridge (PCIE port 3)
> > > 00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > 00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 0
> > > 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 1
> > > 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 2
> > > 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 3
> > > 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 4
> > > 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 5
> > > 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
> > > 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
> > > 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn XT [Radeon HD 7870 GHz Edition]
> > > 02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> > > 03:00.0 USB controller: Etron Technology, Inc. EJ168 USB 3.0 Host Controller (rev 01)
> > > 04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Park [Mobility Radeon HD 5430]
> > > 04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300 Series]
> > > 06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
> > > 07:00.0 USB controller: Etron Technology, Inc. EJ168 USB 3.0 Host Controller (rev 01)
> > >
> > > Another minor issue is that the R9 290X is not reset during shutdown of
> > > VM (neither Linux nor Windows) but it can be tricked with doing
> > > "suspend-to-ram" between two starts. That's why I use '-no-reboot' option
> > > in QEMU. The 7870 is doing the reset properly.
> >
> >
> > Is the NoSoftRst "-" on the 290X vs "+" on the 7870 in lspci -vvv by
> > chance? Thanks,
> >
>
> Here are both. It is funny it is opposite as you described. :)

Oops, yes. Does this help?
I can't figure out why I coded it the way that I did. Probably overly
targeting a specific device. Thanks,

Alex

> root@homer:~# lspci -vvv -s 01:00.0 | grep NoSoftRst
>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>
> root@homer:~# lspci -vvv -s 02:00.0 | grep NoSoftRst
>         Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>
> root@homer:~# lspci -vvv -s 01:00.0
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
>     Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
>     Latency: 0, Cache Line Size: 64 bytes
>     Interrupt: pin A routed to IRQ 49
>     Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
>     Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M]
>     Region 4: I/O ports at be00 [size=256]
>     Region 5: Memory at fdd80000 (32-bit, non-prefetchable) [size=256K]
>     [virtual] Expansion ROM at d0000000 [disabled] [size=128K]
>     Capabilities: [48] Vendor Specific Information: Len=08
>     Capabilities: [50] Power Management version 3
>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>     Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
>         DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
>             ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>         DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>             RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
>             MaxPayload 128 bytes, MaxReadReq 512 bytes
>         DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>         LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>             ClockPM- Surprise- LLActRep- BwNot-
>         LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
>             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>         LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>         DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>         LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>             Compliance De-emphasis: -6dB
>         LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>     Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>         Address: 00000000fee00000 Data: 0000
>     Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010
>     Capabilities: [150 v2] Advanced Error Reporting
>         UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>         UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>         UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>         CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>         CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>         AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>     Capabilities: [270 v1] #19
>     Capabilities: [2b0 v1] Address Translation Service (ATS)
>         ATSCap: Invalidate Queue Depth: 00
>         ATSCtl: Enable+, Smallest Translation Unit: 00
>     Capabilities: [2c0 v1] #13
>     Capabilities: [2d0 v1] #1b
>     Kernel driver in use: vfio-pci
>     Kernel modules: radeon
>
> root@homer:~# lspci -vvv -s 02:00.0
> 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn XT [Radeon HD 7870 GHz Edition] (prog-if 00 [VGA controller])
>     Subsystem: XFX Pine Group Inc. Device 3251
>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
>     Latency: 0, Cache Line Size: 64 bytes
>     Interrupt: pin A routed to IRQ 48
>     Region 0: Memory at a0000000 (64-bit, prefetchable) [size=256M]
>     Region 2: Memory at fda80000 (64-bit, non-prefetchable) [size=256K]
>     Region 4: I/O ports at ee00 [size=256]
>     [virtual] Expansion ROM at fda00000 [disabled] [size=128K]
>     Capabilities: [48] Vendor Specific Information: Len=08
>     Capabilities: [50] Power Management version 3
>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
>         Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>     Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
>         DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
>             ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>         DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>             RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
>             MaxPayload 128 bytes, MaxReadReq 512 bytes
>         DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>         LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>             ClockPM- Surprise- LLActRep- BwNot-
>         LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
>             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>         LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>         DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>         LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>             Compliance De-emphasis: -6dB
>         LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>     Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>         Address: 00000000fee00000 Data: 0000
>     Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010
>     Capabilities: [150 v2] Advanced Error Reporting
>         UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>         UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>         UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>         CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>         CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>         AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>     Capabilities: [270 v1] #19
>     Capabilities: [2b0 v1] Address Translation Service (ATS)
>         ATSCap: Invalidate Queue Depth: 00
>         ATSCtl: Enable+, Smallest Translation Unit: 00
>     Capabilities: [2c0 v1] #13
>     Capabilities: [2d0 v1] #1b
>     Kernel driver in use: vfio-pci
>     Kernel modules: radeon
>
> > Alex
> >
>
> --Maik

--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -3136,7 +3136,7 @@ static void vfio_pci_reset_handler(void *opaque)
 
     QLIST_FOREACH(group, &group_list, next) {
         QLIST_FOREACH(vdev, &group->device_list, next) {
-            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
+            if (!vdev->reset_works || !vdev->has_flr) {
                 vdev->needs_reset = true;
             }
         }
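
For anyone following along, here is a rough standalone sketch of what the one-line change does to the needs_reset decision. It is not QEMU code: struct dev_flags and the helper names are made up for illustration and only mirror the three VFIODevice fields the patch touches. It also assumes has_pm_reset is only set when the device reports NoSoftRst-, which would make the 290X above (NoSoftRst+, FLReset-) the case where the two conditions disagree.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the three VFIODevice flags used in the patch. */
struct dev_flags {
    bool reset_works;
    bool has_flr;
    bool has_pm_reset;
};

/* Condition before the patch. */
static bool needs_reset_old(struct dev_flags d)
{
    return !d.reset_works || (!d.has_flr && d.has_pm_reset);
}

/* Condition after the patch: any device without FLR is flagged. */
static bool needs_reset_new(struct dev_flags d)
{
    return !d.reset_works || !d.has_flr;
}

/* Print each flag combination where old and new disagree; return the count. */
static int print_differences(void)
{
    int count = 0;

    for (int bits = 0; bits < 8; bits++) {
        struct dev_flags d = {
            .reset_works  = (bits & 4) != 0,
            .has_flr      = (bits & 2) != 0,
            .has_pm_reset = (bits & 1) != 0,
        };
        bool before = needs_reset_old(d);
        bool after = needs_reset_new(d);

        if (before != after) {
            printf("reset_works=%d has_flr=%d has_pm_reset=%d: old=%d new=%d\n",
                   d.reset_works, d.has_flr, d.has_pm_reset, before, after);
            count++;
        }
    }
    return count;
}
```

Calling print_differences() from a main() shows that the two conditions differ only for a device whose reset nominally works but which has neither FLR nor a PM reset. Under the assumption above, that is the combination the old condition skipped.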