Message ID | 1391635674.15608.13.camel@ul30vt.home |
---|---|
State | New |
Hi Alex,

Alex Williamson <alex.williamson@redhat.com> wrote:
> On Wed, 2014-02-05 at 22:10 +0100, Maik Broemme wrote:
> > Hi Alex,
> >
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > On Wed, 2014-02-05 at 19:59 +0100, Maik Broemme wrote:
> > > > Hi,
> > > >
> > > > currently VFIO with multi GPU passthrough is working partially and
> > > > hopefully somebody has a hint about the problem. I'm doing passthrough
> > > > of an AMD Radeon R9 290X and AMD Radeon 7870 GHz Edition to a single VM.
> > > >
> > > > If the VM is running Linux this works quite well with radeon or fglrx
> > > > driver. Please see 'dmesg' log attached, when using the radeon driver.
> > > > If needed I can also post one with fglrx driver.
> > > >
> > > > If I do the exact same passthrough to a Windows VM and use latest AMD
> > > > Catalyst 14.1 (2/1/2014) or AMD Catalyst 13.12 (12/18/2013) I can get
> > > > only the first device working (AMD R9 290X) with 'x-vga=on'. I don't
> > > > enable 'x-vga=on' on second device as this should never work. :)
> > >
> > > Why not? The guest is able to change the VGA enable bit in the emulated
> > > bridge registers and access VGA space of each device, just like happens
> > > on bare metal. You'll only get one device initialized from seabios, but
> > > that's the same as would happen on bare metal as well.
> > >
> >
> > Well it was just my guess as it would behave like most physical boxes
> > in this case. :)
> >
> > > > I see
> > > > BIOS boot screen and everything works fine except for the second GPU.
> > > > The windows device manager just show me "Code 12" for the second GPU
> > > > and its HD Audio device. Code 12 means: "This device cannot find enough
> > > > free resources that it can use".
> > >
> > > I've seen the same using Nvidia GRID GPUs (w/o x-vga=on), but only with
> > > the Q35 chipset model, Linux works, Windows reports Code 12. I have no
> > > idea why as all the PCI resources appear to be properly sized and
> > > mapped.
FWIW, 2 GRID GPUs assigned to a guest do work with the 440FX > > > chipset model. Beyond 2 we run out of MMIO resources below 4G and > > > something bad happens. > > > > > > > Interesting. I will try 440FX a bit later and see if this works. What I > > can also do is to post system resource conflicts from Windows, the AMD > > Catalyst Center has it integrated. Maybe this will help? > > If you actually see conflicts, then yes. The Code 12 I've seen I was > never able to identify a conflict. The trouble with 440FX is that > you'll need to use pci-bridges to isolate VGA space of each GPU. > Otherwise one card would need to be disabled to ensure the VGA accesses > go to the other. > Okay I've collected all necessary information (hopefully). Some are in German but if needed I can translate it. Please find it below: - Conflicts: E/A-Port 0x000003C0-0x000003DF AMD Radeon R9 200 Series E/A-Port 0x000003C0-0x000003DF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 IRQ 10 AMD Radeon HD 7800 Series IRQ 10 Intel(R) ICH9 Family SMBus Controller - 2930 Speicheradresse 0xFE800000-0xFE83FFFF AMD Radeon R9 200 Series Speicheradresse 0xFE800000-0xFE83FFFF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 Speicheradresse 0xE0000000-0xEFFFFFFF AMD Radeon HD 7800 Series Speicheradresse 0xE0000000-0xEFFFFFFF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 Speicheradresse 0xA0000-0xBFFFF AMD Radeon R9 200 Series Speicheradresse 0xA0000-0xBFFFF PCI-Bus Speicheradresse 0xA0000-0xBFFFF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 Speicheradresse 0xC0000000-0xCFFFFFFF AMD Radeon R9 200 Series Speicheradresse 0xC0000000-0xCFFFFFFF PCI-Bus Speicheradresse 0xC0000000-0xCFFFFFFF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 E/A-Port 0x000003B0-0x000003BB AMD Radeon R9 200 Series E/A-Port 0x000003B0-0x000003BB Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 Speicheradresse 0xFE600000-0xFE63FFFF AMD Radeon HD 
7800 Series Speicheradresse 0xFE600000-0xFE63FFFF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 E/A-Port 0x0000C000-0x0000C0FF AMD Radeon HD 7800 Series E/A-Port 0x0000C000-0x0000C0FF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 E/A-Port 0x0000D000-0x0000D0FF AMD Radeon R9 200 Series E/A-Port 0x0000D000-0x0000D0FF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 - Display devices: Name AMD Radeon R9 200 Series PNP-Gerätekennung PCI\VEN_1002&DEV_67B0&SUBSYS_0B001002&REV_00\4&2122111D&0&0018 Adaptertyp AMD Radeon Graphics Processor (0x67B0), Advanced Micro Devices, Inc.-kompatibel Adapterbeschreibung AMD Radeon R9 200 Series Adapter-RAM (1.048.576) Bytes Installierte Treiber aticfx64.dll,aticfx64.dll,aticfx64.dll,aticfx32,aticfx32,aticfx32,atiumd64.dll,atidxx64.dll,atidxx64.dll,atiumdag,atidxx32,atidxx32,atiumdva,atiumd6a.cap,atitmm64.dll Treiberversion 13.350.1005.0 INF-Datei oem7.inf (Abschnitt ati2mtag_Hawaii) Farbebenen Nicht verfügbar Farbtabelleneinträge 4294967296 Auflösung 1920 x 1080 x 60 Hz Bits/Pixel 32 Speicheradresse 0xC0000000-0xCFFFFFFF Speicheradresse 0xD0000000-0xD07FFFFF E/A-Port 0x0000D000-0x0000D0FF Speicheradresse 0xFE800000-0xFE83FFFF IRQ-Kanal IRQ 4294967287 E/A-Port 0x000003B0-0x000003BB E/A-Port 0x000003C0-0x000003DF Speicheradresse 0xA0000-0xBFFFF Treiber c:\windows\system32\drivers\atikmpag.sys (8.14.1.6367, 622,00 KB (636.928 Bytes), 31.01.2014 20:28) Name AMD Radeon HD 7800 Series PNP-Gerätekennung PCI\VEN_1002&DEV_6818&SUBSYS_32511682&REV_00\4&49049C7&0&0820 Adaptertyp Nicht verfügbar, Advanced Micro Devices, Inc.-kompatibel Adapterbeschreibung AMD Radeon HD 7800 Series Adapter-RAM Nicht verfügbar Installierte Treiber aticfx64.dll,aticfx64.dll,aticfx64.dll,aticfx32,aticfx32,aticfx32,atiumd64.dll,atidxx64.dll,atidxx64.dll,atiumdag,atidxx32,atidxx32,atiumdva,atiumd6a.cap,atitmm64.dll Treiberversion 13.350.1005.0 INF-Datei oem7.inf (Abschnitt ati2mtag_R575B) Farbebenen Nicht verfügbar 
Farbtabelleneinträge Nicht verfügbar Auflösung Nicht verfügbar Bits/Pixel Nicht verfügbar Speicheradresse 0xE0000000-0xEFFFFFFF Speicheradresse 0xFE600000-0xFE63FFFF E/A-Port 0x0000C000-0x0000C0FF IRQ-Kanal IRQ 10 Treiber c:\windows\system32\drivers\atikmpag.sys (8.14.1.6367, 622,00 KB (636.928 Bytes), 31.01.2014 20:28) - I/O: 0x00000000-0x00000CD7 PCI-Bus OK 0x00000060-0x00000060 Standardtastatur (PS/2) OK 0x00000064-0x00000064 Standardtastatur (PS/2) OK 0x00000070-0x00000071 System CMOS/Echtzeituhr OK 0x00000072-0x00000077 System CMOS/Echtzeituhr OK 0x000002F8-0x000002FF Kommunikationsanschluss (COM2) OK 0x00000378-0x0000037F Druckeranschluss (LPT1) OK 0x000003B0-0x000003BB AMD Radeon R9 200 Series OK 0x000003B0-0x000003BB Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 OK 0x000003C0-0x000003DF AMD Radeon R9 200 Series OK 0x000003C0-0x000003DF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 OK 0x000003F2-0x000003F5 Standard-Diskettenlaufwerkcontroller OK 0x000003F7-0x000003F7 Standard-Diskettenlaufwerkcontroller OK 0x000003F8-0x000003FF Kommunikationsanschluss (COM1) OK 0x00000CD8-0x00000CF7 ACPI-Modulgerät OK 0x00000D00-0x0000FFFF PCI-Bus OK 0x0000B100-0x0000B13F Intel(R) ICH9 Family SMBus Controller - 2930 OK 0x0000C000-0x0000C0FF AMD Radeon HD 7800 Series OK 0x0000C000-0x0000C0FF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 OK 0x0000D000-0x0000D0FF AMD Radeon R9 200 Series OK 0x0000D000-0x0000D0FF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 OK 0x0000E000-0x0000E03F Red Hat VirtIO SCSI controller OK 0x0000E080-0x0000E09F Red Hat VirtIO Ethernet Adapter OK 0x0000E0A0-0x0000E0BF Standard AHCI 1.0 Serieller-ATA-Controller OK - Memory: 0xC0000000-0xCFFFFFFF AMD Radeon R9 200 Series OK 0xC0000000-0xCFFFFFFF PCI-Bus OK 0xC0000000-0xCFFFFFFF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 OK 0xD0000000-0xD07FFFFF AMD Radeon R9 200 Series OK 0xFE800000-0xFE83FFFF AMD Radeon R9 200 Series OK 
0xFE800000-0xFE83FFFF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 OK 0xFEA40000-0xFEA40FFF Red Hat VirtIO Ethernet Adapter OK 0xFED00000-0xFED003FF Hochpräzisionsereigniszeitgeber OK 0xFEA41000-0xFEA41FFF Red Hat VirtIO SCSI controller OK 0xE0000000-0xEFFFFFFF AMD Radeon HD 7800 Series OK 0xE0000000-0xEFFFFFFF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 OK 0xFE600000-0xFE63FFFF AMD Radeon HD 7800 Series OK 0xFE600000-0xFE63FFFF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 OK 0xFEA42000-0xFEA42FFF Standard AHCI 1.0 Serieller-ATA-Controller OK 0xFE860000-0xFE863FFF High Definition Audio-Controller OK 0xA0000-0xBFFFF AMD Radeon R9 200 Series OK 0xA0000-0xBFFFF PCI-Bus OK 0xA0000-0xBFFFF Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 OK - IRQ: IRQ 1 Standardtastatur (PS/2) OK IRQ 3 Kommunikationsanschluss (COM2) OK IRQ 4 Kommunikationsanschluss (COM1) OK IRQ 6 Standard-Diskettenlaufwerkcontroller OK IRQ 8 System CMOS/Echtzeituhr OK IRQ 10 AMD Radeon HD 7800 Series OK IRQ 10 Intel(R) ICH9 Family SMBus Controller - 2930 OK IRQ 12 PS/2-kompatible Maus OK IRQ 16 Standard AHCI 1.0 Serieller-ATA-Controller OK IRQ 20 High Definition Audio-Controller OK IRQ 81 Microsoft ACPI-konformes System OK IRQ 82 Microsoft ACPI-konformes System OK IRQ 83 Microsoft ACPI-konformes System OK IRQ 84 Microsoft ACPI-konformes System OK IRQ 85 Microsoft ACPI-konformes System OK IRQ 86 Microsoft ACPI-konformes System OK IRQ 87 Microsoft ACPI-konformes System OK IRQ 88 Microsoft ACPI-konformes System OK IRQ 89 Microsoft ACPI-konformes System OK IRQ 90 Microsoft ACPI-konformes System OK IRQ 91 Microsoft ACPI-konformes System OK IRQ 92 Microsoft ACPI-konformes System OK IRQ 93 Microsoft ACPI-konformes System OK IRQ 94 Microsoft ACPI-konformes System OK IRQ 95 Microsoft ACPI-konformes System OK IRQ 96 Microsoft ACPI-konformes System OK IRQ 97 Microsoft ACPI-konformes System OK IRQ 98 Microsoft ACPI-konformes System OK IRQ 99 
Microsoft ACPI-konformes System OK IRQ 100 Microsoft ACPI-konformes System OK IRQ 101 Microsoft ACPI-konformes System OK IRQ 102 Microsoft ACPI-konformes System OK IRQ 103 Microsoft ACPI-konformes System OK IRQ 104 Microsoft ACPI-konformes System OK IRQ 105 Microsoft ACPI-konformes System OK IRQ 106 Microsoft ACPI-konformes System OK IRQ 107 Microsoft ACPI-konformes System OK IRQ 108 Microsoft ACPI-konformes System OK IRQ 109 Microsoft ACPI-konformes System OK IRQ 110 Microsoft ACPI-konformes System OK IRQ 111 Microsoft ACPI-konformes System OK IRQ 112 Microsoft ACPI-konformes System OK IRQ 113 Microsoft ACPI-konformes System OK IRQ 114 Microsoft ACPI-konformes System OK IRQ 115 Microsoft ACPI-konformes System OK IRQ 116 Microsoft ACPI-konformes System OK IRQ 117 Microsoft ACPI-konformes System OK IRQ 118 Microsoft ACPI-konformes System OK IRQ 119 Microsoft ACPI-konformes System OK IRQ 120 Microsoft ACPI-konformes System OK IRQ 121 Microsoft ACPI-konformes System OK IRQ 122 Microsoft ACPI-konformes System OK IRQ 123 Microsoft ACPI-konformes System OK IRQ 124 Microsoft ACPI-konformes System OK IRQ 125 Microsoft ACPI-konformes System OK IRQ 126 Microsoft ACPI-konformes System OK IRQ 127 Microsoft ACPI-konformes System OK IRQ 128 Microsoft ACPI-konformes System OK IRQ 129 Microsoft ACPI-konformes System OK IRQ 130 Microsoft ACPI-konformes System OK IRQ 131 Microsoft ACPI-konformes System OK IRQ 132 Microsoft ACPI-konformes System OK IRQ 133 Microsoft ACPI-konformes System OK IRQ 134 Microsoft ACPI-konformes System OK IRQ 135 Microsoft ACPI-konformes System OK IRQ 136 Microsoft ACPI-konformes System OK IRQ 137 Microsoft ACPI-konformes System OK IRQ 138 Microsoft ACPI-konformes System OK IRQ 139 Microsoft ACPI-konformes System OK IRQ 140 Microsoft ACPI-konformes System OK IRQ 141 Microsoft ACPI-konformes System OK IRQ 142 Microsoft ACPI-konformes System OK IRQ 143 Microsoft ACPI-konformes System OK IRQ 144 Microsoft ACPI-konformes System OK IRQ 145 Microsoft 
ACPI-konformes System OK IRQ 146 Microsoft ACPI-konformes System OK IRQ 147 Microsoft ACPI-konformes System OK IRQ 148 Microsoft ACPI-konformes System OK IRQ 149 Microsoft ACPI-konformes System OK IRQ 150 Microsoft ACPI-konformes System OK IRQ 151 Microsoft ACPI-konformes System OK IRQ 152 Microsoft ACPI-konformes System OK IRQ 153 Microsoft ACPI-konformes System OK IRQ 154 Microsoft ACPI-konformes System OK IRQ 155 Microsoft ACPI-konformes System OK IRQ 156 Microsoft ACPI-konformes System OK IRQ 157 Microsoft ACPI-konformes System OK IRQ 158 Microsoft ACPI-konformes System OK IRQ 159 Microsoft ACPI-konformes System OK IRQ 160 Microsoft ACPI-konformes System OK IRQ 161 Microsoft ACPI-konformes System OK IRQ 162 Microsoft ACPI-konformes System OK IRQ 163 Microsoft ACPI-konformes System OK IRQ 164 Microsoft ACPI-konformes System OK IRQ 165 Microsoft ACPI-konformes System OK IRQ 166 Microsoft ACPI-konformes System OK IRQ 167 Microsoft ACPI-konformes System OK IRQ 168 Microsoft ACPI-konformes System OK IRQ 169 Microsoft ACPI-konformes System OK IRQ 170 Microsoft ACPI-konformes System OK IRQ 171 Microsoft ACPI-konformes System OK IRQ 172 Microsoft ACPI-konformes System OK IRQ 173 Microsoft ACPI-konformes System OK IRQ 174 Microsoft ACPI-konformes System OK IRQ 175 Microsoft ACPI-konformes System OK IRQ 176 Microsoft ACPI-konformes System OK IRQ 177 Microsoft ACPI-konformes System OK IRQ 178 Microsoft ACPI-konformes System OK IRQ 179 Microsoft ACPI-konformes System OK IRQ 180 Microsoft ACPI-konformes System OK IRQ 181 Microsoft ACPI-konformes System OK IRQ 182 Microsoft ACPI-konformes System OK IRQ 183 Microsoft ACPI-konformes System OK IRQ 184 Microsoft ACPI-konformes System OK IRQ 185 Microsoft ACPI-konformes System OK IRQ 186 Microsoft ACPI-konformes System OK IRQ 187 Microsoft ACPI-konformes System OK IRQ 188 Microsoft ACPI-konformes System OK IRQ 189 Microsoft ACPI-konformes System OK IRQ 190 Microsoft ACPI-konformes System OK IRQ 4294967287 AMD Radeon R9 200 Series 
OK IRQ 4294967288 Red Hat VirtIO Ethernet Adapter OK IRQ 4294967289 Red Hat VirtIO Ethernet Adapter OK IRQ 4294967290 Red Hat VirtIO Ethernet Adapter OK IRQ 4294967291 Red Hat VirtIO SCSI controller OK IRQ 4294967292 Red Hat VirtIO SCSI controller OK IRQ 4294967293 Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 OK IRQ 4294967294 Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420 OK I'm no expert but it looks like Windows never enabled MSI for the second card as qemu shell with 'info pci' show me both with IRQ 10. I'll hope it helps. > > > > QEMU is called in both cases via the following. I just replace the > > > > '-drive' accordingly. > > > > > > > > /usr/bin/taskset -c 0,1,2,3 /usr/bin/qemu-system-x86_64 \ > > > > -machine q35,accel=kvm \ > > > > -enable-kvm \ > > > > -nodefaults \ > > > > -nographic \ > > > > -vga none \ > > > > -boot order=nc \ > > > > -cpu host \ > > > > -smp cores=4,threads=1,sockets=1 \ > > > > -m 8192 \ > > > > -rtc base=localtime \ > > > > -k de \ > > > > -drive file=/srv/kvm/linux-drive0.img,id=drive0,if=none,cache=none,aio=threads \ > > > > -mon chardev=monitor0 \ > > > > -chardev socket,id=monitor0,path=/tmp/linux.monitor,nowait,server \ > > > > -netdev tap,id=net0,vhost=on,helper=/usr/lib/qemu/qemu-bridge-helper \ > > > > -device virtio-net-pci,netdev=net0,mac=00:00:00:02:01:04 \ > > > > -device virtio-blk-pci,drive=drive0,ioeventfd=on \ > > > > -device ioh3420,bus=pcie.0,id=pcie0,port=1,chassis=1,multifunction=on \ > > > > -device ioh3420,bus=pcie.0,id=pcie1,port=2,chassis=2,multifunction=on \ > > > > -device vfio-pci,host=01:00.0,addr=00.0,bus=pcie0,multifunction=on,x-vga=on \ > > > > -device vfio-pci,host=01:00.1,addr=00.1,bus=pcie0 \ > > > > -device vfio-pci,host=02:00.0,addr=00.0,bus=pcie1,multifunction=on \ > > > > -device vfio-pci,host=02:00.1,addr=00.1,bus=pcie1 \ > > > > -no-reboot > > > > > > > > My setup is the following: > > > > > > > > Kernel: linux-3.13.1 > > > > Seabios: 
seabios-git-rel.1.7.4.r51.g151d034 (5/2/2014) > > > > QEMU: qemu-git-2.0.r30666.g31db5b3 (5/2/2014) > > > > > > > > Below is the 'lspci' output and I'm using the AMD Radeon HD 5430 as device > > > > for my local X server: > > > > > > > > 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (external gfx0 port B) (rev 02) > > > > 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD990 I/O Memory Management Unit (IOMMU) > > > > 00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port B) > > > > 00:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port D) > > > > 00:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port H) > > > > 00:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (external gfx1 port B) > > > > 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40) > > > > 00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller > > > > 00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller > > > > 00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller > > > > 00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller > > > > 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 42) > > > > 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40) > > > > 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller (rev 40) > > > > 00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge (rev 40) > > > > 00:14.5 USB controller: Advanced Micro Devices, Inc. 
[AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller > > > > 00:15.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0) > > > > 00:15.1 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB700/SB800/SB900 PCI to PCI bridge (PCIE port 1) > > > > 00:15.2 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB900 PCI to PCI bridge (PCIE port 2) > > > > 00:15.3 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB900 PCI to PCI bridge (PCIE port 3) > > > > 00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller > > > > 00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller > > > > 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 0 > > > > 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 1 > > > > 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 2 > > > > 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 3 > > > > 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 4 > > > > 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 5 > > > > 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970] > > > > 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8 > > > > 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn XT [Radeon HD 7870 GHz Edition] > > > > 02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series] > > > > 03:00.0 USB controller: Etron Technology, Inc. EJ168 USB 3.0 Host Controller (rev 01) > > > > 04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Park [Mobility Radeon HD 5430] > > > > 04:00.1 Audio device: Advanced Micro Devices, Inc. 
[AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300 Series] > > > > 06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06) > > > > 07:00.0 USB controller: Etron Technology, Inc. EJ168 USB 3.0 Host Controller (rev 01) > > > > > > > > Another minor issue is that the R9 290X is not reset during shutdown of > > > > VM (neither Linux nor Windows) but it can be tricked with doing > > > > "suspend-to-ram" between two starts. That's why I use '-no-reboot' option > > > > in QEMU. The 7870 is doing the reset properly. > > > > > > > > > Is the NoSoftRst "-" on the 290X vs "+" on the 7870 in lspci -vvv by > > > chance? Thanks, > > > > > > > Here are both. It is funny it is opposite as you described. :) > > > Oops, yes. Does this help? > > --- a/hw/misc/vfio.c > +++ b/hw/misc/vfio.c > @@ -3136,7 +3136,7 @@ static void vfio_pci_reset_handler(void *opaque) > > QLIST_FOREACH(group, &group_list, next) { > QLIST_FOREACH(vdev, &group->device_list, next) { > - if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) { > + if (!vdev->reset_works || !vdev->has_flr) { > vdev->needs_reset = true; > } > } > > I can't figure out why I coded it the way that I did. Probably overly > targeting a specific device. Thanks, > This patch works absolutely fine. After applying it to my 'qemu-git', the device resets works flawlessly. So it would be great to push it upstream as it looks good. > Alex > > > root@homer:~# lspci -vvv -s 01:00.0 | grep NoSoftRst > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > > > > root@homer:~# lspci -vvv -s 02:00.0 | grep NoSoftRst > > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > > > > root@homer:~# lspci -vvv -s 01:00.0 > > 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller]) > > Subsystem: Advanced Micro Devices, Inc. 
[AMD/ATI] Device 0b00 > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > > Latency: 0, Cache Line Size: 64 bytes > > Interrupt: pin A routed to IRQ 49 > > Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M] > > Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M] > > Region 4: I/O ports at be00 [size=256] > > Region 5: Memory at fdd80000 (32-bit, non-prefetchable) [size=256K] > > [virtual] Expansion ROM at d0000000 [disabled] [size=128K] > > Capabilities: [48] Vendor Specific Information: Len=08 <?> > > Capabilities: [50] Power Management version 3 > > Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-) > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > > Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00 > > DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited > > ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- > > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- > > RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ > > MaxPayload 128 bytes, MaxReadReq 512 bytes > > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- > > LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us > > ClockPM- Surprise- LLActRep- BwNot- > > LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+ > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > > LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > > DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported > > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled > > LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- > > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- > > Compliance De-emphasis: -6dB > > LnkSta2: 
Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1- > > EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- > > Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ > > Address: 00000000fee00000 Data: 0000 > > Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> > > Capabilities: [150 v2] Advanced Error Reporting > > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ > > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ > > AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- > > Capabilities: [270 v1] #19 > > Capabilities: [2b0 v1] Address Translation Service (ATS) > > ATSCap: Invalidate Queue Depth: 00 > > ATSCtl: Enable+, Smallest Translation Unit: 00 > > Capabilities: [2c0 v1] #13 > > Capabilities: [2d0 v1] #1b > > Kernel driver in use: vfio-pci > > Kernel modules: radeon > > > > root@homer:~# lspci -vvv -s 02:00.0 > > 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn XT [Radeon HD 7870 GHz Edition] (prog-if 00 [VGA controller]) > > Subsystem: XFX Pine Group Inc. 
Device 3251 > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > > Latency: 0, Cache Line Size: 64 bytes > > Interrupt: pin A routed to IRQ 48 > > Region 0: Memory at a0000000 (64-bit, prefetchable) [size=256M] > > Region 2: Memory at fda80000 (64-bit, non-prefetchable) [size=256K] > > Region 4: I/O ports at ee00 [size=256] > > [virtual] Expansion ROM at fda00000 [disabled] [size=128K] > > Capabilities: [48] Vendor Specific Information: Len=08 <?> > > Capabilities: [50] Power Management version 3 > > Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-) > > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > > Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00 > > DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited > > ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- > > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- > > RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ > > MaxPayload 128 bytes, MaxReadReq 512 bytes > > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- > > LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us > > ClockPM- Surprise- LLActRep- BwNot- > > LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+ > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > > LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > > DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported > > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled > > LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- > > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- > > Compliance De-emphasis: -6dB > > LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, 
EqualizationPhase1- > > EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- > > Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ > > Address: 00000000fee00000 Data: 0000 > > Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> > > Capabilities: [150 v2] Advanced Error Reporting > > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- > > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ > > AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- > > Capabilities: [270 v1] #19 > > Capabilities: [2b0 v1] Address Translation Service (ATS) > > ATSCap: Invalidate Queue Depth: 00 > > ATSCtl: Enable+, Smallest Translation Unit: 00 > > Capabilities: [2c0 v1] #13 > > Capabilities: [2d0 v1] #1b > > Kernel driver in use: vfio-pci > > Kernel modules: radeon > > > > > Alex > > > > > > > --Maik > > > --Maik
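
To see what the one-line change to `vfio_pci_reset_handler()` means for the two cards in this thread, the old and new conditions can be evaluated against the flags from the `lspci -vvv` output quoted above. This is a minimal Python sketch, not QEMU code. The function names are hypothetical, `reset_works` is assumed to be true for both cards (the thread does not say), and the mapping of `has_pm_reset` to `NoSoftRst-` and of `has_flr` to `FLReset+` is an assumption about how those fields are derived.

```python
import re

# Flag lines copied from the lspci -vvv output quoted in the thread.
R9_290X = """
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
"""
HD_7870 = """
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
"""


def parse_reset_caps(lspci_text):
    """Extract the two reset-related flags from `lspci -vvv` text (hypothetical helper)."""
    no_soft_rst = re.search(r"NoSoftRst([+-])", lspci_text)
    flreset = re.search(r"FLReset([+-])", lspci_text)
    return {
        # Assumption: a PM reset is only treated as usable when NoSoftRst is '-'.
        "has_pm_reset": bool(no_soft_rst) and no_soft_rst.group(1) == "-",
        # Assumption: function level reset is available when FLReset is '+'.
        "has_flr": bool(flreset) and flreset.group(1) == "+",
    }


def needs_reset_old(reset_works, has_flr, has_pm_reset):
    """Condition removed by the patch (the '-' line of the diff)."""
    return (not reset_works) or ((not has_flr) and has_pm_reset)


def needs_reset_new(reset_works, has_flr):
    """Condition added by the patch (the '+' line of the diff)."""
    return (not reset_works) or (not has_flr)


if __name__ == "__main__":
    for name, text in (("R9 290X", R9_290X), ("HD 7870", HD_7870)):
        caps = parse_reset_caps(text)
        old = needs_reset_old(True, caps["has_flr"], caps["has_pm_reset"])
        new = needs_reset_new(True, caps["has_flr"])
        print(f"{name}: {caps} old needs_reset={old} new needs_reset={new}")
```

Under these assumptions the old condition marks only the 7870 as needing a reset, which would be consistent with the report that the 7870 reset properly while the 290X did not. The new condition marks both cards, since neither advertises FLR.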
Hi Alex,

Maik Broemme <mbroemme@parallels.com> wrote:
> > > > > Another minor issue is that the R9 290X is not reset during shutdown of
> > > > > VM (neither Linux nor Windows) but it can be tricked with doing
> > > > > "suspend-to-ram" between two starts. That's why I use '-no-reboot' option
> > > > > in QEMU. The 7870 is doing the reset properly.
> > > >
> > > >
> > > > Is the NoSoftRst "-" on the 290X vs "+" on the 7870 in lspci -vvv by
> > > > chance? Thanks,
> > > >
> > >
> > > Here are both. It is funny it is opposite as you described. :)
> >
> >
> > Oops, yes. Does this help?
> >
> > --- a/hw/misc/vfio.c
> > +++ b/hw/misc/vfio.c
> > @@ -3136,7 +3136,7 @@ static void vfio_pci_reset_handler(void *opaque)
> >
> >      QLIST_FOREACH(group, &group_list, next) {
> >          QLIST_FOREACH(vdev, &group->device_list, next) {
> > -            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> > +            if (!vdev->reset_works || !vdev->has_flr) {
> >                  vdev->needs_reset = true;
> >              }
> >          }
> >
> > I can't figure out why I coded it the way that I did. Probably overly
> > targeting a specific device. Thanks,
> >
>
> This patch works absolutely fine. After applying it to my 'qemu-git', the
> device resets works flawlessly. So it would be great to push it upstream
> as it looks good.
>

Okay, sorry, I was too fast here. It worked the first time, but now, even
after a clean reboot, it no longer works as expected, and the behavior is
very strange.

Windows:

1st boot works fine - boot VGA and Windows ATI driver loaded, issue reboot
and qemu stopped due to '-no-reboot'.

2nd boot works partially - boot VGA and Windows ATI driver loaded, but the
screen stays black and my system becomes terribly slow and mostly
unresponsive.
My dmesg shows the following immediately after the ATI driver enables the
device:

[ 159.984324] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x270
[ 159.984340] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x1b@0x2d0
[ 160.129036] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x19@0x270
[ 160.129049] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x1b@0x2d0
[ 172.977677] kvm: zapping shadow pages for mmio generation wraparound
[ 173.160174] br0: port 2(tap0) entered forwarding state
[ 175.902967] vfio-pci 0000:01:00.0: irq 46 for MSI/MSI-X
[ 188.340430] Clocksource tsc unstable (delta = -119654611 ns)
[ 188.340511] Switched to clocksource hpet
[ 191.088693] hpet1: lost 12 rtc interrupts
[ 191.926555] hpet1: lost 25 rtc interrupts

So your patch did indeed fix the reset issue of the boot VGA, but something
else is wrong now. :)

Linux (fglrx):

1st boot works fine - boot VGA, fglrx loads fine and X could be started,
issue reboot via SSH and qemu stopped due to '-no-reboot'.

2nd boot works partially - boot VGA, fglrx loads fine, but X couldn't be
started and fails with:

[ 34.265111] fglrx_pci 0000:02:00.0: irq 50 for MSI/MSI-X
[ 34.344313] <6>[fglrx] Firegl kernel thread PID: 318
[ 34.344400] <6>[fglrx] Firegl kernel thread PID: 319
[ 34.344478] <6>[fglrx] Firegl kernel thread PID: 320
[ 34.344589] <6>[fglrx] IRQ 50 Enabled
[ 34.356105] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000
[ 34.356107] <6>[fglrx] Reserved FB block: Unshared offset:fac3000, size:3000
[ 34.356109] <6>[fglrx] Reserved FB block: Unshared offset:fac6000, size:23a000
[ 34.356110] <6>[fglrx] Reserved FB block: Unshared offset:7fff4000, size:c000
[ 34.386436] fglrx_pci 0000:01:00.0: irq 51 for MSI/MSI-X
[ 34.490902] <6>[fglrx] Firegl kernel thread PID: 321
[ 34.490994] <6>[fglrx] Firegl kernel thread PID: 322
[ 34.491069] <6>[fglrx] Firegl kernel thread PID: 323
[ 34.491166] <6>[fglrx] IRQ 51 Enabled
[ 34.505271] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000
[ 34.505273] <6>[fglrx] Reserved FB block: Unshared offset:f9c3000, size:3000
[ 34.505274] <6>[fglrx] Reserved FB block: Unshared offset:f9c6000, size:23a000
[ 34.505276] <6>[fglrx] Reserved FB block: Unshared offset:fc00000, size:100000
[ 34.505277] <6>[fglrx] Reserved FB block: Unshared offset:fff8000, size:8000
[ 34.505278] <6>[fglrx] Reserved FB block: Unshared offset:ffff4000, size:c000
[ 34.526198] BUG: unable to handle kernel paging request at ffff880c724e8008
[ 34.526203] IP: [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
[ 34.526277] PGD 1b3e067 PUD 0
[ 34.526279] Oops: 0002 [#1] PREEMPT SMP
[ 34.526282] Modules linked in: mousedev crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_intel microcode snd_hda_codec serio_raw psmouse parport_pc snd_hwdep snd_pcm parport snd_page_alloc processor snd_timer snd soundcore i2c_i801 intel_agp lpc_ich pcspkr intel_gtt i2c_core shpchp evdev fglrx(PO) amd_iommu_v2 button ext4 crc16 mbcache jbd2 atkbd libps2 virtio_blk virtio_net ahci libahci libata scsi_mod i8042 floppy serio virtio_pci virtio_ring virtio
[ 34.526307] CPU: 1 PID: 316 Comm: X Tainted: P O 3.13.1-2-ARCH #1
[ 34.526309] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
[ 34.526311] task: ffff8800776e2d00 ti: ffff880037a28000 task.ti: ffff880037a28000
[ 34.526312] RIP: 0010:[<ffffffffa0399af6>] [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
[ 34.526353] RSP: 0018:ffff880037a29810 EFLAGS: 00010296
[ 34.526354] RAX: 0000000000000001 RBX: ffff8800724e800c RCX: 0000000000000006
[ 34.526356] RDX: 0000000000000003 RSI: 0000000000000002 RDI: ffff8800724e8264
[ 34.526357] RBP: ffff88007b19a00c R08: 00000000000186a0 R09: 000000000001e848
[ 34.526358] R10: 00000002fffffffd R11: 00000000ffffffff R12: 0000000000000001
[ 34.526359] R13: ffff88007b19a00c R14: 0000000000000000 R15: ffff880037a298b0
[ 34.526363] FS: 00007f0ba649b880(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[ 34.526365] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 34.526366] CR2: ffff880c724e8008 CR3: 0000000037998000 CR4: 00000000000406e0
[ 34.526372] Stack:
[ 34.526373] ffff88007b19a2f4 ffff88007bffcd1c 0000000000000001 ffffffffa0322cf0
[ 34.526375] 0000000000000000 0000000000000000 0000000000000000 ffff880077ed2c08
[ 34.526378] 0000000000000000 ffff880077ed2c08 ffff880037a298a0 ffffffffa0327f14
[ 34.526380] Call Trace:
[ 34.526435] [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
[ 34.526490] [<ffffffffa0327f14>] ? PECI_NotifyDALPreAdapterClockChange+0x144/0x160 [fglrx]
[ 34.526546] [<ffffffffa031e321>] ? PHM_SetPowerState+0x31/0xc0 [fglrx]
[ 34.526597] [<ffffffffa0340a5b>] ? PSM_ApplyHardwareAttributes_Dynamic+0x9b/0xf0 [fglrx]
[ 34.526651] [<ffffffffa033fde9>] ? PSM_AdjustPowerState_Dynamic+0x169/0x540 [fglrx]
[ 34.526668] [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
[ 34.526668] [<ffffffffa0342ee4>] ? PEM_ExcuteEventChain+0x64/0xe0 [fglrx]
[ 34.526668] [<ffffffffa0341302>] ? PEM_HandleEvent+0x92/0xd0 [fglrx]
[ 34.526668] [<ffffffffa03357c0>] ? PEM_CWDDEPM_NotifyEvent+0xe0/0x4d0 [fglrx]
[ 34.526668] [<ffffffffa0333869>] ? PP_Cwdde+0x109/0x180 [fglrx]
[ 34.526668] [<ffffffffa02091dc>] ? firegl_pplib_cwddepm+0xbc/0x130 [fglrx]
[ 34.526668] [<ffffffffa02092d9>] ? firegl_pplib_notify_event+0x89/0xd0 [fglrx]
[ 34.526668] [<ffffffffa020292f>] ? hal_init_gpu+0x2bf/0x480 [fglrx]
[ 34.526668] [<ffffffffa01dcc7b>] ? firegl_open+0x2db/0x310 [fglrx]
[ 34.526668] [<ffffffffa01cb287>] ? ip_firegl_open+0x17/0x20 [fglrx]
[ 34.526668] [<ffffffffa01ccac8>] ? firegl_stub_open+0x98/0x100 [fglrx]
[ 34.526668] [<ffffffff811a82bf>] ? chrdev_open+0x9f/0x1d0
[ 34.526668] [<ffffffff811a1967>] ? do_dentry_open+0x1b7/0x2c0
[ 34.526668] [<ffffffff811aed41>] ? __inode_permission+0x41/0xb0
[ 34.526668] [<ffffffff811a8220>] ?
cdev_put+0x30/0x30 [ 34.526668] [<ffffffff811a1d91>] ? finish_open+0x31/0x40 [ 34.526668] [<ffffffff811b1b72>] ? do_last+0x572/0xe90 [ 34.526668] [<ffffffff811af036>] ? link_path_walk+0x236/0x8d0 [ 34.526668] [<ffffffff811b254b>] ? path_openat+0xbb/0x6b0 [ 34.526668] [<ffffffff811b3c6a>] ? do_filp_open+0x3a/0x90 [ 34.526668] [<ffffffff811c0567>] ? __alloc_fd+0xa7/0x130 [ 34.526668] [<ffffffff811a2f49>] ? do_sys_open+0x129/0x220 [ 34.526668] [<ffffffff811a305e>] ? SyS_open+0x1e/0x20 [ 34.526668] [<ffffffff8152136d>] ? system_call_fastpath+0x1a/0x1f [ 34.526668] Code: 8b 4a 1c 8b 93 e0 18 00 00 48 8d bb 58 02 00 00 85 d2 0f 84 63 02 00 00 f6 c2 01 0f 84 20 01 00 00 44 8b 1b 41 ff cb 4f 8d 14 5b <46> 89 44 93 08 8b 95 3c 02 00 00 48 89 d0 48 c1 e8 07 a8 01 75 [ 34.526668] RIP [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx] [ 34.526668] RSP <ffff880037a29810> [ 34.526668] CR2: ffff880c724e8008 [ 34.526668] ---[ end trace 5431e6dcf1c31dea ]--- [ 69.317528] type=1006 audit(1391649552.046:4): pid=324 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=3 res=1 I know it is the binary driver but I would also retry with radeon one but I believe there will be a similar crash. In my first try I just rebooted the Linux VM several times without starting X. I got it one time working without getting 'Clocksource tsc unstable' but now I'm unable to repeat it. So I believe something more is needed. > > Alex > > > > --Maik > --Maik
On Thu, 2014-02-06 at 01:25 +0100, Maik Broemme wrote:
> Hi Alex,
>
> Maik Broemme <mbroemme@parallels.com> wrote:
> > > > > > Another minor issue is that the R9 290X is not reset during shutdown of
> > > > > > VM (neither Linux nor Windows) but it can be tricked with doing
> > > > > > "suspend-to-ram" between two starts. That's why I use '-no-reboot' option
> > > > > > in QEMU. The 7870 is doing the reset properly.
> > > > > >
> > > > >
> > > > > Is the NoSoftRst "-" on the 290X vs "+" on the 7870 in lspci -vvv by
> > > > > chance? Thanks,
> > > > >
> > > >
> > > > Here are both. It is funny it is opposite as you described. :)
> > > >
> > >
> > > Oops, yes. Does this help?
> > >
> > > --- a/hw/misc/vfio.c
> > > +++ b/hw/misc/vfio.c
> > > @@ -3136,7 +3136,7 @@ static void vfio_pci_reset_handler(void *opaque)
> > >
> > >      QLIST_FOREACH(group, &group_list, next) {
> > >          QLIST_FOREACH(vdev, &group->device_list, next) {
> > > -            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> > > +            if (!vdev->reset_works || !vdev->has_flr) {
> > >                  vdev->needs_reset = true;
> > >              }
> > >          }
> > >
> > > I can't figure out why I coded it the way that I did. Probably overly
> > > targeting a specific device. Thanks,
> > >
> >
> > This patch works absolutely fine. After applying it to my 'qemu-git', the
> > device reset works flawlessly. So it would be great to push it upstream
> > as it looks good.
> >
>
> Okay, sorry. I was too fast here. It worked the first time, but now
> even after a clean reboot it no longer works as expected, and the
> behavior is very strange.
>
> Windows:
>
> 1st boot works fine - boot VGA and Windows ATI driver loaded, issue
> reboot and qemu stopped due to '-no-reboot'.
>
> 2nd boot works partially - boot VGA and Windows ATI driver loaded but
> black screen and my system became terribly slow and mostly
> unresponsive. Immediately after the ATI driver enables the device, my
> dmesg shows the following:
>
> [ 159.984324] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x270
> [ 159.984340] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x1b@0x2d0
> [ 160.129036] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x19@0x270
> [ 160.129049] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x1b@0x2d0
> [ 172.977677] kvm: zapping shadow pages for mmio generation wraparound
> [ 173.160174] br0: port 2(tap0) entered forwarding state
> [ 175.902967] vfio-pci 0000:01:00.0: irq 46 for MSI/MSI-X
> [ 188.340430] Clocksource tsc unstable (delta = -119654611 ns)
> [ 188.340511] Switched to clocksource hpet
> [ 191.088693] hpet1: lost 12 rtc interrupts
> [ 191.926555] hpet1: lost 25 rtc interrupts
>
> So your patch did indeed fix the reset issue of the boot VGA, but
> something else is wrong now. :)

Can you try the cards separately? If you run lspci on the device in the
host, does it report as normal? Often when the host gets slow and we get
these sorts of clock issues, it means the bus is fatal and we get
timeouts trying to read from it.

> Linux (fglrx):
>
> 1st boot works fine - boot VGA, fglrx loads fine and X could be
> started, issue reboot via SSH and qemu stopped due to
> '-no-reboot'.
>
> 2nd boot works partially - boot VGA, fglrx loads fine but X couldn't
> be started and fails with:
>
> [ 34.265111] fglrx_pci 0000:02:00.0: irq 50 for MSI/MSI-X
> [ 34.344313] <6>[fglrx] Firegl kernel thread PID: 318
> [ 34.344400] <6>[fglrx] Firegl kernel thread PID: 319
> [ 34.344478] <6>[fglrx] Firegl kernel thread PID: 320
> [ 34.344589] <6>[fglrx] IRQ 50 Enabled
> [ 34.356105] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000
> [ 34.356107] <6>[fglrx] Reserved FB block: Unshared offset:fac3000, size:3000
> [ 34.356109] <6>[fglrx] Reserved FB block: Unshared offset:fac6000, size:23a000
> [ 34.356110] <6>[fglrx] Reserved FB block: Unshared offset:7fff4000, size:c000
> [ 34.386436] fglrx_pci 0000:01:00.0: irq 51 for MSI/MSI-X
> [ 34.490902] <6>[fglrx] Firegl kernel thread PID: 321
> [ 34.490994] <6>[fglrx] Firegl kernel thread PID: 322
> [ 34.491069] <6>[fglrx] Firegl kernel thread PID: 323
> [ 34.491166] <6>[fglrx] IRQ 51 Enabled
> [ 34.505271] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000
> [ 34.505273] <6>[fglrx] Reserved FB block: Unshared offset:f9c3000, size:3000
> [ 34.505274] <6>[fglrx] Reserved FB block: Unshared offset:f9c6000, size:23a000
> [ 34.505276] <6>[fglrx] Reserved FB block: Unshared offset:fc00000, size:100000
> [ 34.505277] <6>[fglrx] Reserved FB block: Unshared offset:fff8000, size:8000
> [ 34.505278] <6>[fglrx] Reserved FB block: Unshared offset:ffff4000, size:c000
> [ 34.526198] BUG: unable to handle kernel paging request at ffff880c724e8008
> [ 34.526203] IP: [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> [ 34.526277] PGD 1b3e067 PUD 0
> [ 34.526279] Oops: 0002 [#1] PREEMPT SMP
> [ 34.526282] Modules linked in: mousedev crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_intel microcode snd_hda_codec serio_raw psmouse parport_pc snd_hwdep snd_pcm parport snd_page_alloc processor snd_timer snd soundcore i2c_i801 intel_agp lpc_ich pcspkr intel_gtt i2c_core shpchp evdev fglrx(PO) amd_iommu_v2 button ext4 crc16 mbcache jbd2 atkbd libps2 virtio_blk virtio_net ahci libahci libata scsi_mod i8042 floppy serio virtio_pci virtio_ring virtio
> [ 34.526307] CPU: 1 PID: 316 Comm: X Tainted: P O 3.13.1-2-ARCH #1
> [ 34.526309] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
> [ 34.526311] task: ffff8800776e2d00 ti: ffff880037a28000 task.ti: ffff880037a28000
> [ 34.526312] RIP: 0010:[<ffffffffa0399af6>] [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> [ 34.526353] RSP: 0018:ffff880037a29810 EFLAGS: 00010296
> [ 34.526354] RAX: 0000000000000001 RBX: ffff8800724e800c RCX: 0000000000000006
> [ 34.526356] RDX: 0000000000000003 RSI: 0000000000000002 RDI: ffff8800724e8264
> [ 34.526357] RBP: ffff88007b19a00c R08: 00000000000186a0 R09: 000000000001e848
> [ 34.526358] R10: 00000002fffffffd R11: 00000000ffffffff R12: 0000000000000001
> [ 34.526359] R13: ffff88007b19a00c R14: 0000000000000000 R15: ffff880037a298b0
> [ 34.526363] FS: 00007f0ba649b880(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
> [ 34.526365] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 34.526366] CR2: ffff880c724e8008 CR3: 0000000037998000 CR4: 00000000000406e0
> [ 34.526372] Stack:
> [ 34.526373] ffff88007b19a2f4 ffff88007bffcd1c 0000000000000001 ffffffffa0322cf0
> [ 34.526375] 0000000000000000 0000000000000000 0000000000000000 ffff880077ed2c08
> [ 34.526378] 0000000000000000 ffff880077ed2c08 ffff880037a298a0 ffffffffa0327f14
> [ 34.526380] Call Trace:
> [ 34.526435] [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
> [ 34.526490] [<ffffffffa0327f14>] ? PECI_NotifyDALPreAdapterClockChange+0x144/0x160 [fglrx]
> [ 34.526546] [<ffffffffa031e321>] ? PHM_SetPowerState+0x31/0xc0 [fglrx]
> [ 34.526597] [<ffffffffa0340a5b>] ? PSM_ApplyHardwareAttributes_Dynamic+0x9b/0xf0 [fglrx]
> [ 34.526651] [<ffffffffa033fde9>] ? PSM_AdjustPowerState_Dynamic+0x169/0x540 [fglrx]
> [ 34.526668] [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
> [ 34.526668] [<ffffffffa0342ee4>] ? PEM_ExcuteEventChain+0x64/0xe0 [fglrx]
> [ 34.526668] [<ffffffffa0341302>] ? PEM_HandleEvent+0x92/0xd0 [fglrx]
> [ 34.526668] [<ffffffffa03357c0>] ? PEM_CWDDEPM_NotifyEvent+0xe0/0x4d0 [fglrx]
> [ 34.526668] [<ffffffffa0333869>] ? PP_Cwdde+0x109/0x180 [fglrx]
> [ 34.526668] [<ffffffffa02091dc>] ? firegl_pplib_cwddepm+0xbc/0x130 [fglrx]
> [ 34.526668] [<ffffffffa02092d9>] ? firegl_pplib_notify_event+0x89/0xd0 [fglrx]
> [ 34.526668] [<ffffffffa020292f>] ? hal_init_gpu+0x2bf/0x480 [fglrx]
> [ 34.526668] [<ffffffffa01dcc7b>] ? firegl_open+0x2db/0x310 [fglrx]
> [ 34.526668] [<ffffffffa01cb287>] ? ip_firegl_open+0x17/0x20 [fglrx]
> [ 34.526668] [<ffffffffa01ccac8>] ? firegl_stub_open+0x98/0x100 [fglrx]
> [ 34.526668] [<ffffffff811a82bf>] ? chrdev_open+0x9f/0x1d0
> [ 34.526668] [<ffffffff811a1967>] ? do_dentry_open+0x1b7/0x2c0
> [ 34.526668] [<ffffffff811aed41>] ? __inode_permission+0x41/0xb0
> [ 34.526668] [<ffffffff811a8220>] ? cdev_put+0x30/0x30
> [ 34.526668] [<ffffffff811a1d91>] ? finish_open+0x31/0x40
> [ 34.526668] [<ffffffff811b1b72>] ? do_last+0x572/0xe90
> [ 34.526668] [<ffffffff811af036>] ? link_path_walk+0x236/0x8d0
> [ 34.526668] [<ffffffff811b254b>] ? path_openat+0xbb/0x6b0
> [ 34.526668] [<ffffffff811b3c6a>] ? do_filp_open+0x3a/0x90
> [ 34.526668] [<ffffffff811c0567>] ? __alloc_fd+0xa7/0x130
> [ 34.526668] [<ffffffff811a2f49>] ? do_sys_open+0x129/0x220
> [ 34.526668] [<ffffffff811a305e>] ? SyS_open+0x1e/0x20
> [ 34.526668] [<ffffffff8152136d>] ? system_call_fastpath+0x1a/0x1f
> [ 34.526668] Code: 8b 4a 1c 8b 93 e0 18 00 00 48 8d bb 58 02 00 00 85 d2 0f 84 63 02 00 00 f6 c2 01 0f 84 20 01 00 00 44 8b 1b 41 ff cb 4f 8d 14 5b <46> 89 44 93 08 8b 95 3c 02 00 00 48 89 d0 48 c1 e8 07 a8 01 75
> [ 34.526668] RIP [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> [ 34.526668] RSP <ffff880037a29810>
> [ 34.526668] CR2: ffff880c724e8008
> [ 34.526668] ---[ end trace 5431e6dcf1c31dea ]---
> [ 69.317528] type=1006 audit(1391649552.046:4): pid=324 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=3 res=1
>
> I know it is the binary driver; I will also retry with the radeon one,
> but I believe there will be a similar crash. In my first try I just
> rebooted the Linux VM several times without starting X.
>
> I got it working one time without getting 'Clocksource tsc unstable',
> but now I'm unable to repeat it. So I believe something more is needed.

Bus resets are a mixed blessing: a reset returns the card to a
relatively known state, but it's a fairly unusual event from a platform
perspective, and we have no idea what kind of quirks the host system
BIOS might have in place to work around hardware. If the bus is not
fatal, you might try running lspci -vvv in the host at various points to
see what changed. For instance, boot a Linux guest to text mode and see
if the card is in the same state between first boot and second boot
before starting X. Thanks,

Alex
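One way to make the first-boot versus second-boot comparison less eyeball-driven is to diff raw config-space snapshots alongside the lspci -vvv output. The sketch below is only an illustration: it assumes snapshots were saved on the host by copying the device's sysfs config file (for example /sys/bus/pci/devices/0000:01:00.0/config) at each point of interest, and the function names config_diff and read_snapshot are made up here, not taken from any existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Compare two PCI config-space snapshots byte by byte.
 * Stores up to max_out differing offsets in out[] (out may be NULL) and
 * returns the total number of differing bytes, which may exceed max_out.
 */
size_t config_diff(const uint8_t *before, const uint8_t *after, size_t len,
                   size_t *out, size_t max_out)
{
    size_t n = 0;

    assert(before != NULL && after != NULL);

    for (size_t off = 0; off < len; off++) {
        if (before[off] != after[off]) {
            if (out != NULL && n < max_out) {
                out[n] = off;
            }
            n++;
        }
    }
    return n;
}

/*
 * Read up to cap bytes of a saved snapshot into buf.
 * Returns the number of bytes read, or 0 if the file cannot be opened.
 */
size_t read_snapshot(const char *path, uint8_t *buf, size_t cap)
{
    FILE *f = fopen(path, "rb");
    size_t got;

    if (f == NULL) {
        return 0;
    }
    got = fread(buf, 1, cap, f);
    fclose(f);
    return got;
}
```

The differing offsets can then be looked up against the capability offsets that lspci -vvv prints, to see which register changed between boots. Reading the sysfs config file beyond the standard header usually requires root.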
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -3136,7 +3136,7 @@ static void vfio_pci_reset_handler(void *opaque)
 
     QLIST_FOREACH(group, &group_list, next) {
         QLIST_FOREACH(vdev, &group->device_list, next) {
-            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
+            if (!vdev->reset_works || !vdev->has_flr) {
                 vdev->needs_reset = true;
             }
         }
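To see which devices the one-line change affects, the two conditions can be compared over every combination of the three flags. This is a standalone sketch, not QEMU code: struct reset_caps and the function names are hypothetical, and only the two boolean expressions are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the three VFIODevice flags tested in the diff. */
struct reset_caps {
    bool reset_works;
    bool has_flr;
    bool has_pm_reset;
};

/* Condition on the '-' line of the diff. */
bool needs_reset_old(struct reset_caps c)
{
    return !c.reset_works || (!c.has_flr && c.has_pm_reset);
}

/* Condition on the '+' line of the diff. */
bool needs_reset_new(struct reset_caps c)
{
    return !c.reset_works || !c.has_flr;
}

/*
 * Walk all 8 flag combinations, print those where the two conditions
 * disagree, and return how many there are.
 */
int count_changed_cases(void)
{
    int changed = 0;

    for (int bits = 0; bits < 8; bits++) {
        struct reset_caps c = {
            .reset_works  = (bits & 1) != 0,
            .has_flr      = (bits & 2) != 0,
            .has_pm_reset = (bits & 4) != 0,
        };
        bool before = needs_reset_old(c);
        bool after = needs_reset_new(c);

        /* The new condition only widens the old one. */
        assert(!before || after);

        if (before != after) {
            printf("reset_works=%d has_flr=%d has_pm_reset=%d: %d -> %d\n",
                   c.reset_works, c.has_flr, c.has_pm_reset, before, after);
            changed++;
        }
    }
    return changed;
}
```

Calling count_changed_cases() from a main reports a single differing case: a device whose reset is reported to work but that has neither FLR nor a PM reset. With the patch, such a device is marked needs_reset, where previously it was not.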