Message ID: 54CA6CF6.7090308@redhat.com
State: New
On 29/01/2015 18:25, Laszlo Ersek wrote:
> Another possibility would be to replace QLIST_INSERT_HEAD() with
> QTAILQ_INSERT_TAIL() in qbus_realize(), but compared to the virt board
> code, that could very easily regress stuff.

Yes, this would be an ABI change.

Paolo
On 29 January 2015 at 17:25, Laszlo Ersek <lersek@redhat.com> wrote:
> I find that virtio-mmio devices are assigned highest-to-lowest addresses
> as they are found on the command line. IOW, this code (from commit
> f5fdcd6e):
>
>     /* Note that we have to create the transports in forwards order
>      * so that command line devices are inserted lowest address first,
>      * and then add dtb nodes in reverse order so that they appear in
>      * the finished device tree lowest address first.
>      */
>     for (i = 0; i < NUM_VIRTIO_TRANSPORTS; i++) {
>         int irq = vbi->irqmap[VIRT_MMIO] + i;
>         hwaddr base = vbi->memmap[VIRT_MMIO].base + i * size;
>
>         sysbus_create_simple("virtio-mmio", base, pic[irq]);
>     }
>
> works exactly in reverse.

Note that you can't rely on device ordering anyway. Guests should be
using UUIDs or similar to ensure they mount the right filesystems in
the right places, because the kernel makes no guarantee that it will
enumerate devices in the device tree in any particular order.

> The reason is that qbus_realize() reverses the order of child buses:
>
>     if (bus->parent) {
>         QLIST_INSERT_HEAD(&bus->parent->child_bus, bus, sibling);
>
> ie. it inserts at the head, not at the tail.
>
> Then, when the -device options are processed, qbus_find_recursive()
> locates free buses "forwards".
>
> According to info qtree, for the command line options
>
>     -device virtio-net-device,netdev=netdev0,bootindex=2
>     -device virtio-blk-device,drive=hd0,bootindex=0
>     -device virtio-scsi-device,id=scsi0
>
> (in this order), we end up with the following (partial) tree:
>
>     dev: virtio-mmio, id ""
>       gpio-out "sysbus-irq" 1
>       indirect_desc = true
>       event_idx = true
>       mmio 000000000a003e00/0000000000000200
>       bus: virtio-mmio-bus.31

What tree is this a dump of? It doesn't look like a device tree dump.
Dumping the dtb produces results in the expected order:

  [run qemu with -machine dumpdtb=/tmp/dump.dtb]
  dtc -I dtb -O dts /tmp/dump.dtb |less

  [...]
  virtio_mmio@a000000 {
          interrupts = <0x0 0x10 0x1>;
          reg = <0x0 0xa000000 0x0 0x200>;
          compatible = "virtio,mmio";
  };

  virtio_mmio@a000200 {
          interrupts = <0x0 0x11 0x1>;
          reg = <0x0 0xa000200 0x0 0x200>;
          compatible = "virtio,mmio";
  };
  [etc]

Device tree nodes appear in the tree lowest address first, which is
exactly what the code above is claiming to do.

thanks
-- PMM
On 01/29/15 19:15, Peter Maydell wrote:
> On 29 January 2015 at 17:25, Laszlo Ersek <lersek@redhat.com> wrote:
>> I find that virtio-mmio devices are assigned highest-to-lowest addresses
>> as they are found on the command line. IOW, this code (from commit
>> f5fdcd6e):
>>
>>     /* Note that we have to create the transports in forwards order
>>      * so that command line devices are inserted lowest address first,
>>      * and then add dtb nodes in reverse order so that they appear in
>>      * the finished device tree lowest address first.
>>      */
>>     for (i = 0; i < NUM_VIRTIO_TRANSPORTS; i++) {
>>         int irq = vbi->irqmap[VIRT_MMIO] + i;
>>         hwaddr base = vbi->memmap[VIRT_MMIO].base + i * size;
>>
>>         sysbus_create_simple("virtio-mmio", base, pic[irq]);
>>     }
>>
>> works exactly in reverse.
>
> Note that you can't rely on device ordering anyway. Guests
> should be using UUIDs or similar to ensure they mount the
> right filesystems in the right places, because the kernel
> makes no guarantee that it will enumerate devices in the
> device tree in any particular order.

Understood. Then we'll probably have to address this in libguestfs.

>> The reason is that qbus_realize() reverses the order of child buses:
>>
>>     if (bus->parent) {
>>         QLIST_INSERT_HEAD(&bus->parent->child_bus, bus, sibling);
>>
>> ie. it inserts at the head, not at the tail.
>>
>> Then, when the -device options are processed, qbus_find_recursive()
>> locates free buses "forwards".
>>
>> According to info qtree, for the command line options
>>
>>     -device virtio-net-device,netdev=netdev0,bootindex=2
>>     -device virtio-blk-device,drive=hd0,bootindex=0
>>     -device virtio-scsi-device,id=scsi0
>>
>> (in this order), we end up with the following (partial) tree:
>>
>>     dev: virtio-mmio, id ""
>>       gpio-out "sysbus-irq" 1
>>       indirect_desc = true
>>       event_idx = true
>>       mmio 000000000a003e00/0000000000000200
>>       bus: virtio-mmio-bus.31
>
> What tree is this a dump of? It doesn't look like a
> device tree dump.

No, it is the output of "info qtree", as I said above.

> Dumping the dtb produces results in
> the expected order:
>
> [run qemu with -machine dumpdtb=/tmp/dump.dtb]
> dtc -I dtb -O dts /tmp/dump.dtb |less
>
> [...]
> virtio_mmio@a000000 {
>         interrupts = <0x0 0x10 0x1>;
>         reg = <0x0 0xa000000 0x0 0x200>;
>         compatible = "virtio,mmio";
> };
>
> virtio_mmio@a000200 {
>         interrupts = <0x0 0x11 0x1>;
>         reg = <0x0 0xa000200 0x0 0x200>;
>         compatible = "virtio,mmio";
> };
> [etc]

Yes. The DTB is 100% fine. However, we're talking two independent
mappings here. The lower level mapping is the DTB, and that's completely
fine. It's handled by the second loop in create_virtio_devices(). But,
that mapping in the DTB doesn't know about actual virtio devices at all,
it only describes virtio-mmio transports.

The bug is in the other mapping, ie. how devices specified with the
-device options are mapped onto the (then already existing) virtio-mmio
transports. The processing of -device is fine, it goes forward. The
issue is that qbus_find_recursive(), in response to each -device option,
finds the next free transport in decreasing address order, *because*
qbus_realize() *prepended* each transport, not appended it, when
create_virtio_devices() called sysbus_create_simple() in increasing
address order at startup.

> Device tree nodes appear in the tree lowest address
> first, which is exactly what the code above is
> claiming to do.

The comment makes two statements, and two loops follow it.

The first statement in the comment is wrong:

    /* Note that we have to create the transports in forwards order
     * so that command line devices are inserted lowest address first,

The first loop matches the first statement in the comment, hence the
first loop is also wrong.

The second statement in the comment is correct:

     * and then add dtb nodes in reverse order so that they appear in
     * the finished device tree lowest address first.

hence the second loop (which matches the second statement) is also
correct.

Thanks
Laszlo
On 29 January 2015 at 18:29, Laszlo Ersek <lersek@redhat.com> wrote:
> Yes. The DTB is 100% fine. However, we're talking two independent
> mappings here. The lower level mapping is the DTB, and that's completely
> fine. It's handled by the second loop in create_virtio_devices(). But,
> that mapping in the DTB doesn't know about actual virtio devices at all,
> it only describes virtio-mmio transports.
>
> The bug is in the other mapping, ie. how devices specified with the
> -device options are mapped onto the (then already existing) virtio-mmio
> transports. The processing of -device is fine, it goes forward. The
> issue is that qbus_find_recursive(), in response to each -device option,
> finds the next free transport in decreasing address order, *because*
> qbus_realize() *prepended* each transport, not appended it, when
> create_virtio_devices() called sysbus_create_simple() in increasing
> address order at startup.
>
>> Device tree nodes appear in the tree lowest address
>> first, which is exactly what the code above is
>> claiming to do.
>
> The comment makes two statements, and two loops follow it.

I see what you mean now.

Does the ordering we have result in the guest kernel (usually)
allocating vda to the first command line device, vdb to the second, and
so on? I'm pretty sure I recall testing at the time and that that's why
we have the funny ordering.

(Looking back through the archives, when we last looked at this I think
it turned out that the kernel probes the mmio devices in one order and
then assigns them vda/vdb/etc in the opposite order.)

> The first statement in the comment is wrong:
>
>     /* Note that we have to create the transports in forwards order
>      * so that command line devices are inserted lowest address first,
>
> The first loop matches the first statement in the comment, hence the
> first loop is also wrong.

Regardless, we can't change the ordering now without breaking existing
command lines (I'm sure there are people out there who aren't using
UUIDs).

We could fix the comment, though.

-- PMM
On Thu, Jan 29, 2015 at 07:29:09PM +0100, Laszlo Ersek wrote:
> On 01/29/15 19:15, Peter Maydell wrote:
>> On 29 January 2015 at 17:25, Laszlo Ersek <lersek@redhat.com> wrote:
>>> I find that virtio-mmio devices are assigned highest-to-lowest addresses
>>> as they are found on the command line. IOW, this code (from commit
>>> f5fdcd6e):
>>>
>>>     /* Note that we have to create the transports in forwards order
>>>      * so that command line devices are inserted lowest address first,
>>>      * and then add dtb nodes in reverse order so that they appear in
>>>      * the finished device tree lowest address first.
>>>      */
>>>     for (i = 0; i < NUM_VIRTIO_TRANSPORTS; i++) {
>>>         int irq = vbi->irqmap[VIRT_MMIO] + i;
>>>         hwaddr base = vbi->memmap[VIRT_MMIO].base + i * size;
>>>
>>>         sysbus_create_simple("virtio-mmio", base, pic[irq]);
>>>     }
>>>
>>> works exactly in reverse.
>>
>> Note that you can't rely on device ordering anyway. Guests
>> should be using UUIDs or similar to ensure they mount the
>> right filesystems in the right places, because the kernel
>> makes no guarantee that it will enumerate devices in the
>> device tree in any particular order.
>
> Understood. Then we'll probably have to address this in libguestfs.

That's pretty much impossible. If we can't fix qemu, then libguestfs
will need to add the drives in reverse order on the command line (for
this case virtio-blk && -M virt).

I should note that we're only trying to use virtio-blk because
virtio-scsi is broken ... Laszlo: is there a public bug for that?

Rich.
On 01/29/15 20:01, Peter Maydell wrote:
> On 29 January 2015 at 18:29, Laszlo Ersek <lersek@redhat.com> wrote:
>> Yes. The DTB is 100% fine. However, we're talking two independent
>> mappings here. The lower level mapping is the DTB, and that's completely
>> fine. It's handled by the second loop in create_virtio_devices(). But,
>> that mapping in the DTB doesn't know about actual virtio devices at all,
>> it only describes virtio-mmio transports.
>>
>> The bug is in the other mapping, ie. how devices specified with the
>> -device options are mapped onto the (then already existing) virtio-mmio
>> transports. The processing of -device is fine, it goes forward. The
>> issue is that qbus_find_recursive(), in response to each -device option,
>> finds the next free transport in decreasing address order, *because*
>> qbus_realize() *prepended* each transport, not appended it, when
>> create_virtio_devices() called sysbus_create_simple() in increasing
>> address order at startup.
>>
>>> Device tree nodes appear in the tree lowest address
>>> first, which is exactly what the code above is
>>> claiming to do.
>>
>> The comment makes two statements, and two loops follow it.
>
> I see what you mean now.
>
> Does the ordering we have result in the guest kernel (usually)
> allocating vda to the first command line device, vdb to the
> second, and so on? I'm pretty sure I recall testing at the
> time and that that's why we have the funny ordering.

That's exactly what we're seeing. The first virtio-blk-device on the
qemu command line ends up getting the highest /dev/vdX in the guest, and
the last one on the qemu cmdline becomes /dev/vda in the guest.
Libguestfs expects the last cmdline option to become the highest
/dev/vdX.

> (Looking back through the archives, when we last looked at
> this I think it turned out that the kernel probes the mmio
> devices in one order and then assigns them vda/vdb/etc
> in the opposite order.)

Ugh! That would be the perfect explanation for the current qemu code.

>> The first statement in the comment is wrong:
>>
>>     /* Note that we have to create the transports in forwards order
>>      * so that command line devices are inserted lowest address first,
>>
>> The first loop matches the first statement in the comment, hence the
>> first loop is also wrong.
>
> Regardless, we can't change the ordering now without breaking
> existing command lines (I'm sure there are people out there
> who aren't using UUIDs).
>
> We could fix the comment, though.

If the guest kernel changed its "assignment strategy" at some point, but
earlier it used to match the comment (and the code), then whichever way
we shape the comment will be wrong for the other kernel strategy. :) So,
in that case this code is probably best left undisturbed.

I'll try to dig out some kernel commit for more evidence.

Thanks!
Laszlo
On 29 January 2015 at 19:09, Richard W.M. Jones <rjones@redhat.com> wrote:
> On Thu, Jan 29, 2015 at 07:29:09PM +0100, Laszlo Ersek wrote:
>> On 01/29/15 19:15, Peter Maydell wrote:
>>> Note that you can't rely on device ordering anyway. Guests
>>> should be using UUIDs or similar to ensure they mount the
>>> right filesystems in the right places, because the kernel
>>> makes no guarantee that it will enumerate devices in the
>>> device tree in any particular order.
>>
>> Understood. Then we'll probably have to address this in libguestfs.
>
> That's pretty much impossible. If we can't fix qemu, then libguestfs
> will need to add the drives in reverse order on the command line (for
> this case virtio-blk && -M virt).

This is still wrong, because the kernel doesn't guarantee the device
ordering. It could break tomorrow with a new kernel version.

> I should note that we're only trying to use virtio-blk because
> virtio-scsi is broken ... Laszlo: is there a public bug for that?

I don't think the kernel guarantees device order for *anything*.

-- PMM
On 01/29/15 20:09, Richard W.M. Jones wrote:
> On Thu, Jan 29, 2015 at 07:29:09PM +0100, Laszlo Ersek wrote:
>> On 01/29/15 19:15, Peter Maydell wrote:
>>> On 29 January 2015 at 17:25, Laszlo Ersek <lersek@redhat.com> wrote:
>>>> I find that virtio-mmio devices are assigned highest-to-lowest addresses
>>>> as they are found on the command line. IOW, this code (from commit
>>>> f5fdcd6e):
>>>>
>>>>     /* Note that we have to create the transports in forwards order
>>>>      * so that command line devices are inserted lowest address first,
>>>>      * and then add dtb nodes in reverse order so that they appear in
>>>>      * the finished device tree lowest address first.
>>>>      */
>>>>     for (i = 0; i < NUM_VIRTIO_TRANSPORTS; i++) {
>>>>         int irq = vbi->irqmap[VIRT_MMIO] + i;
>>>>         hwaddr base = vbi->memmap[VIRT_MMIO].base + i * size;
>>>>
>>>>         sysbus_create_simple("virtio-mmio", base, pic[irq]);
>>>>     }
>>>>
>>>> works exactly in reverse.
>>>
>>> Note that you can't rely on device ordering anyway. Guests
>>> should be using UUIDs or similar to ensure they mount the
>>> right filesystems in the right places, because the kernel
>>> makes no guarantee that it will enumerate devices in the
>>> device tree in any particular order.
>>
>> Understood. Then we'll probably have to address this in libguestfs.
>
> That's pretty much impossible. If we can't fix qemu, then libguestfs
> will need to add the drives in reverse order on the command line (for
> this case virtio-blk && -M virt).

That's precisely what I meant by "addressing it in libguestfs". My idea
wasn't reversing the drives but to fixup the root=.... kernel cmdline
parameter. But I didn't look into it at all; I certainly defer to you.

> I should note that we're only trying to use virtio-blk because
> virtio-scsi is broken

It's not directly virtio-scsi that's broken; something in the thread
pool breaks on aarch64 KVM, and only virtio-scsi exposes it, virtio-blk
doesn't (although it uses the same thread pool, just with fewer middle
layers).

> ... Laszlo: is there a public bug for that?

There isn't. I requested permission on the appropriate internal list to
open up bug 1184405. I didn't get permission. I disagree with that to
this day (given the actual contents of the bug).

Paolo had suggested to use Launchpad, but I resisted (for good reason;
Launchpad is possibly the worst bug tracker imaginable. I have specific
reasons.) Working 12+ hours per day, I don't feel the slightest
inclination to screw around with substandard bug trackers during the
original research, nor to shovel over 30+ carefully edited comments and
about 10 attachments after their initial record in the RHBZ.

The way forward is to open up 1184405, if we'd like it to be public.

Thanks
Laszlo
On 01/29/15 20:12, Laszlo Ersek wrote:
> On 01/29/15 20:01, Peter Maydell wrote:
>> On 29 January 2015 at 18:29, Laszlo Ersek <lersek@redhat.com> wrote:
>>> Yes. The DTB is 100% fine. However, we're talking two independent
>>> mappings here. The lower level mapping is the DTB, and that's completely
>>> fine. It's handled by the second loop in create_virtio_devices(). But,
>>> that mapping in the DTB doesn't know about actual virtio devices at all,
>>> it only describes virtio-mmio transports.
>>>
>>> The bug is in the other mapping, ie. how devices specified with the
>>> -device options are mapped onto the (then already existing) virtio-mmio
>>> transports. The processing of -device is fine, it goes forward. The
>>> issue is that qbus_find_recursive(), in response to each -device option,
>>> finds the next free transport in decreasing address order, *because*
>>> qbus_realize() *prepended* each transport, not appended it, when
>>> create_virtio_devices() called sysbus_create_simple() in increasing
>>> address order at startup.
>>>
>>>> Device tree nodes appear in the tree lowest address
>>>> first, which is exactly what the code above is
>>>> claiming to do.
>>>
>>> The comment makes two statements, and two loops follow it.
>>
>> I see what you mean now.
>>
>> Does the ordering we have result in the guest kernel (usually)
>> allocating vda to the first command line device, vdb to the
>> second, and so on? I'm pretty sure I recall testing at the
>> time and that that's why we have the funny ordering.
>
> That's exactly what we're seeing. The first virtio-blk-device on the
> qemu command line ends up getting the highest /dev/vdX in the guest, and
> the last one on the qemu cmdline becomes /dev/vda in the guest.
> Libguestfs expects the last cmdline option to become the highest /dev/vdX.
>
>> (Looking back through the archives, when we last looked at
>> this I think it turned out that the kernel probes the mmio
>> devices in one order and then assigns them vda/vdb/etc
>> in the opposite order.)
>
> Ugh! That would be the perfect explanation for the current qemu code.
>
>>> The first statement in the comment is wrong:
>>>
>>>     /* Note that we have to create the transports in forwards order
>>>      * so that command line devices are inserted lowest address first,
>>>
>>> The first loop matches the first statement in the comment, hence the
>>> first loop is also wrong.
>>
>> Regardless, we can't change the ordering now without breaking
>> existing command lines (I'm sure there are people out there
>> who aren't using UUIDs).
>>
>> We could fix the comment, though.
>
> If the guest kernel changed its "assignment strategy" at some point, but
> earlier it used to match the comment (and the code), then whichever way
> we shape the comment will be wrong for the other kernel strategy. :) So,
> in that case this code is probably best left undisturbed.
>
> I'll try to dig out some kernel commit for more evidence.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=70161ff3

Laszlo
On 29 January 2015 at 19:47, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/29/15 20:12, Laszlo Ersek wrote:
>> If the guest kernel changed its "assignment strategy" at some point, but
>> earlier it used to match the comment (and the code), then whichever way
>> we shape the comment will be wrong for the other kernel strategy. :) So,
>> in that case this code is probably best left undisturbed.
>>
>> I'll try to dig out some kernel commit for more evidence.
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=70161ff3

Thanks for digging that up -- I was half-thinking we might just have got
it wrong two years ago when we wrote the code :-)

We should probably update the comment to say (a) what we were trying to
do (b) that we don't want to change it now because it would break
existing setups.

-- PMM
On Thu, Jan 29, 2015 at 08:05:50PM +0000, Peter Maydell wrote:
> On 29 January 2015 at 19:47, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 01/29/15 20:12, Laszlo Ersek wrote:
>>> If the guest kernel changed its "assignment strategy" at some point, but
>>> earlier it used to match the comment (and the code), then whichever way
>>> we shape the comment will be wrong for the other kernel strategy. :) So,
>>> in that case this code is probably best left undisturbed.
>>>
>>> I'll try to dig out some kernel commit for more evidence.
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=70161ff3
>
> Thanks for digging that up -- I was half-thinking we might
> just have got it wrong two years ago when we wrote the code :-)
>
> We should probably update the comment to say (a) what we were
> trying to do (b) that we don't want to change it now because
> it would break existing setups.

While it is clear there is no solution that works correctly with all
kernels, I hate to think that we're going to stick with an ordering that
is clearly wrong for modern kernels, forever going forward. The aarch64
world is only just starting out, so on balance I think we should
optimize for the future rather than the past, since that gives right
behaviour for orders of magnitude more people in the long term.

Also can we start using a versioned machine type for ARM, and make the
new machine type have the correct ordering for current kernels.

Regards,
Daniel
On 01/30/15 10:54, Daniel P. Berrange wrote:
> On Thu, Jan 29, 2015 at 08:05:50PM +0000, Peter Maydell wrote:
>> On 29 January 2015 at 19:47, Laszlo Ersek <lersek@redhat.com> wrote:
>>> On 01/29/15 20:12, Laszlo Ersek wrote:
>>>> If the guest kernel changed its "assignment strategy" at some point, but
>>>> earlier it used to match the comment (and the code), then whichever way
>>>> we shape the comment will be wrong for the other kernel strategy. :) So,
>>>> in that case this code is probably best left undisturbed.
>>>>
>>>> I'll try to dig out some kernel commit for more evidence.
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=70161ff3
>>
>> Thanks for digging that up -- I was half-thinking we might
>> just have got it wrong two years ago when we wrote the code :-)
>>
>> We should probably update the comment to say (a) what we were
>> trying to do (b) that we don't want to change it now because
>> it would break existing setups.
>
> While it is clear there is no solution that works correctly with all
> kernels, I hate to think that we're going to stick with an ordering
> that is clearly wrong for modern kernels, forever going forward. The
> aarch64 world is only just starting out, so on balance I think we
> should optimize for the future rather than the past, since that gives
> right behaviour for orders of magnitude more people in the long term.
>
> Also can we start using a versioned machine type for ARM, and make the
> new machine type have the correct ordering for current kernels.

Wei recently posted a patch that introduced versioning for machvirt. (I
didn't see the patch, only know about it.)

If Peter agrees, I guess both Wei's patch and mine could be applied.
(Although, mine should be reworked so that it affects only the new
machtype.)

Thanks
Laszlo
On 30 January 2015 at 09:54, Daniel P. Berrange <berrange@redhat.com> wrote:
> While it is clear there is no solution that works correctly with all
> kernels, I hate to think that we're going to stick with an ordering
> that is clearly wrong for modern kernels, forever going forward. The
> aarch64 world is only just starting out, so on balance I think we
> should optimize for the future rather than the past, since that gives
> right behaviour for orders of magnitude more people in the long term.

Yeah, I agree it's awkward. But I hate breaking people's working setups,
and we have no guarantee the kernel won't change again in the future.

You could try asking the kernel folk to revert that patch on the basis
that it breaks things...

> Also can we start using a versioned machine type for ARM, and make the
> new machine type have the correct ordering for current kernels.

No, not yet. We are miles and miles and miles away from being able to do
cross-version migration correctly, and adding versioned machine types is
accepting a huge maintenance burden from now extending indefinitely into
the future. (It only works for x86, as I understand it, because RedHat
do a lot of testing of various migration scenarios including
cross-version, and even then we miss things. Nobody has as yet suggested
they're doing or even planning to do that level of testing on the ARM
boards.)

I'm not going to add versioned boards until it becomes absolutely
required. Having working within-version migration would be a start.

-- PMM
On Fri, Jan 30, 2015 at 10:29:46AM +0000, Peter Maydell wrote:
> On 30 January 2015 at 09:54, Daniel P. Berrange <berrange@redhat.com> wrote:
>> While it is clear there is no solution that works correctly with all
>> kernels, I hate to think that we're going to stick with an ordering
>> that is clearly wrong for modern kernels, forever going forward. The
>> aarch64 world is only just starting out, so on balance I think we
>> should optimize for the future rather than the past, since that gives
>> right behaviour for orders of magnitude more people in the long term.
>
> Yeah, I agree it's awkward. But I hate breaking people's
> working setups, and we have no guarantee the kernel won't
> change again in the future.
>
> You could try asking the kernel folk to revert that patch on
> the basis that it breaks things...

Might be worth a shot - the patch is only a month old. Or at least do a
followup patch to put the ordering back the way it was, rather than a
plain revert.

Long term though it will be much better if AArch64 would just do PCI
instead of MMIO bus. Then we have proper device addressing which we can
control in a predictable manner that will be stable across hotplug and
unplug and migration. I hear there's work on PCI for AArch64 but is
there a near term ETA yet?

Regards,
Daniel
On 30 January 2015 at 10:48, Daniel P. Berrange <berrange@redhat.com> wrote:
> Long term though it will be much better if AArch64 would just do PCI
> instead of MMIO bus. Then we have proper device addressing which we
> can control in a predictable manner that will be stable across hotplug
> and unplug and migration. I hear there's work on PCI for AArch64 but
> is there a near term ETA yet?

I need to review Alex's v3 patchset; if it doesn't have any issues it'll
probably go into target-arm.next next week and hit mainline shortly
after that.

-- PMM
On 30 January 2015 at 10:54, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 30 January 2015 at 10:48, Daniel P. Berrange <berrange@redhat.com> wrote:
>> Long term though it will be much better if AArch64 would just do PCI
>> instead of MMIO bus. Then we have proper device addressing which we
>> can control in a predictable manner that will be stable across hotplug
>> and unplug and migration. I hear there's work on PCI for AArch64 but
>> is there a near term ETA yet?
>
> I need to review Alex's v3 patchset; if it doesn't have any issues
> it'll probably go into target-arm.next next week and hit mainline
> shortly after that.

Incidentally, if you have opinions about sensible minimum sizes of PCI
ECAM and MMIO windows in the address space, now would be a good time to
air them.

-- PMM
On 01/30/15 11:48, Daniel P. Berrange wrote:
> On Fri, Jan 30, 2015 at 10:29:46AM +0000, Peter Maydell wrote:
>> On 30 January 2015 at 09:54, Daniel P. Berrange <berrange@redhat.com> wrote:
>>> While it is clear there is no solution that works correctly with all
>>> kernels, I hate to think that we're going to stick with an ordering
>>> that is clearly wrong for modern kernels, forever going forward. The
>>> aarch64 world is only just starting out, so on balance I think we
>>> should optimize for the future rather than the past, since that gives
>>> right behaviour for orders of magnitude more people in the long term.
>>
>> Yeah, I agree it's awkward. But I hate breaking people's
>> working setups, and we have no guarantee the kernel won't
>> change again in the future.
>>
>> You could try asking the kernel folk to revert that patch on
>> the basis that it breaks things...
>
> Might be worth a shot - the patch is only a month old. Or at least do a
> followup patch to put the ordering back the way it was, rather than a
> plain revert.

Please note that (as far as I understand) the patch that I referenced is
indeed very new, it's not part of v3.18, but the reversal can easily be
seen with v3.18. In other words, the kernel patch I referenced
introduces no functional change, it just reorganizes stuff in the kernel
(AIUI), with the benefit of killing a superfluous field.

The reason I referenced it is that its *commit message* gives good
background. If we really wanted to find the kernel change that reversed
the traversal, we'd have to talk to Grant and/or bisect the kernel.

Thanks
Laszlo

> Long term though it will be much better if AArch64 would just do PCI
> instead of MMIO bus. Then we have proper device addressing which we
> can control in a predictable manner that will be stable across hotplug
> and unplug and migration. I hear there's work on PCI for AArch64 but
> is there a near term ETA yet?
>
> Regards,
> Daniel
On 01/30/15 11:54, Peter Maydell wrote:
> On 30 January 2015 at 10:48, Daniel P. Berrange <berrange@redhat.com> wrote:
>> Long term though it will be much better if AArch64 would just do PCI
>> instead of MMIO bus. Then we have proper device addressing which we
>> can control in a predictable manner that will be stable across hotplug
>> and unplug and migration. I hear there's work on PCI for AArch64 but
>> is there a near term ETA yet?
>
> I need to review Alex's v3 patchset; if it doesn't have any issues
> it'll probably go into target-arm.next next week and hit mainline
> shortly after that.

(And then a root bridge driver for it will be necessary in edk2. (I'm
unsure if the guest kernel driver is already being worked on.))

Thanks
Laszlo
On 30 January 2015 at 11:38, Laszlo Ersek <lersek@redhat.com> wrote:
> Please note that (as far as I understand) the patch that I referenced is
> indeed very new, it's not part of v3.18, but the reversal can easily be
> seen with v3.18. In other words, the kernel patch I referenced
> introduces no functional change, it just reorganizes stuff in the kernel
> (AIUI), with the benefit of killing a superfluous field.
>
> The reason I referenced it is that its *commit message* gives good
> background. If we really wanted to find the kernel change that reversed
> the traversal, we'd have to talk to Grant and/or bisect the kernel.

Just as a sanity check, do we actually have an old kernel that
enumerates in the opposite order (as opposed to my two-year-old
"I'm sure this used to be right" memories...)?

-- PMM
On 30 January 2015 at 11:39, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/30/15 11:54, Peter Maydell wrote:
>> On 30 January 2015 at 10:48, Daniel P. Berrange <berrange@redhat.com> wrote:
>>> Long term though it will be much better if AArch64 would just do PCI
>>> instead of MMIO bus. Then we have proper device addressing which we
>>> can control in a predictable manner that will be stable across hotplug
>>> and unplug and migration. I hear there's work on PCI for AArch64 but
>>> is there a near term ETA yet?
>>
>> I need to review Alex's v3 patchset; if it doesn't have any issues
>> it'll probably go into target-arm.next next week and hit mainline
>> shortly after that.
>
> (And then a root bridge driver for it will be necessary in edk2. (I'm
> unsure if the guest kernel driver is already being worked on.))

Guest kernel doesn't need any particular support because it uses the
generic pcie-driven-by-device-tree. For 64-bit ARM there's some minor
missing kernel cleanup/patch, but for 32-bit it works as is. [See Alex's
notes in patch 3/4 for detail.]

-- PMM
On 01/30/15 12:40, Peter Maydell wrote:
> On 30 January 2015 at 11:38, Laszlo Ersek <lersek@redhat.com> wrote:
>> Please note that (as far as I understand) the patch that I referenced is
>> indeed very new, it's not part of v3.18, but the reversal can easily be
>> seen with v3.18. In other words, the kernel patch I referenced
>> introduces no functional change, it just reorganizes stuff in the kernel
>> (AIUI), with the benefit of killing a superfluous field.
>>
>> The reason I referenced it is that its *commit message* gives good
>> background. If we really wanted to find the kernel change that reversed
>> the traversal, we'd have to talk to Grant and/or bisect the kernel.
>
> Just as a sanity check, do we actually have an old kernel that
> enumerates in the opposite order (as opposed to my two-year-old
> "I'm sure this used to be right" memories...)?

I never saw one. :) That's why I asked in my first message if anyone had
actually tested that loop / comment.

Thanks,
Laszlo
On 30 January 2015 at 11:48, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/30/15 12:40, Peter Maydell wrote:
>> Just as a sanity check, do we actually have an old kernel that
>> enumerates in the opposite order (as opposed to my two-year-old
>> "I'm sure this used to be right" memories...)?
>
> I never saw one. :) That's why I asked in my first message if anyone had
> actually tested that loop / comment.

Well, I *think* I did, because I remember the go-round about it and I
wouldn't have written weird code like that without a reason. But it's
two years ago now...

-- PMM
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 2353440..705e623 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -441,23 +441,17 @@ static void create_virtio_devices(const VirtBoardInfo *vbi, qemu_irq *pic)
     int i;
     hwaddr size = vbi->memmap[VIRT_MMIO].size;
 
-    /* Note that we have to create the transports in forwards order
+    /* Note that we have to create the transports in backwards order
      * so that command line devices are inserted lowest address first,
-     * and then add dtb nodes in reverse order so that they appear in
+     * and then add dtb nodes in reverse order too so that they appear in
      * the finished device tree lowest address first.
      */
-    for (i = 0; i < NUM_VIRTIO_TRANSPORTS; i++) {
-        int irq = vbi->irqmap[VIRT_MMIO] + i;
-        hwaddr base = vbi->memmap[VIRT_MMIO].base + i * size;
-
-        sysbus_create_simple("virtio-mmio", base, pic[irq]);
-    }
-
     for (i = NUM_VIRTIO_TRANSPORTS - 1; i >= 0; i--) {
         char *nodename;
         int irq = vbi->irqmap[VIRT_MMIO] + i;
         hwaddr base = vbi->memmap[VIRT_MMIO].base + i * size;
 
+        sysbus_create_simple("virtio-mmio", base, pic[irq]);
         nodename = g_strdup_printf("/virtio_mmio@%" PRIx64, base);
         qemu_fdt_add_subnode(vbi->fdt, nodename);