[v1,2/3] object.c: object_class_dynamic_cast return NULL if the class has no type

Message ID CAKmqyKP0-Y8h5VEAFv6Socs7RhGJUwxME=KUVCgp6jhB4JO_OA@mail.gmail.com
State New

Commit Message

Alistair Francis Aug. 26, 2015, 8:36 p.m. UTC
On Tue, Aug 25, 2015 at 12:43 AM, Peter Crosthwaite
<crosthwaitepeter@gmail.com> wrote:
> On Mon, Aug 24, 2015 at 4:36 PM, Alistair Francis
> <alistair.francis@xilinx.com> wrote:
>> On Mon, Aug 17, 2015 at 4:37 PM, Peter Crosthwaite
>> <crosthwaitepeter@gmail.com> wrote:
>>> On Mon, Aug 17, 2015 at 3:33 PM, Andreas Färber <afaerber@suse.de> wrote:
>>>> Am 18.08.2015 um 00:24 schrieb Alistair Francis:
>>>>> On Sat, Aug 15, 2015 at 2:22 PM, Peter Crosthwaite
>>>>> <crosthwaitepeter@gmail.com> wrote:
>>>>>> On Mon, Jul 27, 2015 at 11:37 AM, Alistair Francis
>>>>>> <alistair.francis@xilinx.com> wrote:
>>>>>>> If the ObjectClass has no type return NULL instead of trying to compare
>>>>>>> the type name.
>>>>>>>
>>>>>>
>>>>>> What was the issue?
>>>>>
>>>>> There is a seg fault in object_class_dynamic_cast() because there is
>>>>> no type in the ObjectClass struct.
>>>>
>>>> That should never happen, ever since TYPE_OBJECT is no longer NULL.
>>>>
>>>>> It happens when it is trying to cast the "pci-device", which is called
>>>>> from the ahci_irq_lower() function. The function is testing if the
>>>>> device is a pci device, so it should return NULL if it isn't valid.
>>>
>>> Yes so I vaguely remember this now. It is about MSI interrupts which
>>> have nothing to do with sysbus implementation. My solution was to rip
>>> that PCI specific stuff out of AHCI completely in my branch. Should
>>> sysbus and PCI AHCI classes install their own separate logic for this
>>> part via a virtualised hook?
>>>
>>> On the topic though, I notice many PCI devices have this MSI specific
>>> logic in them. Is it possible for devs to just treat interrupts as
>>> pins and the PCI layers do the MSI vs non-MSI logic switch in core
>>> layers?
>>>
>>> If Andreas' idea doesn't work, this is still a core QOM bug though. I
>>> think object_dynamic_cast should not segfault when passed a
>>> non-implementing object.
>>>
>>> Regards,
>>> Peter
>>>
>>>>
>>>> It rather sounds as if some build-time dependency is wrong, which we
>>>> used to run into for the Container type before Paolo macrofied this.
>>>>
>>>> Please try again with a clean build - if it still occurs, we'll need a
>>>> reproducible test case to investigate what is going on rather than
>>>> papering over a latent bug.
>>
>> Hey,
>>
>> Sorry about the delay, but I didn't get a chance to look at this last
>> week. I tried with a clean setup and still see the seg fault.
>>
>> I will try to look into it more this week, but if anyone is interested
>> here are the steps to reproduce:
>>
>> On the latest mainline QEMU, with my 2nd and 3rd patches applied
>> $ ./configure --target-list="aarch64-softmmu,microblazeel-softmmu"
>> --disable-pie --disable-sdl --disable-werror # This is what is
>> required at work
>> $ ./aarch64-softmmu/qemu-system-aarch64 -M xlnx-ep108 -display none
>> -kernel ./u-boot.elf -m 8000000 -nographic -serial mon:stdio # Boot
>> u-boot on QEMU
>>
>> The image I'm using is available at: http://1drv.ms/1NxDXLo
>>
>
> So it's not a core bug. That container_of in ahci_irq_lower is
> incorrectly assuming that the passed AHCIState * always belongs to a
> PCI device, which it does not in the sysbus case. So it's getting the
> wrong offset for the QOM object, and the QOM cast is treating some
> invalid offset into (or past) the object as a QOM object base address.
>
> The simplest solution is a back pointer in AHCIState to the
> encapsulating device (would be a DeviceState *). The container_of is
> replaced by following this pointer, and then the conditional PCI cast
> can work.

This seems to fix the problem. It seems hacky though; I can't find a
better way to check the validity of the PCIDevice. Any ideas?
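
For readers following along, the container_of problem Peter describes in the quoted text can be illustrated with a small sketch in plain C. This is not QEMU code: CoreState, PciWrapper and SysbusWrapper are hypothetical stand-ins for AHCIState and its two embedding devices, and the header sizes are arbitrary. The point is only that subtracting the PCI wrapper's member offset from a core pointer gives the right base address when the core really lives in a PCI wrapper, and a meaningless one otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AHCIState: shared core state, not an object. */
typedef struct CoreState {
    int irq_level;
} CoreState;

/* Hypothetical PCI wrapper: a large header precedes the embedded core. */
typedef struct PciWrapper {
    char pci_header[256];
    CoreState core;
} PciWrapper;

/* Hypothetical sysbus wrapper: a much smaller header precedes the core. */
typedef struct SysbusWrapper {
    char mmio_header[16];
    CoreState core;
} SysbusWrapper;

/*
 * Address that container_of(core, PciWrapper, core) would compute.
 * Integer arithmetic is used so the sketch never forms an invalid pointer.
 */
static uintptr_t assumed_pci_base(const CoreState *core)
{
    assert(core != NULL);
    return (uintptr_t)core - offsetof(PciWrapper, core);
}

/* Returns 1 when the computed address really is the wrapper's base. */
static int base_is_correct(const void *real_wrapper, const CoreState *core)
{
    return assumed_pci_base(core) == (uintptr_t)real_wrapper;
}
```

Calling base_is_correct() with a PciWrapper and its own core yields 1; calling it with a SysbusWrapper and its core yields 0, which is the situation the dynamic cast was being handed.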


Thanks,

Alistair

>
> Regards,
> Peter
>
>> Thanks,
>>
>> Alistair
>>
>>>>
>>>> Thanks,
>>>> Andreas
>>>>
>>>> --
>>>> SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
>>>> GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)
>>>
>

Comments

Peter Crosthwaite Aug. 26, 2015, 9:02 p.m. UTC | #1
On Wed, Aug 26, 2015 at 1:36 PM, Alistair Francis
<alistair.francis@xilinx.com> wrote:
[snip]
>
> This seems to fix the problem.

I assume you have the appropriate setters for the new variable
elsewhere in the code as well?

> It seems hacky though, I can't find a
> better way to check the validity of the PCIDevice. Any ideas?
>

So there are a few problems in the way of a correct solution. The caller
of ahci_irq_lower does not have access to the QOM object pointer;
it's been abstracted away by AHCIState (which is not a QOM object). So
you would need to replumb the call path to ahci_irq_lower to pass the
QOM object. This would let you drop the container_of completely.

The next step would be to virtualise ahci_irq_lower, as this is
implementation dependent (I assume specific devices really do need to
control the use of PCI MSI?), with one implementation for sysbus and one
for PCI. This is blocked by the re-plumbing described above, as the
virtualised call itself will need a pointer to the QOM object.

But I think the back pointer is an acceptable solution for the meantime.
This is a clear bug in Sysbus AHCI and the fix should probably even go to
qemu-stable.
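
The back-pointer approach can be sketched roughly as follows. This is a self-contained toy in plain C, not the QEMU implementation: ToyObject, toy_pci_cast() and the "pci"/"sysbus" type strings are hypothetical stand-ins for QOM's Object and its checked cast, and the field is called container here only for illustration. The core state stores a pointer to whatever device embeds it, the pointer is compulsory, and the PCI-only logic runs behind a checked cast of that pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy object header (hypothetical, not QOM): every device starts with one. */
typedef struct ToyObject {
    const char *type_name;
} ToyObject;

/* Shared core state; 'container' points back at the embedding device. */
typedef struct CoreState {
    ToyObject *container;
    int pin_level;
} CoreState;

typedef struct PciDev {
    ToyObject obj;              /* first member, so &dev->obj == (void *)dev */
    int msi_enabled;
    CoreState core;
} PciDev;

typedef struct SysbusDev {
    ToyObject obj;
    CoreState core;
} SysbusDev;

/* Checked cast: returns NULL unless the object really is a "pci" device. */
static PciDev *toy_pci_cast(ToyObject *obj)
{
    return strcmp(obj->type_name, "pci") == 0 ? (PciDev *)obj : NULL;
}

/* Returns 1 if the MSI path was taken, 0 if the plain pin was lowered. */
static int core_irq_lower(CoreState *s)
{
    assert(s->container != NULL);   /* back pointer is compulsory */
    PciDev *pci = toy_pci_cast(s->container);

    if (pci && pci->msi_enabled) {
        return 1;                   /* MSI in use: there is no pin to lower */
    }
    s->pin_level = 0;
    return 0;
}

static void pci_dev_init(PciDev *d, int msi_enabled)
{
    d->obj.type_name = "pci";
    d->msi_enabled = msi_enabled;
    d->core.container = &d->obj;
    d->core.pin_level = 0;
}

static void sysbus_dev_init(SysbusDev *d)
{
    d->obj.type_name = "sysbus";
    d->core.container = &d->obj;
    d->core.pin_level = 0;
}
```

Because the cast starts from a pointer that is known to reference a real object header, the sysbus case yields NULL instead of reading garbage.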

> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index 02d85fa..77e58a9 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -137,8 +137,11 @@ static void ahci_irq_raise(AHCIState *s, AHCIDevice *dev)
>  static void ahci_irq_lower(AHCIState *s, AHCIDevice *dev)
>  {
>      AHCIPCIState *d = container_of(s, AHCIPCIState, ahci);
> -    PCIDevice *pci_dev =
> -        (PCIDevice *)object_dynamic_cast(OBJECT(d), TYPE_PCI_DEVICE);
> +    PCIDevice *pci_dev = NULL;
> +
> +    if (s->parent_obj) {

I would make the parent obj compulsory for all AHCIState
implementations and drop the NULL guard.

> +        pci_dev = PCI_DEVICE(d);
> +    }
>
>      DPRINTF(0, "lower irq\n");
>
> diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
> index c055d6b..ac7d2de 100644
> --- a/hw/ide/ahci.h
> +++ b/hw/ide/ahci.h
> @@ -287,6 +287,8 @@ struct AHCIDevice {
>  };
>
>  typedef struct AHCIState {
> +    DeviceState *parent_obj;

This name is really for QOM inline parents. We decided a while back to
use "parent" for the QOM parents and "container" for non-parental
containers. Memory regions use the .container field for a similar
purpose.

Regards,
Peter

> +
>      AHCIDevice *dev;
>      AHCIControlRegs control_regs;
>      MemoryRegion mem;
John Snow Aug. 26, 2015, 9:46 p.m. UTC | #2
On 08/26/2015 05:02 PM, Peter Crosthwaite wrote:
[snip]

I'm not intricately familiar with how the QOM plumbing works, but I can
definitely see how assuming all AHCIState pointers come from
AHCIPCIState is a problem...

For the uninitiated, how does MSI work with Sysbus? What does a Sysbus
AHCI device look like to a guest, and what happens if it tries to
utilize the functionality?

(Well, segfault, I guess.)

If someone wants to clue in the device model newbie and send a patch my
way, I'll take it.

--js

Peter Crosthwaite Aug. 26, 2015, 10:15 p.m. UTC | #3
On Wed, Aug 26, 2015 at 2:46 PM, John Snow <jsnow@redhat.com> wrote:
[snip]
> I'm not intricately familiar with how the QOM plumbing works, but I can
> definitely see how assuming all AHCIState pointers come from
> AHCIPCIState is a problem...
>
> For the uninitiated, how does MSI work with Sysbus?

No such thing :) Interrupts in Sysbus shouldn't really exist and those
that do are just a thin wrapper around raw pins. The short answer is
that MSI is a PCI-specific concept and sysbus is an alternative to PCI.
The long answer is that any form of MSI requires some sort of transport
layer capable of messaging an interrupt controller from a device, with
the device as the initiator. This is not possible on most of the real
buses that we use the sysbus abstraction for.
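
The difference between a pin and a message can be sketched in a few lines of plain C. Everything here is a toy under stated assumptions: ToyIntc, ToyBus, toy_pin_set() and toy_msi_send() are hypothetical names, and the device_can_initiate flag simply models whether the bus gives a device a way to start a write towards the interrupt controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy interrupt controller: level-triggered pins plus a message doorbell. */
typedef struct ToyIntc {
    int pin_level[4];
    uint32_t last_message;
    int message_count;
} ToyIntc;

/* Pin interrupt: a dedicated wire, no bus transaction involved. */
static void toy_pin_set(ToyIntc *ic, int pin, int level)
{
    assert(pin >= 0 && pin < 4);
    ic->pin_level[pin] = level;
}

/* A bus that may or may not let a device initiate writes. */
typedef struct ToyBus {
    ToyIntc *intc;
    int device_can_initiate;    /* 1: PCI-like bus, 0: sysbus-like wiring */
} ToyBus;

/*
 * Message-signalled interrupt: the device writes a payload towards the
 * controller. Returns 1 on delivery, 0 if the bus has no such transport.
 */
static int toy_msi_send(ToyBus *bus, uint32_t payload)
{
    if (!bus->device_can_initiate) {
        return 0;
    }
    bus->intc->last_message = payload;
    bus->intc->message_count++;
    return 1;
}
```

On the sysbus-like bus only toy_pin_set() can reach the controller, which is why MSI logic has nothing to act on there.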

> What does a Sysbus
> AHCI device look like to a guest, and what happens if it tries to

Sysbus AHCI is basically the raw MMIO regions mapped onto the "system
bus" by the machine model. Usually this is done as a static mapping. The
most common users of this are ARM (and other embedded) SoCs. The
downside is that there is no standard for remapping the device or for
self-identification of the device (as provided by the PCI standard). The
existence of the device is usually made known to a kernel by a device
tree blob.

In the Xilinx MPSoC case, the AHCI controller is on the SoC (with the
ARM processor and many other IO peripherals). There are no cards or
even inter-chip connectivity in Alistair's AHCI connection.

> utilize the functionality?
>

If you rework an existing PCI driver to use raw MMIO regions and
interrupts rather than BARs and MSIs, it usually works with minor edits
only. SDHCI, AHCI, EHCI and xHCI at least all have both PCI
implementations and real-world sysbus implementations.

What I would be interested in is whether it is possible to push all MSI
code up to the PCI core, completely removing the need for PCIisms in
AHCI. This seems to be the only thing that is PCI specific.

Regards,
Peter

Peter Maydell Aug. 26, 2015, 10:39 p.m. UTC | #4
On 26 August 2015 at 23:15, Peter Crosthwaite
<crosthwaitepeter@gmail.com> wrote:
> On Wed, Aug 26, 2015 at 2:46 PM, John Snow <jsnow@redhat.com> wrote:
>> For the uninitiated, how does MSI work with Sysbus?
>
> No such thing :) Interrupts in Sysbus shouldn't really exist and those
> that do are just a thin wrapper around raw pins. The short answer is
> MSI is a PCI specific concept and sysbus is an alternative to PCI.

My preferred way to think about this is that the 'sysbus' device
is the raw "whatever-it-is chip" (uart, usb controller, whatever),
and you can wire that chip up directly on a board (sysbus), or wire
it up to a pci bus. (Yeah, in practice the PCI functionality is
probably on the same bit of silicon, but as a model it works
more or less.)

You can see something similar going on in our uart models:
we have a pci-serial, an isa-serial and some embedded UARTs
which all use the same core serial code but present a different
interface to the rest of the system.


> What I would be interested in is whether it is possible to push all MSI
> code up to the PCI core, completely removing the need for PCIisms in
> AHCI. This seems to be the only thing that is PCI specific.

Possibly it might have to live in a pci-ahci wrapper class,
but it ought to be possible to keep it out of the common
ahci code I hope.
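
One way to picture the wrapper-class idea is a hook in the common state that each bus-specific wrapper fills in. The sketch below is plain C with hypothetical names (CoreState, IrqLowerFn, PciWrapper, the opaque field); it is not the QEMU API, only an illustration of keeping the MSI decision out of the common code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct CoreState CoreState;

/* Hook installed by whichever wrapper embeds the core (hypothetical). */
typedef void (*IrqLowerFn)(CoreState *s);

struct CoreState {
    IrqLowerFn irq_lower;   /* bus-specific behaviour */
    void *opaque;           /* wrapper-private data for the hook */
    int pin_level;
};

/* Common code: knows nothing about PCI or MSI. */
static void core_irq_lower(CoreState *s)
{
    assert(s->irq_lower != NULL);
    s->irq_lower(s);
}

/* Sysbus flavour: always just drop the pin. */
static void sysbus_irq_lower(CoreState *s)
{
    s->pin_level = 0;
}

/* PCI flavour: leave the pin alone while MSI is enabled. */
typedef struct PciWrapper {
    int msi_enabled;
    CoreState core;
} PciWrapper;

static void pci_irq_lower(CoreState *s)
{
    PciWrapper *w = s->opaque;

    if (!w->msi_enabled) {
        s->pin_level = 0;
    }
}

static void pci_wrapper_init(PciWrapper *w, int msi_enabled)
{
    w->msi_enabled = msi_enabled;
    w->core.irq_lower = pci_irq_lower;
    w->core.opaque = w;
    w->core.pin_level = 1;  /* start with the line asserted */
}

static void sysbus_core_init(CoreState *s)
{
    s->irq_lower = sysbus_irq_lower;
    s->opaque = NULL;
    s->pin_level = 1;       /* start with the line asserted */
}
```

With this shape the common code never needs to ask what kind of device it is embedded in; the PCI hook is the only place that looks at MSI state.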

-- PMM
John Snow Aug. 26, 2015, 11:11 p.m. UTC | #5
On 08/26/2015 06:15 PM, Peter Crosthwaite wrote:
> On Wed, Aug 26, 2015 at 2:46 PM, John Snow <jsnow@redhat.com> wrote:
>>
>>
>> On 08/26/2015 05:02 PM, Peter Crosthwaite wrote:

[snip]

>>>
>>> So there are a few problems in the way of a correct solution. The caller
>>> of ahci_irq_lower does not have access to the QOM object pointer;
>>> it's been abstracted away by AHCIState (which is not a QOM object). So
>>> you would need to replumb the call path to ahci_irq_lower to pass the
>>> QOM object. This would let you drop the container_of completely.
>>>
>>> The next step would be to virtualise ahci_irq_lower, as this is
>>> implementation dependent (I assume specific devices really do need to
>>> control the use of PCI MSI?), with one implementation for sysbus and one
>>> for PCI. This is blocked by the re-plumbing described above, as the
>>> virtualised call itself will need a pointer to the QOM object.
>>>
>>> But I think the back pointer is an acceptable solution for the meantime.
>>> This is a clear bug in Sysbus AHCI and the fix should probably even go to
>>> qemu-stable.
>>>
>>
>> I'm not intricately familiar with how the QOM plumbing works, but I can
>> definitely see how assuming all AHCIState pointers come from
>> AHCIPCIState is a problem...
>>
>> For the uninitiated, how does MSI work with Sysbus?
> 
> No such thing :) Interrupts in Sysbus shouldn't really exist and those
> that do are just a thin wrapper around raw pins. The short answer is
> MSI is a PCI specific concept and sysbus is an alternative to PCI.
> The long answer is that any form of MSI requires some sort of transport
> layer capable of messaging an interrupt controller from a device, with
> the device as the initiator. This is not possible on most of the real
> buses that we use the sysbus abstraction for.
> 

Ah, OK ... I seemed to recall there being some MSI bits in the HBA
registers, so I was confused. However, I think it's just a read-only
status bit, which allows it to just always be off for your device.

>> What does a Sysbus
>> AHCI device look like to a guest, and what happens if it tries to
> 
> Sysbus AHCI is basically the raw MMIO regions mapped onto the "system
> bus" by the machine model. Usually this is done as a static mapping. The
> most common users of this are ARM (and other embedded) SoCs. The
> downside is that there is no standard for remapping the device or for
> self-identification of the device (as provided by the PCI standard). The
> existence of the device is usually made known to a kernel by a device
> tree blob.
> 
> In the Xilinx MPSoC case, the AHCI controller is on the SoC (with the
> ARM processor and many other IO peripherals). There are no cards or
> even inter-chip connectivity in Alistair's AHCI connection.
> 
>> utilize the functionality?
>>
> 
> If you rework an existing PCI driver to use raw MMIO regions and
> interrupts rather than BARs and MSIs, it usually works with minor edits
> only. SDHCI, AHCI, EHCI and xHCI at least all have both PCI
> implementations and real-world sysbus implementations.
> 
> What I would be interested in is whether it is possible to push all MSI
> code up to the PCI core, completely removing the need for PCIisms in
> AHCI. This seems to be the only thing that is PCI specific.

Fields that reference MSI in the AHCI spec are:

GHC.MRSM (MSI Revert To Single Message, read-only, is 0 when MSI is off)
CCC_CTL.INT (Advertises MSI vector)

You're safe leaving both of these at zero, so I think there's nothing
else PCI-specific in the HBA region.
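
As a rough sketch of what "leaving both at zero" could look like on the register-read side, here is a small plain-C example. The bit positions (GHC.MRSM at bit 2, CCC_CTL.INT at bits 7:3) are my reading of the AHCI 1.3 register tables and should be checked against the spec before use; the macro names and the sysbus_*_read() helpers are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as read from the AHCI 1.3 tables; verify before relying on them. */
#define GHC_MRSM            (1u << 2)   /* MSI Revert to Single Message */
#define CCC_CTL_INT_SHIFT   3
#define CCC_CTL_INT_MASK    (0x1fu << CCC_CTL_INT_SHIFT)

/* Hypothetical helper: GHC value a sysbus HBA would present to the guest. */
static uint32_t sysbus_ghc_read(uint32_t stored)
{
    return stored & ~GHC_MRSM;          /* MSI is never in use, so MRSM is 0 */
}

/* Hypothetical helper: CCC_CTL value with the MSI vector field cleared. */
static uint32_t sysbus_ccc_ctl_read(uint32_t stored)
{
    return stored & ~CCC_CTL_INT_MASK;  /* no MSI vector to advertise */
}
```

All other bits pass through untouched, so the rest of the HBA register handling would stay shared between the PCI and sysbus variants.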

Anyway, apologies for not noticing this while I was reworking the AHCI
device!

> 
> Regards,
> Peter
> 
>> (Well, segfault, I guess.)
>>
>> If someone wants to clue in the device model newbie and send a patch my
>> way, I'll take it.
>>
>> --js
>>

[snip]
Alistair Francis Aug. 27, 2015, 6:56 p.m. UTC | #6
On Wed, Aug 26, 2015 at 4:11 PM, John Snow <jsnow@redhat.com> wrote:
>
>
> On 08/26/2015 06:15 PM, Peter Crosthwaite wrote:
>> On Wed, Aug 26, 2015 at 2:46 PM, John Snow <jsnow@redhat.com> wrote:
>>>
>>>
>>> On 08/26/2015 05:02 PM, Peter Crosthwaite wrote:
>
> [snip]
>
>>>>
>>>> So there are a few problems in the way of a correct solution. The caller
>>>> of ahci_irq_lower does not have access to the QOM object pointer;
>>>> it's been abstracted away by AHCIState (which is not a QOM object). So
>>>> you would need to replumb the call path to ahci_irq_lower to pass the
>>>> QOM object. This would let you drop the container_of completely.

Ok, I think I have figured it out. It took me a while to get my head
around what is going on here. I'm just testing then I'll send a new
version of the series. Thanks for your help Peter.

Thanks,

Alistair

>>>>
>>>> The next step would be to virtualise ahci_irq_lower, as this is
>>>> implementation dependent (I assume specific devices really do need to
>>>> control the use of PCI MSI?), with one implementation for sysbus and one
>>>> for PCI. This is blocked by the re-plumbing described above, as the
>>>> virtualised call itself will need a ptr to the QOM object.
>>>>
>>>> But I think the back ptr is an acceptable solution for the meantime,
>>>> this is a clear bug in Sysbus AHCI and should probably even go to
>>>> qemu-stable.
>>>>
>>>
>>> I'm not intricately familiar with how the QOM plumbing works, but I can
>>> definitely see how assuming all AHCIState pointers come from
>>> AHCIPCIState is a problem...
>>>
>>> For the uninitiated, how does MSI work with Sysbus?
>>
>> No such thing :) Interrupts in Sysbus shouldn't really exist and those
>> that do are just a thin wrapper around raw pins. The short answer is
>> MSI is a PCI specific concept and sysbus is an alternative to PCI.
>> The long answer is that any form of MSI requires some sort of transport
>> layer capable of messaging an interrupt controller from a device with the
>> device as the initiator. This is not possible in most real busses
>> which we use the sysbus abstraction for.
>>
>
> Ah, OK ... I seemed to recall there being some MSI bits in the HBA
> registers, so I was confused. However, I think it's just a read-only
> status bit, which allows it to just always be off for your device.
>
>>> What does a Sysbus
>>> AHCI device look like to a guest, and what happens if it tries to
>>
>> Sysbus AHCI is basically the raw MMIO regions mapped onto the "system
>> bus" by the machine model. Usually this is done as static mapping. The
>> most common user of this is ARM (and other embedded) SoCs. The
>> downside is there is no standard for remapping the device or self
>> identification of the device (as provided by PCI standard). The
>> existence of the device is made known to a kernel usually by a device
>> tree blob.
>>
>> In the Xilinx MPSoC case, the AHCI controller is on the SoC (with the
>> ARM processor and many other IO peripherals). There are no cards or
>> even inter-chip connectivity in Alistair's AHCI connection.
>>
>>> utilize the functionality?
>>>
>>
>> If you rework an existing PCI driver to use raw MMIO regions and
>> interrupts rather than bars and MSIs it usually works with minor edits
>> only. SDHCI, AHCI, EHCI, xHCI at least all have both PCI
>> implementations and real world sysbus implementations.
>>
>> What I would be interested in, is it possible to push all MSI code up
>> to the PCI core to completely remove the need for all PCIisms in AHCI?
>> This seems to be the only thing that is PCI specific.
>
> Fields that reference MSI in the AHCI spec are:
>
> GHC.MRSM (MSI Revert To Single Message, read-only, is 0 when MSI is off)
> CCC_CTL.INT (Advertises MSI vector)
>
> You're safe leaving both of these at zero, so I think there's nothing
> else PCI-specific in the HBA region.
>
> Anyway, apologies for not noticing this while I was reworking the AHCI
> device!
>
>>
>> Regards,
>> Peter
>>
>>> (Well, segfault, I guess.)
>>>
>>> If someone wants to clue in the device model newbie and send a patch my
>>> way, I'll take it.
>>>
>>> --js
>>>
>
> [snip]
>

Patch

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 02d85fa..77e58a9 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -137,8 +137,11 @@  static void ahci_irq_raise(AHCIState *s, AHCIDevice *dev)
 static void ahci_irq_lower(AHCIState *s, AHCIDevice *dev)
 {
     AHCIPCIState *d = container_of(s, AHCIPCIState, ahci);
-    PCIDevice *pci_dev =
-        (PCIDevice *)object_dynamic_cast(OBJECT(d), TYPE_PCI_DEVICE);
+    PCIDevice *pci_dev = NULL;
+
+    if (s->parent_obj) {
+        pci_dev = PCI_DEVICE(d);
+    }

     DPRINTF(0, "lower irq\n");

diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index c055d6b..ac7d2de 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -287,6 +287,8 @@  struct AHCIDevice {
 };

 typedef struct AHCIState {
+    DeviceState *parent_obj;
+
     AHCIDevice *dev;
     AHCIControlRegs control_regs;
     MemoryRegion mem;