[v2,2/2] PCI-e device multi-function hot-add support

Message ID 1442368976-15852-3-git-send-email-caoj.fnst@cn.fujitsu.com
State New

Commit Message

Cao jin Sept. 16, 2015, 2:02 a.m. UTC
If the user changes their mind partway through a multi-function hot-add, we
should roll back: device_del the functions that were added but are not yet
working.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 hw/pci/pcie.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

Alex Williamson Sept. 21, 2015, 6 p.m. UTC | #1
Please use different subjects that uniquely identify what each patch
does, don't simply re-use the subject for the cover patch on each.


On Wed, 2015-09-16 at 10:02 +0800, Cao jin wrote:
> In case user regret when hot-add multi-function, we should roll back,
> device_del the function added but still not worked.
> 
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
>  hw/pci/pcie.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 89bf61b..497f390 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -265,9 +265,27 @@ void pcie_cap_slot_hot_unplug_request_cb(HotplugHandler *hotplug_dev,
>                                           DeviceState *dev, Error **errp)
>  {
>      uint8_t *exp_cap;
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    PCIBus *bus = pci_dev->bus;
>  
>      pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, errp);
>  
> +    /* handle the condition: user hot-add multi function, but regret before
> +     * finish it, and want to delete the added but not worked function. Fake
> +     * the condition: the slot is polulated, power indicator is off and power
> +     * controller is off, so device can be detached when OS write config space.
> +     */
> +    if (PCI_FUNC(pci_dev->devfn) > 0 &&
> +            bus->devices[PCI_DEVFN(0, 0)] == NULL) {
> +        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> +                PCI_EXP_SLTSTA_PDS);

AFAICT, we're only setting this to make pcie_cap_slot_write_config()
consider this device for being unplugged.  Would it not be cleaner to
flag the device as unexposed to the guest and also use that flag to
prevent config reads and writes to the device until function 0 is
populated, so we know that the guest hasn't interacted with the device?

> +
> +        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> +                PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);

Why do we need to test both vs just ABP, which is signaled in the
existing patch below?

> +
> +        return;
> +    }
> +
>      pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
>  }
>
Cao jin Sept. 22, 2015, 10:08 a.m. UTC | #2
Hi Alex

On 09/22/2015 02:00 AM, Alex Williamson wrote:
>
> Please use different subjects that uniquely identify what each patch
> does, don't simply re-use the subject for the cover patch on each.

OK, will change it in next version.
>
> On Wed, 2015-09-16 at 10:02 +0800, Cao jin wrote:
>> In case user regret when hot-add multi-function, we should roll back,
>> device_del the function added but still not worked.
>>
>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>> ---
>>   hw/pci/pcie.c | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>> index 89bf61b..497f390 100644
>> --- a/hw/pci/pcie.c
>> +++ b/hw/pci/pcie.c
>> @@ -265,9 +265,27 @@ void pcie_cap_slot_hot_unplug_request_cb(HotplugHandler *hotplug_dev,
>>                                            DeviceState *dev, Error **errp)
>>   {
>>       uint8_t *exp_cap;
>> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
>> +    PCIBus *bus = pci_dev->bus;
>>
>>       pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, errp);
>>
>> +    /* handle the condition: user hot-add multi function, but regret before
>> +     * finish it, and want to delete the added but not worked function. Fake
>> +     * the condition: the slot is polulated, power indicator is off and power
>> +     * controller is off, so device can be detached when OS write config space.
>> +     */
>> +    if (PCI_FUNC(pci_dev->devfn) > 0 &&
>> +            bus->devices[PCI_DEVFN(0, 0)] == NULL) {
>> +        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
>> +                PCI_EXP_SLTSTA_PDS);
>
> AFAICT, we're only setting this to make pcie_cap_slot_write_config()
> consider this device for being unplugged.  Would it not be cleaner to
> flag the device as unexposed to the guest and also use that flag to
> prevent config reads and writes to the device until function 0 is
> populated, so we know that the guest hasn't interacted with the device?
>
Yes, the PDS bit is set here to fake the unplug condition in
pcie_cap_slot_write_config(), which lets the guest decide when to
unplug the device. So I think setting the PDS bit here is necessary, am I right?

I am not quite clear about "flag the device as unexposed": does the flag
mean the PCI_EXP_SLTSTA_PDS bit, or something else? Could you give more
hints about it?
>> +
>> +        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
>> +                PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
>
> Why do we need to test both vs just ABP, which is signaled in the
> existing patch below?
>

As for testing the two hotplug events: yes, ABP is enough for device_del. I
will remove PDC in the next version.

>> +
>> +        return;
>> +    }
>> +
>>       pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
>>   }
>>
Alex Williamson Sept. 22, 2015, 5:51 p.m. UTC | #3
On Tue, 2015-09-22 at 18:08 +0800, Cao jin wrote:
> Hi Alex
> 
> On 09/22/2015 02:00 AM, Alex Williamson wrote:
> >
> > Please use different subjects that uniquely identify what each patch
> > does, don't simply re-use the subject for the cover patch on each.
> 
> OK, will change it in next version.
> >
> > On Wed, 2015-09-16 at 10:02 +0800, Cao jin wrote:
> >> In case user regret when hot-add multi-function, we should roll back,
> >> device_del the function added but still not worked.
> >>
> >> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> >> ---
> >>   hw/pci/pcie.c | 18 ++++++++++++++++++
> >>   1 file changed, 18 insertions(+)
> >>
> >> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> >> index 89bf61b..497f390 100644
> >> --- a/hw/pci/pcie.c
> >> +++ b/hw/pci/pcie.c
> >> @@ -265,9 +265,27 @@ void pcie_cap_slot_hot_unplug_request_cb(HotplugHandler *hotplug_dev,
> >>                                            DeviceState *dev, Error **errp)
> >>   {
> >>       uint8_t *exp_cap;
> >> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> >> +    PCIBus *bus = pci_dev->bus;
> >>
> >>       pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, errp);
> >>
> >> +    /* handle the condition: user hot-add multi function, but regret before
> >> +     * finish it, and want to delete the added but not worked function. Fake
> >> +     * the condition: the slot is polulated, power indicator is off and power
> >> +     * controller is off, so device can be detached when OS write config space.
> >> +     */
> >> +    if (PCI_FUNC(pci_dev->devfn) > 0 &&
> >> +            bus->devices[PCI_DEVFN(0, 0)] == NULL) {
> >> +        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> >> +                PCI_EXP_SLTSTA_PDS);
> >
> > AFAICT, we're only setting this to make pcie_cap_slot_write_config()
> > consider this device for being unplugged.  Would it not be cleaner to
> > flag the device as unexposed to the guest and also use that flag to
> > prevent config reads and writes to the device until function 0 is
> > populated, so we know that the guest hasn't interacted with the device?
> >
> Yes, set PDS bit here, for the purpose that fake the unplug condition in 
> pcie_cap_slot_write_config(), which means, let guest decide when to 
> unplug device. So I think setting PDS bit here is necessary, am I right?

I would consider it a hack.  You're setting up the device a certain way
to make it appear as if the guest has configured it that way, then
effectively sending the guest a spurious hotplug request for a device
that it theoretically doesn't know about.  If we were to prevent access
to the device, couldn't we remove it directly?
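
For readers following along, the unplug condition that the patch comment describes (slot populated, power indicator off, power controller off) can be modeled as a small predicate. This is only a sketch under stated assumptions: slot_write_triggers_unplug() is a hypothetical name, the bit values are the Slot Control and Slot Status definitions as found in Linux's pci_regs.h, and the real pcie_cap_slot_write_config() does considerably more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Control / Slot Status bits (values as defined in Linux pci_regs.h). */
#define PCI_EXP_SLTCTL_PIC      0x0300  /* Power Indicator Control field */
#define PCI_EXP_SLTCTL_PIC_OFF  0x0300  /* Power Indicator off */
#define PCI_EXP_SLTCTL_PCC      0x0400  /* Power Controller Control: 1 = power off */
#define PCI_EXP_SLTSTA_PDS      0x0040  /* Presence Detect State: slot populated */

/*
 * Hypothetical helper: returns true when a guest write to Slot Control
 * would be treated as "the OS has released the device", i.e. the slot is
 * populated and the guest turned both the power indicator and the power
 * controller off.
 */
static bool slot_write_triggers_unplug(uint16_t sltsta, uint16_t sltctl_written)
{
    return (sltsta & PCI_EXP_SLTSTA_PDS) &&
           (sltctl_written & PCI_EXP_SLTCTL_PCC) &&
           (sltctl_written & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PIC_OFF;
}
```

With PDS clear the predicate can never be true, which is why the patch sets PDS by hand before signalling the guest; that hand-set state is the part being called a hack here.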

> I am not quite clear about "flag device as unexposed", does the flag 
> means PCI_EXP_SLTSTA_PDS bit, or anything else? Could you give more 
> hints about it?

If function 0 doesn't exist in the slot, should the guest be able to
perform PCI config accesses to the device?  If the guest cannot do
config cycle accesses to the device, then we know the device is unused
and we don't need to involve the guest in removing it.

> >> +
> >> +        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> >> +                PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> >
> > Why do we need to test both vs just ABP, which is signaled in the
> > existing patch below?
> >
> 
> Test the two hotplug event, yes, ABP is enough for device_del. will 
> remove PDC in next version.
> 
> >> +
> >> +        return;
> >> +    }
> >> +
> >>       pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
> >>   }
> >>
Cao jin Sept. 23, 2015, 1:37 p.m. UTC | #4
Hi Alex,

On 09/23/2015 01:51 AM, Alex Williamson wrote:
> On Tue, 2015-09-22 at 18:08 +0800, Cao jin wrote:
>> Hi Alex
>>
>> On 09/22/2015 02:00 AM, Alex Williamson wrote:
>>>
>>> Please use different subjects that uniquely identify what each patch
>>> does, don't simply re-use the subject for the cover patch on each.
>>
>> OK, will change it in next version.
>>>
>>> On Wed, 2015-09-16 at 10:02 +0800, Cao jin wrote:
>>>> In case user regret when hot-add multi-function, we should roll back,
>>>> device_del the function added but still not worked.
>>>>
>>>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>>>> ---
>>>>    hw/pci/pcie.c | 18 ++++++++++++++++++
>>>>    1 file changed, 18 insertions(+)
>>>>
>>>> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>>>> index 89bf61b..497f390 100644
>>>> --- a/hw/pci/pcie.c
>>>> +++ b/hw/pci/pcie.c
>>>> @@ -265,9 +265,27 @@ void pcie_cap_slot_hot_unplug_request_cb(HotplugHandler *hotplug_dev,
>>>>                                             DeviceState *dev, Error **errp)
>>>>    {
>>>>        uint8_t *exp_cap;
>>>> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
>>>> +    PCIBus *bus = pci_dev->bus;
>>>>
>>>>        pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, errp);
>>>>
>>>> +    /* handle the condition: user hot-add multi function, but regret before
>>>> +     * finish it, and want to delete the added but not worked function. Fake
>>>> +     * the condition: the slot is polulated, power indicator is off and power
>>>> +     * controller is off, so device can be detached when OS write config space.
>>>> +     */
>>>> +    if (PCI_FUNC(pci_dev->devfn) > 0 &&
>>>> +            bus->devices[PCI_DEVFN(0, 0)] == NULL) {
>>>> +        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
>>>> +                PCI_EXP_SLTSTA_PDS);
>>>
>>> AFAICT, we're only setting this to make pcie_cap_slot_write_config()
>>> consider this device for being unplugged.  Would it not be cleaner to
>>> flag the device as unexposed to the guest and also use that flag to
>>> prevent config reads and writes to the device until function 0 is
>>> populated, so we know that the guest hasn't interacted with the device?
>>>
>> Yes, set PDS bit here, for the purpose that fake the unplug condition in
>> pcie_cap_slot_write_config(), which means, let guest decide when to
>> unplug device. So I think setting PDS bit here is necessary, am I right?
>
> I would consider it a hack.  You're setting up the device a certain way
> to make it appear as if the guest has configured it that way, then
> effectively sending the guest a spurious hotplug request for a device
> that it theoretically doesn't know about.  If we were to prevent access
> to the device, couldn't we remove it directly?
>

I agree with the judgement "hack", but I am confused by the last
sentence. Please correct me if I have misunderstood it.
I designed the hot-add feature as executing the device_add command several
times, with func 0 added last. Assume we have a solution implemented to prevent
access to the device before func 0 is added; we still mustn't remove the other
functions directly, because we don't know whether the user intends to add
func 0 later or has simply changed their mind.

>> I am not quite clear about "flag device as unexposed", does the flag
>> means PCI_EXP_SLTSTA_PDS bit, or anything else? Could you give more
>> hints about it?
>
> If function 0 doesn't exist in the slot, should the guest be able to
> perform PCI config accesses to the device?  If the guest cannot do
> config cycle accesses to the device, then we know the device is unused
> and we don't need to involve the guest in removing it.

If func 0 doesn't exist then, in theory, the guest has no reason
to perform PCI config accesses to the device. But as you said before, if the
guest does a gratuitous full PCI bus scan (actually, I am not aware of the
conditions under which that happens), the guest is able to find the device
even though func 0 does not exist.

In the situation you describe above, assume we already have a solution that
forbids access to the device before func 0 is added. Does that mean the
result is: the guest thinks there is no device in the slot, but in qemu we
still have the device data structure and won't destroy it?

Alternatively, I have another solution for this feature: make multi-function
hot-add atomic, which means creating a new API that takes all the function
parameters together, like "multifunction_device_add func0,func1,func2...".
But it would be more complicated, so it is the last solution I would choose.

Another question: in what way do we flag the device as unexposed to the guest
before func 0 is populated? My thought is to return 0xFFFF as the vendor ID
when it is accessed. Do you think that is an effective way?
>>>> +
>>>> +        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
>>>> +                PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
>>>
>>> Why do we need to test both vs just ABP, which is signaled in the
>>> existing patch below?
>>>
>>
>> Test the two hotplug event, yes, ABP is enough for device_del. will
>> remove PDC in next version.
>>
>>>> +
>>>> +        return;
>>>> +    }
>>>> +
>>>>        pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
>>>>    }
>>>>
Alex Williamson Sept. 23, 2015, 6:19 p.m. UTC | #5
On Wed, 2015-09-23 at 21:37 +0800, Cao jin wrote:
> Hi Alex,
> 
> On 09/23/2015 01:51 AM, Alex Williamson wrote:
> > On Tue, 2015-09-22 at 18:08 +0800, Cao jin wrote:
> >> Hi Alex
> >>
> >> On 09/22/2015 02:00 AM, Alex Williamson wrote:
> >>>
> >>> Please use different subjects that uniquely identify what each patch
> >>> does, don't simply re-use the subject for the cover patch on each.
> >>
> >> OK, will change it in next version.
> >>>
> >>> On Wed, 2015-09-16 at 10:02 +0800, Cao jin wrote:
> >>>> In case user regret when hot-add multi-function, we should roll back,
> >>>> device_del the function added but still not worked.
> >>>>
> >>>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> >>>> ---
> >>>>    hw/pci/pcie.c | 18 ++++++++++++++++++
> >>>>    1 file changed, 18 insertions(+)
> >>>>
> >>>> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> >>>> index 89bf61b..497f390 100644
> >>>> --- a/hw/pci/pcie.c
> >>>> +++ b/hw/pci/pcie.c
> >>>> @@ -265,9 +265,27 @@ void pcie_cap_slot_hot_unplug_request_cb(HotplugHandler *hotplug_dev,
> >>>>                                             DeviceState *dev, Error **errp)
> >>>>    {
> >>>>        uint8_t *exp_cap;
> >>>> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> >>>> +    PCIBus *bus = pci_dev->bus;
> >>>>
> >>>>        pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, errp);
> >>>>
> >>>> +    /* handle the condition: user hot-add multi function, but regret before
> >>>> +     * finish it, and want to delete the added but not worked function. Fake
> >>>> +     * the condition: the slot is polulated, power indicator is off and power
> >>>> +     * controller is off, so device can be detached when OS write config space.
> >>>> +     */
> >>>> +    if (PCI_FUNC(pci_dev->devfn) > 0 &&
> >>>> +            bus->devices[PCI_DEVFN(0, 0)] == NULL) {
> >>>> +        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> >>>> +                PCI_EXP_SLTSTA_PDS);
> >>>
> >>> AFAICT, we're only setting this to make pcie_cap_slot_write_config()
> >>> consider this device for being unplugged.  Would it not be cleaner to
> >>> flag the device as unexposed to the guest and also use that flag to
> >>> prevent config reads and writes to the device until function 0 is
> >>> populated, so we know that the guest hasn't interacted with the device?
> >>>
> >> Yes, set PDS bit here, for the purpose that fake the unplug condition in
> >> pcie_cap_slot_write_config(), which means, let guest decide when to
> >> unplug device. So I think setting PDS bit here is necessary, am I right?
> >
> > I would consider it a hack.  You're setting up the device a certain way
> > to make it appear as if the guest has configured it that way, then
> > effectively sending the guest a spurious hotplug request for a device
> > that it theoretically doesn't know about.  If we were to prevent access
> > to the device, couldn't we remove it directly?
> >
> 
> I agree with the judgement "hack", but I am confused about the last 
> sentence. please correct me if I understand it wrong.
> I design the hot-add feature via executing device_add cmd several times 
> with func 0 added last. Assume we have a solution implemented to prevent 
> access to the device before adding func 0, but we mustn`t remove other 
> func directly, because we don`t know whether user want to add func 0 at 
> last or just regret.
> 
> >> I am not quite clear about "flag device as unexposed", does the flag
> >> means PCI_EXP_SLTSTA_PDS bit, or anything else? Could you give more
> >> hints about it?
> >
> > If function 0 doesn't exist in the slot, should the guest be able to
> > perform PCI config accesses to the device?  If the guest cannot do
> > config cycle accesses to the device, then we know the device is unused
> > and we don't need to involve the guest in removing it.
> 
> if func 0 doesn`t exist, theoretically as I think, guest has no reason 
> to perform PCI config access to the device, but as you said before, if 
> guest does do a gratuitous full PCI bus scan(actually I am not aware in 
> what condition it will happen), guest is able to find the device without 
> func 0 exist.
> 
> in the condition you said above, assume we already have the solution to 
> forbidden the access to device before func 0 added, does that means the 
> result: guest think there is no device in the slot, but in qemu, we 
> still have device data structure in, and won`t destroy it?
> 
> or I have another solution of this feature: make multi-function hot-add 
> atomic, which means creating a new api, with all func params following, 
> like "multifunction_device_add func0,func1,func2...", but it will be 
> more and more complicated, which maybe the last solution I prefer to choose.

Does that really solve any problems?  The interface to qemu is atomic,
but device instantiation is not, so we end up with similar problems on
failure of one of the devices.  All it would do is shorten the timeframe
and make the eventual addition of func0 more deterministic.  It would
also require explicit support up through libvirt while simply doing
func0-last ordering of independent devices is likely fairly compatible
with libvirt as-is.

> another question: in what way do we flag the device unexposed to guest 
> before func 0 populated? My thoughts is: return 0xFFFF as vendor id when 
> being accessed, do you think it is a effective way?

The PCI spec defines that config space of a non-present device should
return 0xFF.  See pci_data_read() when we don't find a PCIDevice.  A
simple extension of that test could prevent reads from config space of a
device until func0 is present, either by explicitly looking for the
func0 PCIDevice or a flag on the PCIDevice structure to indicate that
it's active.  There's a chance that a flag like that could be useful for
other purposes as well, perhaps if we have an assigned device and the
IOMMU is still being populated we could complete initialization of the
device, but hide it from userspace until we're ready.
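
A rough illustration of the "simple extension of that test" follows. It is a toy model, not QEMU code: CfgDevice, CfgBus and cfg_read() are hypothetical stand-ins for PCIDevice, PCIBus and pci_data_read(), and it takes the "explicitly look for func0" option rather than a flag on the device. The PCI_DEVFN/PCI_SLOT macros follow the usual devfn encoding (5-bit slot, 3-bit function).

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)        (((devfn) >> 3) & 0x1f)

#define CFG_SPACE_SIZE 256

/* Hypothetical stand-ins for QEMU's PCIDevice and PCIBus. */
typedef struct CfgDevice {
    uint8_t devfn;
    uint8_t config[CFG_SPACE_SIZE];
} CfgDevice;

typedef struct CfgBus {
    CfgDevice *devices[256];
} CfgBus;

/*
 * Read len (1, 2 or 4) bytes of config space, little-endian.
 * A missing device reads as all ones, and so does any function whose
 * slot has no function 0 yet: the guest cannot see it.
 */
static uint32_t cfg_read(const CfgBus *bus, uint8_t devfn,
                         unsigned addr, unsigned len)
{
    const CfgDevice *dev, *func0;
    uint32_t all_ones, val = 0;
    unsigned i;

    if (len == 0 || len > 4 || addr + len > CFG_SPACE_SIZE) {
        return 0xffffffffu;
    }
    all_ones = (len == 4) ? 0xffffffffu : ((1u << (8 * len)) - 1u);

    dev = bus->devices[devfn];
    func0 = bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)];
    if (dev == NULL || func0 == NULL) {
        return all_ones;    /* not present, or not yet exposed */
    }

    for (i = 0; i < len; i++) {
        val |= (uint32_t)dev->config[addr + i] << (8 * i);
    }
    return val;
}
```

In this model a function 1 that was added first reads back 0xffff for its vendor ID until function 0 is installed in the same slot, after which its real config space becomes visible. Config writes would need the same gate.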

We would need to decide how device_del works for non-exposed devices.
If we have a full PCI slot exposed to a guest and do a device_del, all
of the functions are removed.  However, AIUI this is largely a
limitation of the hotplug interface since individual functions are
physically dependent on the slot.  I would probably suggest then that if
a device is not exposed to the user, we're not bound by that behavior
and individual functions should be removable, without interaction with
the guest.  Adding MST for his opinion.  Thanks,

Alex

> >>>> +
> >>>> +        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> >>>> +                PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> >>>
> >>> Why do we need to test both vs just ABP, which is signaled in the
> >>> existing patch below?
> >>>
> >>
> >> Test the two hotplug event, yes, ABP is enough for device_del. will
> >> remove PDC in next version.
> >>
> >>>> +
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>>        pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
> >>>>    }
> >>>>
Patch

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 89bf61b..497f390 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -265,9 +265,27 @@  void pcie_cap_slot_hot_unplug_request_cb(HotplugHandler *hotplug_dev,
                                          DeviceState *dev, Error **errp)
 {
     uint8_t *exp_cap;
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    PCIBus *bus = pci_dev->bus;
 
     pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, errp);
 
+    /* Handle the case: the user hot-adds multiple functions, but backs out
+     * before finishing and wants to delete the added but unused function. Fake
+     * the condition: the slot is populated, power indicator is off and power
+     * controller is off, so device can be detached when OS writes config space.
+     */
+    if (PCI_FUNC(pci_dev->devfn) > 0 &&
+            bus->devices[PCI_DEVFN(0, 0)] == NULL) {
+        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
+                PCI_EXP_SLTSTA_PDS);
+
+        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
+                PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
+
+        return;
+    }
+
     pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
 }
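
The test that selects the new code path can be tried in isolation. The following is a minimal sketch, assuming toy ToyDevice/ToyBus types in place of QEMU's PCIDevice/PCIBus and a hypothetical helper name is_orphan_function(); only the PCI_DEVFN/PCI_FUNC arithmetic and the shape of the condition are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* devfn packs a 5-bit slot number and a 3-bit function number. */
#define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_FUNC(devfn)        ((devfn) & 0x07)

/* Hypothetical stand-ins for QEMU's PCIDevice and PCIBus. */
typedef struct ToyDevice {
    uint8_t devfn;
} ToyDevice;

typedef struct ToyBus {
    ToyDevice *devices[256];    /* indexed by devfn */
} ToyBus;

/*
 * Mirrors the condition in the patch: dev is a non-zero function and
 * slot 0, function 0 on its bus is still empty. Like the patch, this
 * looks only at slot 0, the single slot below a PCIe port.
 */
static bool is_orphan_function(const ToyBus *bus, const ToyDevice *dev)
{
    assert(bus != NULL && dev != NULL);
    return PCI_FUNC(dev->devfn) > 0 &&
           bus->devices[PCI_DEVFN(0, 0)] == NULL;
}
```

A function 1 added to an empty bus is reported as an orphan; once function 0 is installed the same call returns false and the unplug request falls through to the regular attention-button path.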