diff mbox

[v8,4/6] docs: Add Documentation for Mediated devices

Message ID 1476131317-358-5-git-send-email-kwankhede@nvidia.com
State New
Headers show

Commit Message

Kirti Wankhede Oct. 10, 2016, 8:28 p.m. UTC
Add file Documentation/vfio-mediated-device.txt that include details of
mediated device framework.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Change-Id: I137dd646442936090d92008b115908b7b2c7bc5d
---
 Documentation/vfio-mdev/vfio-mediated-device.txt | 219 +++++++++++++++++++++++
 1 file changed, 219 insertions(+)
 create mode 100644 Documentation/vfio-mdev/vfio-mediated-device.txt

Comments

Daniel P. Berrangé Oct. 11, 2016, 2:14 p.m. UTC | #1
On Tue, Oct 11, 2016 at 01:58:35AM +0530, Kirti Wankhede wrote:
> Add file Documentation/vfio-mediated-device.txt that include details of
> mediated device framework.
> 
> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> Signed-off-by: Neo Jia <cjia@nvidia.com>
> Change-Id: I137dd646442936090d92008b115908b7b2c7bc5d
> ---
>  Documentation/vfio-mdev/vfio-mediated-device.txt | 219 +++++++++++++++++++++++
>  1 file changed, 219 insertions(+)
>  create mode 100644 Documentation/vfio-mdev/vfio-mediated-device.txt
> 
> diff --git a/Documentation/vfio-mdev/vfio-mediated-device.txt b/Documentation/vfio-mdev/vfio-mediated-device.txt
> new file mode 100644
> index 000000000000..c1eacb83807b
> --- /dev/null
> +++ b/Documentation/vfio-mdev/vfio-mediated-device.txt
> @@ -0,0 +1,219 @@
> +/*
> + * VFIO Mediated devices
> + *
> + * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.

Adding "All rights reserved" is bogus since you're providing it under
the GPL, but I see countless kernel source files have this, so meh.

> + *     Author: Neo Jia <cjia@nvidia.com>
> + *             Kirti Wankhede <kwankhede@nvidia.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +

> +Mediated device management interface via sysfs
> +----------------------------------------------
> +Management interface via sysfs allows user space software, like libvirt, to
> +query and configure mediated device in a HW agnostic fashion. This management
> +interface provide flexibility to underlying physical device's driver to support
> +mediated device hotplug, multiple mediated devices per virtual machine, multiple
> +mediated devices from different physical devices, etc.
> +
> +Under per-physical device sysfs:
> +--------------------------------
> +
> +* mdev_supported_types:
> +    List of current supported mediated device types and its details are added
> +in this directory in following format:
> +
> +|- <parent phy device>
> +|--- Vendor-specific-attributes [optional]
> +|--- mdev_supported_types
> +|     |--- <type id>
> +|     |   |--- create
> +|     |   |--- name
> +|     |   |--- available_instances
> +|     |   |--- description /class
> +|     |   |--- [devices]
> +|     |--- <type id>
> +|     |   |--- create
> +|     |   |--- name
> +|     |   |--- available_instances
> +|     |   |--- description /class
> +|     |   |--- [devices]
> +|     |--- <type id>
> +|          |--- create
> +|          |--- name
> +|          |--- available_instances
> +|          |--- description /class
> +|          |--- [devices]
> +
> +[TBD : description or class is yet to be decided. This will change.]

I thought that in previous discussions we had agreed to drop
the <type id> concept and use the name as the unique identifier.
When reporting these types in libvirt we won't want to report
the type id values - we'll want the name strings to be unique.

Based on this sysfs spec, the only fields we would report in
libvirt would be name + available_instances.

> +Under per mdev device:
> +----------------------
> +
> +|- <parent phy device>
> +|--- $MDEV_UUID
> +         |--- remove
> +         |--- {link to its type}
> +         |--- vendor-specific-attributes [optional]

Again, I thought we'd agreed to not have arbitrary vendor
specific attributes ?

That said, I don't mind them existing in kernel sysfs, just
be aware that we'll *not* expose any vendor specific attributes
in libvirt, so your functional implementation must not rely on
these attributes being used in any way by libvirt.



> +
> +* remove: (write only)
> +	Write '1' to 'remove' file would destroy mdev device. Vendor driver can
> +	fail remove() callback if that device is active and vendor driver
> +	doesn't support hot-unplug.
> +	Example:
> +	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove

> +Mediated device Hotplug:
> +------------------------
> +
> +Mediated devices can be created and assigned during runtime. Procedure to
> +hot-plug mediated device is same as hot-plug PCI device.

Generally this looks much saner now all the grouping stuff has gone.



Regards,
Daniel
Kirti Wankhede Oct. 11, 2016, 8:44 p.m. UTC | #2
On 10/11/2016 7:44 PM, Daniel P. Berrange wrote:
> On Tue, Oct 11, 2016 at 01:58:35AM +0530, Kirti Wankhede wrote:
>> Add file Documentation/vfio-mediated-device.txt that include details of
>> mediated device framework.
>>
>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
>> Signed-off-by: Neo Jia <cjia@nvidia.com>
>> Change-Id: I137dd646442936090d92008b115908b7b2c7bc5d
>> ---
>>  Documentation/vfio-mdev/vfio-mediated-device.txt | 219 +++++++++++++++++++++++
>>  1 file changed, 219 insertions(+)
>>  create mode 100644 Documentation/vfio-mdev/vfio-mediated-device.txt
>>
>> diff --git a/Documentation/vfio-mdev/vfio-mediated-device.txt b/Documentation/vfio-mdev/vfio-mediated-device.txt
>> new file mode 100644
>> index 000000000000..c1eacb83807b
>> --- /dev/null
>> +++ b/Documentation/vfio-mdev/vfio-mediated-device.txt
>> @@ -0,0 +1,219 @@
>> +/*
>> + * VFIO Mediated devices
>> + *
>> + * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
> 
> Adding "All rights reserved" is bogus since you're providing it under
> the GPL, but I see countless kernel source files have this, so meh.
> 
>> + *     Author: Neo Jia <cjia@nvidia.com>
>> + *             Kirti Wankhede <kwankhede@nvidia.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
> 
>> +Mediated device management interface via sysfs
>> +----------------------------------------------
>> +Management interface via sysfs allows user space software, like libvirt, to
>> +query and configure mediated device in a HW agnostic fashion. This management
>> +interface provide flexibility to underlying physical device's driver to support
>> +mediated device hotplug, multiple mediated devices per virtual machine, multiple
>> +mediated devices from different physical devices, etc.
>> +
>> +Under per-physical device sysfs:
>> +--------------------------------
>> +
>> +* mdev_supported_types:
>> +    List of current supported mediated device types and its details are added
>> +in this directory in following format:
>> +
>> +|- <parent phy device>
>> +|--- Vendor-specific-attributes [optional]
>> +|--- mdev_supported_types
>> +|     |--- <type id>
>> +|     |   |--- create
>> +|     |   |--- name
>> +|     |   |--- available_instances
>> +|     |   |--- description /class
>> +|     |   |--- [devices]
>> +|     |--- <type id>
>> +|     |   |--- create
>> +|     |   |--- name
>> +|     |   |--- available_instances
>> +|     |   |--- description /class
>> +|     |   |--- [devices]
>> +|     |--- <type id>
>> +|          |--- create
>> +|          |--- name
>> +|          |--- available_instances
>> +|          |--- description /class
>> +|          |--- [devices]
>> +
>> +[TBD : description or class is yet to be decided. This will change.]
> 
> I thought that in previous discussions we had agreed to drop
> the <type id> concept and use the name as the unique identifier.
> When reporting these types in libvirt we won't want to report
> the type id values - we'll want the name strings to be unique.
> 

The 'name' might not be unique but type_id will be. For example that Neo
pointed out in earlier discussion, virtual devices can come from two
different physical devices, end user would be presented with what they
had selected but there will be internal implementation differences. In
that case 'type_id' will be unique.

> Based on this sysfs spec, the only fields we would report in
> libvirt would be name + available_instances.
> 
>> +Under per mdev device:
>> +----------------------
>> +
>> +|- <parent phy device>
>> +|--- $MDEV_UUID
>> +         |--- remove
>> +         |--- {link to its type}
>> +         |--- vendor-specific-attributes [optional]
> 
> Again, I thought we'd agreed to not have arbitrary vendor
> specific attributes ?
> 
> That said, I don't mind them existing in kernel sysfs, just
> be aware that we'll *not* expose any vendor specific attributes
> in libvirt, so your functional implementation must not rely on
> these attributes being used in any way by libvirt.
> 
> 

Right, Libvirt would not use vendor specific attributes but admin can
use these to get/set extra information for a particular device. These
are optional, so its up to vendor to provide such attributes or not.

Thanks,
Kirti

> 
>> +
>> +* remove: (write only)
>> +	Write '1' to 'remove' file would destroy mdev device. Vendor driver can
>> +	fail remove() callback if that device is active and vendor driver
>> +	doesn't support hot-unplug.
>> +	Example:
>> +	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
> 
>> +Mediated device Hotplug:
>> +------------------------
>> +
>> +Mediated devices can be created and assigned during runtime. Procedure to
>> +hot-plug mediated device is same as hot-plug PCI device.
> 
> Generally this looks much saner now all the grouping stuff has gone.
> 
> 
> 
> Regards,
> Daniel
>
Tian, Kevin Oct. 12, 2016, 1:52 a.m. UTC | #3
> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]

> Sent: Wednesday, October 12, 2016 4:45 AM

> >> +* mdev_supported_types:

> >> +    List of current supported mediated device types and its details are added

> >> +in this directory in following format:

> >> +

> >> +|- <parent phy device>

> >> +|--- Vendor-specific-attributes [optional]

> >> +|--- mdev_supported_types

> >> +|     |--- <type id>

> >> +|     |   |--- create

> >> +|     |   |--- name

> >> +|     |   |--- available_instances

> >> +|     |   |--- description /class

> >> +|     |   |--- [devices]

> >> +|     |--- <type id>

> >> +|     |   |--- create

> >> +|     |   |--- name

> >> +|     |   |--- available_instances

> >> +|     |   |--- description /class

> >> +|     |   |--- [devices]

> >> +|     |--- <type id>

> >> +|          |--- create

> >> +|          |--- name

> >> +|          |--- available_instances

> >> +|          |--- description /class

> >> +|          |--- [devices]

> >> +

> >> +[TBD : description or class is yet to be decided. This will change.]

> >

> > I thought that in previous discussions we had agreed to drop

> > the <type id> concept and use the name as the unique identifier.

> > When reporting these types in libvirt we won't want to report

> > the type id values - we'll want the name strings to be unique.

> >

> 

> The 'name' might not be unique but type_id will be. For example that Neo

> pointed out in earlier discussion, virtual devices can come from two

> different physical devices, end user would be presented with what they

> had selected but there will be internal implementation differences. In

> that case 'type_id' will be unique.

> 


Hi, Kirti, my understanding is that Neo agreed to use an unique type
string (if you still called it <type id>), and then no need of additional
'name' field which can be put inside 'description' field. See below quote:

--<from Alex>--
> I think your discovery only means that for your vendor driver, the name

> will be "11" (as a string).  Perhaps you'd like some sort of vendor

> provided description within each type, but I am not in favor of having

> an arbitrary integer value imply something specific within the sysfs

> interface.  IOW, the NVIDIA vendor driver should be able to create:

> 

> 11

> ├── create

> ├── description

> ├── etc

> └── resolution

> 

> While Intel might create:

> 

> Skylake-vGPU

> ├── create

> ├── description

> ├── etc

> └── resolution

> 

> Maybe "description" is optional for vendors that use useful names?

> Thanks,


--<From Neo>--
> I think we should be able to have a unique vendor type string instead of an

> arbitrary integer value there as long as we are allowed to have a description

> field that can be used to show to the end user as "name / label". 


Thanks
Kevin
Kirti Wankhede Oct. 12, 2016, 3:13 p.m. UTC | #4
On 10/12/2016 7:22 AM, Tian, Kevin wrote:
>> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
>> Sent: Wednesday, October 12, 2016 4:45 AM
>>>> +* mdev_supported_types:
>>>> +    List of current supported mediated device types and its details are added
>>>> +in this directory in following format:
>>>> +
>>>> +|- <parent phy device>
>>>> +|--- Vendor-specific-attributes [optional]
>>>> +|--- mdev_supported_types
>>>> +|     |--- <type id>
>>>> +|     |   |--- create
>>>> +|     |   |--- name
>>>> +|     |   |--- available_instances
>>>> +|     |   |--- description /class
>>>> +|     |   |--- [devices]
>>>> +|     |--- <type id>
>>>> +|     |   |--- create
>>>> +|     |   |--- name
>>>> +|     |   |--- available_instances
>>>> +|     |   |--- description /class
>>>> +|     |   |--- [devices]
>>>> +|     |--- <type id>
>>>> +|          |--- create
>>>> +|          |--- name
>>>> +|          |--- available_instances
>>>> +|          |--- description /class
>>>> +|          |--- [devices]
>>>> +
>>>> +[TBD : description or class is yet to be decided. This will change.]
>>>
>>> I thought that in previous discussions we had agreed to drop
>>> the <type id> concept and use the name as the unique identifier.
>>> When reporting these types in libvirt we won't want to report
>>> the type id values - we'll want the name strings to be unique.
>>>
>>
>> The 'name' might not be unique but type_id will be. For example that Neo
>> pointed out in earlier discussion, virtual devices can come from two
>> different physical devices, end user would be presented with what they
>> had selected but there will be internal implementation differences. In
>> that case 'type_id' will be unique.
>>
> 
> Hi, Kirti, my understanding is that Neo agreed to use an unique type
> string (if you still called it <type id>), and then no need of additional
> 'name' field which can be put inside 'description' field. See below quote:
> 

We had internal discussions about this within NVIDIA and found that
'name' might not be unique where as 'type_id' would be unique. I'm
refering to Neo's mail after that, where Neo do pointed that out.

https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html

Thanks,
Kirti
Alex Williamson Oct. 12, 2016, 3:59 p.m. UTC | #5
On Wed, 12 Oct 2016 20:43:48 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 10/12/2016 7:22 AM, Tian, Kevin wrote:
> >> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> >> Sent: Wednesday, October 12, 2016 4:45 AM  
> >>>> +* mdev_supported_types:
> >>>> +    List of current supported mediated device types and its details are added
> >>>> +in this directory in following format:
> >>>> +
> >>>> +|- <parent phy device>
> >>>> +|--- Vendor-specific-attributes [optional]
> >>>> +|--- mdev_supported_types
> >>>> +|     |--- <type id>
> >>>> +|     |   |--- create
> >>>> +|     |   |--- name
> >>>> +|     |   |--- available_instances
> >>>> +|     |   |--- description /class
> >>>> +|     |   |--- [devices]
> >>>> +|     |--- <type id>
> >>>> +|     |   |--- create
> >>>> +|     |   |--- name
> >>>> +|     |   |--- available_instances
> >>>> +|     |   |--- description /class
> >>>> +|     |   |--- [devices]
> >>>> +|     |--- <type id>
> >>>> +|          |--- create
> >>>> +|          |--- name
> >>>> +|          |--- available_instances
> >>>> +|          |--- description /class
> >>>> +|          |--- [devices]
> >>>> +
> >>>> +[TBD : description or class is yet to be decided. This will change.]  
> >>>
> >>> I thought that in previous discussions we had agreed to drop
> >>> the <type id> concept and use the name as the unique identifier.
> >>> When reporting these types in libvirt we won't want to report
> >>> the type id values - we'll want the name strings to be unique.
> >>>  
> >>
> >> The 'name' might not be unique but type_id will be. For example that Neo
> >> pointed out in earlier discussion, virtual devices can come from two
> >> different physical devices, end user would be presented with what they
> >> had selected but there will be internal implementation differences. In
> >> that case 'type_id' will be unique.
> >>  
> > 
> > Hi, Kirti, my understanding is that Neo agreed to use an unique type
> > string (if you still called it <type id>), and then no need of additional
> > 'name' field which can be put inside 'description' field. See below quote:
> >   
> 
> We had internal discussions about this within NVIDIA and found that
> 'name' might not be unique where as 'type_id' would be unique. I'm
> refering to Neo's mail after that, where Neo do pointed that out.
> 
> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html

Everyone not privy to those internal discussions, including me, seems to
think we dropped type_id and that if a vendor does not have a stable
name, they can compose some sort of stable type description based on the
name+id, or even vendor+id, ex. NVIDIA-11.  So please share why we
haven't managed to kill off type_id yet.  No matter what internal
representation each vendor driver has of "type_id" it seems possible
for it to come up with stable string to define a given configuration.
Thanks,

Alex
Kirti Wankhede Oct. 12, 2016, 7:02 p.m. UTC | #6
On 10/12/2016 9:29 PM, Alex Williamson wrote:
> On Wed, 12 Oct 2016 20:43:48 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
>> On 10/12/2016 7:22 AM, Tian, Kevin wrote:
>>>> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
>>>> Sent: Wednesday, October 12, 2016 4:45 AM  
>>>>>> +* mdev_supported_types:
>>>>>> +    List of current supported mediated device types and its details are added
>>>>>> +in this directory in following format:
>>>>>> +
>>>>>> +|- <parent phy device>
>>>>>> +|--- Vendor-specific-attributes [optional]
>>>>>> +|--- mdev_supported_types
>>>>>> +|     |--- <type id>
>>>>>> +|     |   |--- create
>>>>>> +|     |   |--- name
>>>>>> +|     |   |--- available_instances
>>>>>> +|     |   |--- description /class
>>>>>> +|     |   |--- [devices]
>>>>>> +|     |--- <type id>
>>>>>> +|     |   |--- create
>>>>>> +|     |   |--- name
>>>>>> +|     |   |--- available_instances
>>>>>> +|     |   |--- description /class
>>>>>> +|     |   |--- [devices]
>>>>>> +|     |--- <type id>
>>>>>> +|          |--- create
>>>>>> +|          |--- name
>>>>>> +|          |--- available_instances
>>>>>> +|          |--- description /class
>>>>>> +|          |--- [devices]
>>>>>> +
>>>>>> +[TBD : description or class is yet to be decided. This will change.]  
>>>>>
>>>>> I thought that in previous discussions we had agreed to drop
>>>>> the <type id> concept and use the name as the unique identifier.
>>>>> When reporting these types in libvirt we won't want to report
>>>>> the type id values - we'll want the name strings to be unique.
>>>>>  
>>>>
>>>> The 'name' might not be unique but type_id will be. For example that Neo
>>>> pointed out in earlier discussion, virtual devices can come from two
>>>> different physical devices, end user would be presented with what they
>>>> had selected but there will be internal implementation differences. In
>>>> that case 'type_id' will be unique.
>>>>  
>>>
>>> Hi, Kirti, my understanding is that Neo agreed to use an unique type
>>> string (if you still called it <type id>), and then no need of additional
>>> 'name' field which can be put inside 'description' field. See below quote:
>>>   
>>
>> We had internal discussions about this within NVIDIA and found that
>> 'name' might not be unique where as 'type_id' would be unique. I'm
>> refering to Neo's mail after that, where Neo do pointed that out.
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html
> 
> Everyone not privy to those internal discussions, including me, seems to
> think we dropped type_id and that if a vendor does not have a stable
> name, they can compose some sort of stable type description based on the
> name+id, or even vendor+id, ex. NVIDIA-11.  So please share why we
> haven't managed to kill off type_id yet.  No matter what internal
> representation each vendor driver has of "type_id" it seems possible
> for it to come up with stable string to define a given configuration.


The 'type_id' is unique and the 'name' are not, the name is just a
virtual device name/ human readable name. Because at this moment Intel
can't define a proper GPU class, we have to add a 'description' field
there as well to represent the features of this virtual device, once we
have all agreed with the GPU class and its mandatory attributes, the
'description' field can be removed. Here is an example,
type_id/type_name = NVIDIA_11,
name=M60-M0Q,
description=2560x1600, 2 displays, 512MB"

Neo's previous comment only applies to the situation where we will have
the GPU class or optional attributes defined and recognized by libvirt,
since that is not going to happen any time soon, we will have to have
the new 'description' field, and we don't want to have it mixed up with
'name' field.

We can definitely have something like name+id as Alex recommended to
remove the 'name' field, but it will just require libvirt to have more
logic to parse that string.

Thanks,
Kirti
Tian, Kevin Oct. 13, 2016, 3:27 a.m. UTC | #7
> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> Sent: Thursday, October 13, 2016 3:03 AM
> 
> 
> On 10/12/2016 9:29 PM, Alex Williamson wrote:
> > On Wed, 12 Oct 2016 20:43:48 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >
> >> On 10/12/2016 7:22 AM, Tian, Kevin wrote:
> >>>> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> >>>> Sent: Wednesday, October 12, 2016 4:45 AM
> >>>>>> +* mdev_supported_types:
> >>>>>> +    List of current supported mediated device types and its details are added
> >>>>>> +in this directory in following format:
> >>>>>> +
> >>>>>> +|- <parent phy device>
> >>>>>> +|--- Vendor-specific-attributes [optional]
> >>>>>> +|--- mdev_supported_types
> >>>>>> +|     |--- <type id>
> >>>>>> +|     |   |--- create
> >>>>>> +|     |   |--- name
> >>>>>> +|     |   |--- available_instances
> >>>>>> +|     |   |--- description /class
> >>>>>> +|     |   |--- [devices]
> >>>>>> +|     |--- <type id>
> >>>>>> +|     |   |--- create
> >>>>>> +|     |   |--- name
> >>>>>> +|     |   |--- available_instances
> >>>>>> +|     |   |--- description /class
> >>>>>> +|     |   |--- [devices]
> >>>>>> +|     |--- <type id>
> >>>>>> +|          |--- create
> >>>>>> +|          |--- name
> >>>>>> +|          |--- available_instances
> >>>>>> +|          |--- description /class
> >>>>>> +|          |--- [devices]
> >>>>>> +
> >>>>>> +[TBD : description or class is yet to be decided. This will change.]
> >>>>>
> >>>>> I thought that in previous discussions we had agreed to drop
> >>>>> the <type id> concept and use the name as the unique identifier.
> >>>>> When reporting these types in libvirt we won't want to report
> >>>>> the type id values - we'll want the name strings to be unique.
> >>>>>
> >>>>
> >>>> The 'name' might not be unique but type_id will be. For example that Neo
> >>>> pointed out in earlier discussion, virtual devices can come from two
> >>>> different physical devices, end user would be presented with what they
> >>>> had selected but there will be internal implementation differences. In
> >>>> that case 'type_id' will be unique.
> >>>>
> >>>
> >>> Hi, Kirti, my understanding is that Neo agreed to use an unique type
> >>> string (if you still called it <type id>), and then no need of additional
> >>> 'name' field which can be put inside 'description' field. See below quote:
> >>>
> >>
> >> We had internal discussions about this within NVIDIA and found that
> >> 'name' might not be unique where as 'type_id' would be unique. I'm
> >> refering to Neo's mail after that, where Neo do pointed that out.
> >>
> >> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html
> >
> > Everyone not privy to those internal discussions, including me, seems to
> > think we dropped type_id and that if a vendor does not have a stable
> > name, they can compose some sort of stable type description based on the
> > name+id, or even vendor+id, ex. NVIDIA-11.  So please share why we
> > haven't managed to kill off type_id yet.  No matter what internal
> > representation each vendor driver has of "type_id" it seems possible
> > for it to come up with stable string to define a given configuration.
> 
> 
> The 'type_id' is unique and the 'name' are not, the name is just a
> virtual device name/ human readable name. Because at this moment Intel
> can't define a proper GPU class, we have to add a 'description' field
> there as well to represent the features of this virtual device, once we
> have all agreed with the GPU class and its mandatory attributes, the
> 'description' field can be removed. Here is an example,
> type_id/type_name = NVIDIA_11,
> name=M60-M0Q,
> description=2560x1600, 2 displays, 512MB"

As I commented earlier, I didn't see how above attributes can be defined
mandatory:

- #displays, is concerned only for VDI usage, where remote user may
care about how many virtual displays it could be use. What about using
vGPU in non-VDI usage, e.g. purely media transcoding case where 
#displays is just nothing? Then for media transcoding do we want to
further introduce attributes like H.265?

- framebuffer size (512MB) might make sense to discrete card like
NVIDIA. In your case the graphics memory is on-card, so the memory
size is critical to performance so user might want to know. However for 
integrated card like Intel, we just use system memory as 'virtual' graphics
memory through GPU page tables. There is one global GPU page table
(GGTT) partitioned between vGPUs, but the majority of rendering happens
on per-process GPU page table (PPGTT) which can be fully managed by
each VM. In this sense, the size of GGTT resource has little performance
implication (mostly an indirect functionality sense, such as #displays) User
cannot make clear expectation on it, so we don't have plan to expose it.

> 
> Neo's previous comment only applies to the situation where we will have
> the GPU class or optional attributes defined and recognized by libvirt,
> since that is not going to happen any time soon, we will have to have
> the new 'description' field, and we don't want to have it mixed up with
> 'name' field.

As explained above, I don't see necessity of defining a GPU class now, 
at least not based on your examples (except resolution which might be
one). If required in the future, it must include mandatory attributes 
cross virtualizations usages and hardware vendors. Vendor can choose
to document the relationship between vGPU type and applied virtualization
usages in their own user manual.

> 
> We can definitely have something like name+id as Alex recommended to
> remove the 'name' field, but it will just require libvirt to have more
> logic to parse that string.
> 

In this manner I prefer to have name+id style. libvirt doesn't need to
parse it, just treat the whole name+id as the unique type identification.
'description' field is kept for vendor specific descriptive purpose where
you can add above attributes. libvirt doesn't need to parse too, just 
pass it up.

Thanks
Kevin
Jike Song Oct. 14, 2016, 2:22 a.m. UTC | #8
On 10/11/2016 04:28 AM, Kirti Wankhede wrote:
> +
> +Under per-physical device sysfs:
> +--------------------------------
> +
> +* mdev_supported_types:
> +    List of current supported mediated device types and its details are added
> +in this directory in following format:
> +
> +|- <parent phy device>
> +|--- Vendor-specific-attributes [optional]
> +|--- mdev_supported_types
> +|     |--- <type id>
> +|     |   |--- create
> +|     |   |--- name
> +|     |   |--- available_instances
> +|     |   |--- description /class
> +|     |   |--- [devices]
> +|     |--- <type id>
> +|     |   |--- create
> +|     |   |--- name
> +|     |   |--- available_instances
> +|     |   |--- description /class
> +|     |   |--- [devices]
> +|     |--- <type id>
> +|          |--- create
> +|          |--- name
> +|          |--- available_instances
> +|          |--- description /class
> +|          |--- [devices]
> +
> +[TBD : description or class is yet to be decided. This will change.]
> +
> +Under per mdev device:
> +----------------------
> +
> +|- <parent phy device>
> +|--- $MDEV_UUID
> +         |--- remove
> +         |--- {link to its type}
> +         |--- vendor-specific-attributes [optional]
> +

All mdev directories are placed under physical device directly.

Looking at the sysfs directory of physical device, you get:

        <parent phy device>
        |--- mdev_supported_types/
        |        |--- type1/
        |        |--- type2/
        |        |--- type3/
        |--- mdev1/
        |--- mdev2/



With an independent device between physical and mdev, and names
simplified, you will get:

        <parent phy device>
        |--- mdev/
        |        |--- supported_type1/
        |        |--- supported_type2/
        |        |--- supported_type3/
        |        |--- mdev1/
        |        |--- mdev2/

i.e. everything related to mdev are placed under one single directory -
the same as SR-IOV.  I'm not sure if it is possible without
introducing an independent device (which you apparently dislike), but
placing so many mdev directories under physical doesn't seems clean.



--
Thanks,
Jike
Kirti Wankhede Oct. 14, 2016, 3:15 a.m. UTC | #9
On 10/14/2016 7:52 AM, Jike Song wrote:
> On 10/11/2016 04:28 AM, Kirti Wankhede wrote:
>> +
>> +Under per-physical device sysfs:
>> +--------------------------------
>> +
>> +* mdev_supported_types:
>> +    List of current supported mediated device types and its details are added
>> +in this directory in following format:
>> +
>> +|- <parent phy device>
>> +|--- Vendor-specific-attributes [optional]
>> +|--- mdev_supported_types
>> +|     |--- <type id>
>> +|     |   |--- create
>> +|     |   |--- name
>> +|     |   |--- available_instances
>> +|     |   |--- description /class
>> +|     |   |--- [devices]
>> +|     |--- <type id>
>> +|     |   |--- create
>> +|     |   |--- name
>> +|     |   |--- available_instances
>> +|     |   |--- description /class
>> +|     |   |--- [devices]
>> +|     |--- <type id>
>> +|          |--- create
>> +|          |--- name
>> +|          |--- available_instances
>> +|          |--- description /class
>> +|          |--- [devices]
>> +
>> +[TBD : description or class is yet to be decided. This will change.]
>> +
>> +Under per mdev device:
>> +----------------------
>> +
>> +|- <parent phy device>
>> +|--- $MDEV_UUID
>> +         |--- remove
>> +         |--- {link to its type}
>> +         |--- vendor-specific-attributes [optional]
>> +
> 
> All mdev directories are placed under physical device directly.
> 
> Looking at the sysfs directory of physical device, you get:
> 
>         <parent phy device>
>         |--- mdev_supported_types/
>         |        |--- type1/
>         |        |--- type2/
>         |        |--- type3/
>         |--- mdev1/
>         |--- mdev2/
> 
> 
> 
> With an independent device between physical and mdev, and names
> simplified, you will get:
> 
>         <parent phy device>
>         |--- mdev/
>         |        |--- supported_type1/
>         |        |--- supported_type2/
>         |        |--- supported_type3/
>         |        |--- mdev1/
>         |        |--- mdev2/
> 
> i.e. everything related to mdev are placed under one single directory -
> the same as SR-IOV.  I'm not sure if it is possible without
> introducing an independent device (which you apparently dislike), but
> placing so many mdev directories under physical doesn't seems clean.
> 
> 

I'm repeating the same example that I had in reply to Alex's question,
the parent-child relationship between devices is reflected in sysfs.
There are cases when there are multiple children and all are placed in
same parent directory:

80:01.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express
Root Port 1a (rev 07)
80:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express
Root Port 2a (rev 07)
80:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express
Root Port 3a in PCI Express Mode (rev 07)
80:04.0 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
0 (rev 07)
80:04.1 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
1 (rev 07)
80:04.2 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
2 (rev 07)
80:04.3 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
3 (rev 07)
80:04.4 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
4 (rev 07)
80:04.5 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
5 (rev 07)
80:04.6 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
6 (rev 07)
80:04.7 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
7 (rev 07)
80:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address
Map, VTd_Misc, System Management (rev 07)
80:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control
Status and Global Errors (rev 07)
80:05.4 PIC: Intel Corporation Xeon E5/Core i7 I/O APIC (rev 07)
81:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network
Connection (rev 01)
81:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network
Connection (rev 01)
83:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI
Express Gen 3 (8.0 GT/s) Switch (rev ca)
84:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI
Express Gen 3 (8.0 GT/s) Switch (rev ca)
84:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI
Express Gen 3 (8.0 GT/s) Switch (rev ca)
85:00.0 VGA compatible controller: NVIDIA Corporation Device 13f2 (rev a1)
86:00.0 VGA compatible controller: NVIDIA Corporation Device 13f2 (rev a1)

In sysfs, those are in same parent folder of its parent root port:

# ls /sys/devices/pci0000\:80/ -l
total 0
drwxr-xr-x 8 root root    0 Oct 13 13:30 0000:80:01.0
drwxr-xr-x 7 root root    0 Oct 13 13:30 0000:80:02.0
drwxr-xr-x 6 root root    0 Oct 13 13:30 0000:80:03.0
drwxr-xr-x 6 root root    0 Oct 13 13:30 0000:80:04.0
drwxr-xr-x 6 root root    0 Oct 13 13:30 0000:80:04.1
drwxr-xr-x 6 root root    0 Oct 13 13:30 0000:80:04.2
drwxr-xr-x 6 root root    0 Oct 13 13:30 0000:80:04.3
drwxr-xr-x 6 root root    0 Oct 13 13:30 0000:80:04.4
drwxr-xr-x 6 root root    0 Oct 13 13:30 0000:80:04.5
drwxr-xr-x 6 root root    0 Oct 13 13:30 0000:80:04.6
drwxr-xr-x 6 root root    0 Oct 13 13:30 0000:80:04.7
drwxr-xr-x 3 root root    0 Oct 13 13:30 0000:80:05.0
drwxr-xr-x 3 root root    0 Oct 13 13:30 0000:80:05.2
drwxr-xr-x 3 root root    0 Oct 13 13:30 0000:80:05.4
lrwxrwxrwx 1 root root    0 Oct 13 13:30 firmware_node ->
../LNXSYSTM:00/LNXSYBUS:00/PNP0A08:01
drwxr-xr-x 3 root root    0 Oct 13 13:30 pci_bus
drwxr-xr-x 2 root root    0 Oct 13 13:30 power
-rw-r--r-- 1 root root 4096 Oct 13 13:25 uevent


This is not only the case with pci devices, another example, i2c
devices. Nvidia driver registers to i2c bus and when nvidia.ko is
loaded, you can see i2c devices, which forms child of GPU @0000:85:00.0,
are places in 0000:85:00.0 directory.

/sys/bus/pci/devices/0000:85:00.0/i2c-4
/sys/bus/pci/devices/0000:85:00.0/i2c-5

Thanks,
Kirti
diff mbox

Patch

diff --git a/Documentation/vfio-mdev/vfio-mediated-device.txt b/Documentation/vfio-mdev/vfio-mediated-device.txt
new file mode 100644
index 000000000000..c1eacb83807b
--- /dev/null
+++ b/Documentation/vfio-mdev/vfio-mediated-device.txt
@@ -0,0 +1,219 @@ 
+/*
+ * VFIO Mediated devices
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *             Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+VFIO Mediated devices [1]
+-------------------------
+
+There are more and more use cases/demands to virtualize the DMA devices which
+don't have SR_IOV capability built-in. To do this, drivers of different
+devices had to develop their own management interface and set of APIs and then
+integrate it to user space software. We've identified common requirements and
+unified management interface for such devices to make user space software
+integration easier and simplify the device driver implementation to support such
+I/O virtualization solution.
+
+The VFIO driver framework provides unified APIs for direct device access from
+user space. It is an IOMMU/device agnostic framework for exposing direct device
+access to user space, in a secure, IOMMU protected environment. This framework
+is used for multiple devices like GPUs, network adapters and compute
+accelerators. With direct device access, virtual machines or user space
+applications have direct access of physical device. This framework is reused
+for mediated devices.
+
+Mediated core driver provides a common interface for mediated device management
+that can be used by drivers of different devices. This module provides a generic
+interface to create/destroy a mediated device, add/remove it to mediated bus
+driver and add/remove device to an IOMMU group. It also provides an interface to
+register bus driver, for example, Mediated VFIO mdev driver is designed for
+mediated devices and supports VFIO APIs. Mediated bus driver add/delete mediated
+device to VFIO Group.
+
+Below is the high Level block diagram, with NVIDIA, Intel and IBM devices
+as example, since these are the devices which are going to actively use
+this module as of now.
+
+     +---------------+
+     |               |
+     | +-----------+ |  mdev_register_driver() +--------------+
+     | |           | +<------------------------+              |
+     | |  mdev     | |                         |              |
+     | |  bus      | +------------------------>+ vfio_mdev.ko |<-> VFIO user
+     | |  driver   | |     probe()/remove()    |              |    APIs
+     | |           | |                         +--------------+
+     | +-----------+ |
+     |               |
+     |  MDEV CORE    |
+     |   MODULE      |
+     |   mdev.ko     |
+     | +-----------+ |  mdev_register_device() +--------------+
+     | |           | +<------------------------+              |
+     | |           | |                         |  nvidia.ko   |<-> physical
+     | |           | +------------------------>+              |    device
+     | |           | |        callbacks        +--------------+
+     | | Physical  | |
+     | |  device   | |  mdev_register_device() +--------------+
+     | | interface | |<------------------------+              |
+     | |           | |                         |  i915.ko     |<-> physical
+     | |           | +------------------------>+              |    device
+     | |           | |        callbacks        +--------------+
+     | |           | |
+     | |           | |  mdev_register_device() +--------------+
+     | |           | +<------------------------+              |
+     | |           | |                         | ccw_device.ko|<-> physical
+     | |           | +------------------------>+              |    device
+     | |           | |        callbacks        +--------------+
+     | +-----------+ |
+     +---------------+
+
+
+Registration Interfaces:
+------------------------
+
+Mediated core driver provides two types of registration interfaces:
+
+1. Registration interface for mediated bus driver:
+--------------------------------------------------
+     /*
+      * struct mdev_driver [2] - Mediated device's driver
+      * @name: driver name
+      * @probe: called when new device created
+      * @remove: called when device removed
+      * @driver: device driver structure
+      */
+     struct mdev_driver {
+	     const char *name;
+	     int  (*probe)  (struct device *dev);
+	     void (*remove) (struct device *dev);
+	     struct device_driver    driver;
+     };
+
+Mediated bus driver for mdev should use this interface to register and
+unregister with core driver respectively:
+
+extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
+extern void mdev_unregister_driver(struct mdev_driver *drv);
+
+Mediated bus driver is responsible to add/delete mediated devices to/from VFIO
+group when devices are bound and unbound to the driver.
+
+2. Physical device driver interface:
+------------------------------------
+This interface [3] provides a set of APIs to manage physical device related work
+in its driver. APIs are:
+
+* dev_attr_groups: attributes of parent device.
+* mdev_attr_groups: attributes of mediated device.
+* supported_type_groups: attributes to define supported types. It is mandatory
+			 to provide supported types.
+* create: to allocate basic resources in driver for a mediated device.
+* remove: to free resources in driver when mediated device is destroyed.
+* open: open callback of mediated device
+* release: close callback of mediated device
+* read : read emulation callback.
+* write: write emulation callback.
+* mmap: mmap emulation callback.
+* ioctl: ioctl callback.
+
+Drivers should use these interfaces to register and unregister device to mdev
+core driver respectively:
+
+extern int  mdev_register_device(struct device *dev,
+                                 const struct parent_ops *ops);
+extern void mdev_unregister_device(struct device *dev);
+
+Mediated device management interface via sysfs
+----------------------------------------------
+Management interface via sysfs allows user space software, like libvirt, to
+query and configure mediated device in a HW agnostic fashion. This management
+interface provide flexibility to underlying physical device's driver to support
+mediated device hotplug, multiple mediated devices per virtual machine, multiple
+mediated devices from different physical devices, etc.
+
+Under per-physical device sysfs:
+--------------------------------
+
+* mdev_supported_types:
+    List of current supported mediated device types and its details are added
+in this directory in following format:
+
+|- <parent phy device>
+|--- Vendor-specific-attributes [optional]
+|--- mdev_supported_types
+|     |--- <type id>
+|     |   |--- create
+|     |   |--- name
+|     |   |--- available_instances
+|     |   |--- description /class
+|     |   |--- [devices]
+|     |--- <type id>
+|     |   |--- create
+|     |   |--- name
+|     |   |--- available_instances
+|     |   |--- description /class
+|     |   |--- [devices]
+|     |--- <type id>
+|          |--- create
+|          |--- name
+|          |--- available_instances
+|          |--- description /class
+|          |--- [devices]
+
+[TBD : description or class is yet to be decided. This will change.]
+
+Under per mdev device:
+----------------------
+
+|- <parent phy device>
+|--- $MDEV_UUID
+         |--- remove
+         |--- {link to its type}
+         |--- vendor-specific-attributes [optional]
+
+* remove: (write only)
+	Write '1' to 'remove' file would destroy mdev device. Vendor driver can
+	fail remove() callback if that device is active and vendor driver
+	doesn't support hot-unplug.
+	Example:
+	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
+
+
+Mediated device Hotplug:
+------------------------
+
+Mediated devices can be created and assigned during runtime. Procedure to
+hot-plug mediated device is same as hot-plug PCI device.
+
+Translation APIs for Mediated device
+------------------------------------
+
+Below APIs are provided for user pfn to host pfn translation in VFIO driver:
+
+extern long vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
+                           long npage, int prot, unsigned long *phys_pfn);
+
+extern long vfio_unpin_pages(struct device *dev, unsigned long *pfn,
+			     long npage);
+
+These functions call back into the backend IOMMU module using two callbacks of
+struct vfio_iommu_driver_ops, pin_pages and unpin_pages [4]. Currently these are
+supported in TYPE1 IOMMU module. To enable the same for other IOMMU backend
+modules, such as PPC64 sPAPR module, they need to provide these two callback
+functions.
+
+References
+----------
+
+[1] See Documentation/vfio.txt for more information on VFIO.
+[2] struct mdev_driver in include/linux/mdev.h
+[3] struct parent_ops in include/linux/mdev.h
+[4] struct vfio_iommu_driver_ops in include/linux/vfio.h
+