Message ID: 20101108162633.GA7962@redhat.com
State: New
On Mon, 2010-11-08 at 18:26 +0200, Michael S. Tsirkin wrote: > On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote: > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote: > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote: > > > > pcibus_dev_print() was erroneously retrieving the device bus > > > > number from the secondary bus number offset of the device > > > > instead of the bridge above the device. This ends of landing > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully > > > > is usually zero. pcibus_get_dev_path() copied this code, > > > > inheriting the same bug. pcibus_get_dev_path() is used for > > > > ramblock naming, so changing it can effect migration. However, > > > > I've only seen this byte be non-zero for an assigned device, > > > > which can't migrate anyway, so hopefully we won't run into > > > > any issues. > > > > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > > > > Good catch. Applied. > > > I don't really see why do we put the dev path > > > in the bus object: why not let device supply its name? > > > > Because the device name is not unique. This came about from the > > discussion about how to create a canonical device path that Gleb and > > Markus are again trying to hash out. If we go up to the bus and get the > > bus address, we have a VM unique name. Unfortunately, it's difficult to > > define what the bus should print in all cases (ISA), but since they > > don't do hotplug and typically don't allocate ramblocks, we can mostly > > ignore it for this use case. > > > > > And I think this will affect nested bridges. However they are currently > > > broken anyway: we really must convert to topological names as bus number > > > is guest-assigned - they don't have to be unique, even. > > > > Yes, nested bridges are a problem. How can the seg/bus/devfn not be > > unique? > > Bus numbers for nested bridges are guest assigned. We start with 0 after reset. 
Right, invalid bus numbers are not unique. > > > What does fixing this involve? Just changing pcibus_get_dev_path? > > > > How do you plan to fix it? Don't forget that migration depends on these > > names, so some kind of compatibility layer would be required. Thanks, > > > > Alex > > Replace bus number with slot numbers of parent bridges up to the root. > This works for root bridge in a compatible way because bus number there > is hard-coded to 0. > IMO nested bridges are broken anyway, no way to be compatible there. > > > Gleb, Markus, I think the following should be sufficient for PCI. What > do you think? Also - do we need to update QMP/monitor to teach them to > work with these paths? > > This is on top of Alex's patch, completely untested. This function was originally intended to be recursive, so that each bus could call the parent until we hit a canonical name. It just so happens that the root bus provides that for non-nested devices. So I think instead we should have a different function for bridges that appends to what the root get_dev_path returns. Alex > pci: fix device path for devices behind nested bridges > > We were using bus number in the device path, which is clearly > broken as this number is guest-assigned for all devices > except the root. > > Fix by using hierarchical list of slots, walking the path > from root down to device, instead. Add :00 as bus number > so that if there are no nested bridges, this is compatible > with what we have now. > > Signed-off-by: Michael S. 
Tsirkin <mst@redhat.com> > > diff --git a/hw/pci.c b/hw/pci.c > index 7d12473..fa98d94 100644 > --- a/hw/pci.c > +++ b/hw/pci.c > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) > > static char *pcibus_get_dev_path(DeviceState *dev) > { > - PCIDevice *d = (PCIDevice *)dev; > - char path[16]; > - > - snprintf(path, sizeof(path), "%04x:%02x:%02x.%x", > - pci_find_domain(d->bus), pci_bus_num(d->bus), > - PCI_SLOT(d->devfn), PCI_FUNC(d->devfn)); > - > - return strdup(path); > + PCIDevice *d = container_of(dev, PCIDevice, qdev); > + PCIDevice *t; > + int slot_depth; > + /* Path format: Domain:00:Slot:Slot....:Slot.Function. > + * 00 is added here to make this format compatible with > + * domain:Bus:Slot.Func for systems without nested PCI bridges. > + * Slot list specifies the slot numbers for all devices on the > + * path from root to the specific device. */ > + int domain_len = strlen("DDDD:00"); > + int func_len = strlen(".F"); > + int slot_len = strlen(":SS"); > + int path_len; > + char *path, *p; > + > + /* Calculate # of slots on path between device and root. */; > + slot_depth = 0; > + for (t = d; t; t = t->bus->parent_dev) > + ++slot_depth; > + > + path_len = domain_len + bus_len + slot_len * slot_depth + func_len; > + > + /* Allocate memory, fill in the terminating null byte. */ > + path = malloc(path_len + 1 /* For '\0' */); > + path[path_len] = '\0'; > + > + /* First field is the domain. */ > + snprintf(path, domain_len, "%04x", pci_find_domain(d->bus)); > + > + /* Leave space for slot numbers and fill in function number. */ > + p = path + domain_len + slot_len * slot_depth; > + snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn)); > + > + /* Fill in slot numbers. We walk up from device to root, so need to print > + * them in the reverse order, last to first. */ > + for (t = d; t; t = t->bus->parent_dev) { > + p -= slot_len; > + snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn)); > + } > + > + return path; > } >
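The patch above is posted as "completely untested", and as written it would not compile: it references an undeclared `bus_len`, and the sizes passed to `snprintf()` truncate the last character of each field (snprintf writes at most size-1 characters plus the NUL). The slot-walk idea itself can be sketched self-contained; the struct below is a stand-in for QEMU's PCIDevice/PCIBus parent chain, not the real API:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for QEMU's PCIDevice/PCIBus: each device knows its
 * devfn and the bridge (if any) above it.  Illustrative only. */
struct dev {
    int devfn;          /* slot << 3 | function, as in PCI */
    struct dev *parent; /* bridge above this device, NULL on the root bus */
};

#define SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define FUNC(devfn) ((devfn) & 0x07)

/* Build "DDDD:00:SS[:SS...].F": collect slot numbers walking up to the
 * root, then print them root-first.  Collecting into an array avoids
 * the reverse-fill pointer arithmetic of the posted patch. */
static char *dev_path(const struct dev *d, int domain)
{
    int slots[64];
    int depth = 0;
    const struct dev *t;
    char *path, *p;

    for (t = d; t; t = t->parent)
        slots[depth++] = SLOT(t->devfn);

    /* "DDDD:00" + ":SS" per slot + ".F" + NUL */
    path = malloc(7 + 3 * depth + 2 + 1);
    if (!path)
        return NULL;
    p = path + sprintf(path, "%04x:00", domain & 0xffff);
    while (depth--)
        p += sprintf(p, ":%02x", slots[depth]);
    sprintf(p, ".%x", FUNC(d->devfn));
    return path;
}
```

For a device on the root bus this degenerates to the existing DDDD:00:SS.F form; behind one bridge it yields e.g. 0000:00:02:04.1.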
On Mon, Nov 08, 2010 at 09:36:59AM -0700, Alex Williamson wrote: > On Mon, 2010-11-08 at 18:26 +0200, Michael S. Tsirkin wrote: > > On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote: > > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote: > > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote: > > > > > pcibus_dev_print() was erroneously retrieving the device bus > > > > > number from the secondary bus number offset of the device > > > > > instead of the bridge above the device. This ends of landing > > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully > > > > > is usually zero. pcibus_get_dev_path() copied this code, > > > > > inheriting the same bug. pcibus_get_dev_path() is used for > > > > > ramblock naming, so changing it can effect migration. However, > > > > > I've only seen this byte be non-zero for an assigned device, > > > > > which can't migrate anyway, so hopefully we won't run into > > > > > any issues. > > > > > > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > > > > > > Good catch. Applied. > > > > I don't really see why do we put the dev path > > > > in the bus object: why not let device supply its name? > > > > > > Because the device name is not unique. This came about from the > > > discussion about how to create a canonical device path that Gleb and > > > Markus are again trying to hash out. If we go up to the bus and get the > > > bus address, we have a VM unique name. Unfortunately, it's difficult to > > > define what the bus should print in all cases (ISA), but since they > > > don't do hotplug and typically don't allocate ramblocks, we can mostly > > > ignore it for this use case. > > > > > > > And I think this will affect nested bridges. However they are currently > > > > broken anyway: we really must convert to topological names as bus number > > > > is guest-assigned - they don't have to be unique, even. > > > > > > Yes, nested bridges are a problem. 
How can the seg/bus/devfn not be > > > unique? > > > > Bus numbers for nested bridges are guest assigned. We start with 0 after reset. > > Right, invalid bus numbers are not unique. Well, you can call them invalid, but this is the only number there is until guest assigns something :) > > > > What does fixing this involve? Just changing pcibus_get_dev_path? > > > > > > How do you plan to fix it? Don't forget that migration depends on these > > > names, so some kind of compatibility layer would be required. Thanks, > > > > > > Alex > > > > Replace bus number with slot numbers of parent bridges up to the root. > > This works for root bridge in a compatible way because bus number there > > is hard-coded to 0. > > IMO nested bridges are broken anyway, no way to be compatible there. > > > > > > Gleb, Markus, I think the following should be sufficient for PCI. What > > do you think? Also - do we need to update QMP/monitor to teach them to > > work with these paths? > > > > This is on top of Alex's patch, completely untested. > > This function was originally intended to be recursive, so that each bus > could call the parent until we hit a canonical name. It just so happens > that the root bus provides that for non-nested devices. So I think > instead we should have a different function for bridges that appends to > what the root get_dev_path returns. > > Alex This won't be compatible: the format that we have is DDDD:BB:SS.F, which means that string concatenation cannot get you from a bus path to a device path (remember that a bridge is a PCI device too). And frankly, if we can make it non-recursive I'd rather have it that way. We are not solving the Tower of Hanoi here.
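The loop-versus-recursion point is easy to see in isolation: recursion produces the same root-first slot order by descending to the root before printing, trading the loop's reverse-fill bookkeeping for stack depth. A minimal sketch with a stand-in structure (illustrative names, not QEMU code):

```c
#include <stdio.h>

/* Stand-in for the PCIDevice parent chain, illustrative only. */
struct dev {
    int devfn;          /* slot << 3 | function */
    struct dev *parent; /* NULL on the root bus */
};

#define SLOT(devfn) (((devfn) >> 3) & 0x1f)

/* Recurse to the root first, then print each slot on the way back
 * down, so the output is ordered root-to-device without any buffer
 * juggling.  Returns the advanced write pointer. */
static char *append_slots(const struct dev *t, char *p)
{
    if (!t)
        return p;
    p = append_slots(t->parent, p);
    return p + sprintf(p, ":%02x", SLOT(t->devfn));
}
```

Functionally this matches the posted loop; whether the call stack or the explicit reverse walk is "cleaner" is exactly the style question being argued above.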
On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote: > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote: > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote: > > > > pcibus_dev_print() was erroneously retrieving the device bus > > > > number from the secondary bus number offset of the device > > > > instead of the bridge above the device. This ends of landing > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully > > > > is usually zero. pcibus_get_dev_path() copied this code, > > > > inheriting the same bug. pcibus_get_dev_path() is used for > > > > ramblock naming, so changing it can effect migration. However, > > > > I've only seen this byte be non-zero for an assigned device, > > > > which can't migrate anyway, so hopefully we won't run into > > > > any issues. > > > > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > > > > Good catch. Applied. > > > I don't really see why do we put the dev path > > > in the bus object: why not let device supply its name? > > > > Because the device name is not unique. This came about from the > > discussion about how to create a canonical device path that Gleb and > > Markus are again trying to hash out. If we go up to the bus and get the > > bus address, we have a VM unique name. Unfortunately, it's difficult to > > define what the bus should print in all cases (ISA), but since they > > don't do hotplug and typically don't allocate ramblocks, we can mostly > > ignore it for this use case. > > > > > And I think this will affect nested bridges. However they are currently > > > broken anyway: we really must convert to topological names as bus number > > > is guest-assigned - they don't have to be unique, even. > > > > Yes, nested bridges are a problem. How can the seg/bus/devfn not be > > unique? > > Bus numbers for nested bridges are guest assigned. We start with 0 after reset. 
> > > > What does fixing this involve? Just changing pcibus_get_dev_path? > > > > How do you plan to fix it? Don't forget that migration depends on these > > names, so some kind of compatibility layer would be required. Thanks, > > > > Alex > > Replace bus number with slot numbers of parent bridges up to the root. > This works for root bridge in a compatible way because bus number there > is hard-coded to 0. > IMO nested bridges are broken anyway, no way to be compatible there. > > > Gleb, Markus, I think the following should be sufficient for PCI. What > do you think? Also - do we need to update QMP/monitor to teach them to > work with these paths? > I no longer use the bus's get_dev_path callback for my purpose (I added another one, get_fw_dev_path). Since it is used for migration, it should be implemented for all buses to work properly. And by properly I mean producing the full path from the system root to the device itself, recursively. But what I learned is that by changing the get_dev_path output you will break migration from older guests to newer ones (something we have to support). And of course, as Alex already said, you need to traverse bridges recursively, and the domain does not provide any meaningful information. > This is on top of Alex's patch, completely untested. > > > pci: fix device path for devices behind nested bridges > > We were using bus number in the device path, which is clearly > broken as this number is guest-assigned for all devices > except the root. > > Fix by using hierarchical list of slots, walking the path > from root down to device, instead. Add :00 as bus number > so that if there are no nested bridges, this is compatible > with what we have now. > > Signed-off-by: Michael S.
Tsirkin <mst@redhat.com> > > diff --git a/hw/pci.c b/hw/pci.c > index 7d12473..fa98d94 100644 > --- a/hw/pci.c > +++ b/hw/pci.c > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) > > static char *pcibus_get_dev_path(DeviceState *dev) > { > - PCIDevice *d = (PCIDevice *)dev; > - char path[16]; > - > - snprintf(path, sizeof(path), "%04x:%02x:%02x.%x", > - pci_find_domain(d->bus), pci_bus_num(d->bus), > - PCI_SLOT(d->devfn), PCI_FUNC(d->devfn)); > - > - return strdup(path); > + PCIDevice *d = container_of(dev, PCIDevice, qdev); > + PCIDevice *t; > + int slot_depth; > + /* Path format: Domain:00:Slot:Slot....:Slot.Function. > + * 00 is added here to make this format compatible with > + * domain:Bus:Slot.Func for systems without nested PCI bridges. > + * Slot list specifies the slot numbers for all devices on the > + * path from root to the specific device. */ > + int domain_len = strlen("DDDD:00"); > + int func_len = strlen(".F"); > + int slot_len = strlen(":SS"); > + int path_len; > + char *path, *p; > + > + /* Calculate # of slots on path between device and root. */; > + slot_depth = 0; > + for (t = d; t; t = t->bus->parent_dev) > + ++slot_depth; > + > + path_len = domain_len + bus_len + slot_len * slot_depth + func_len; > + > + /* Allocate memory, fill in the terminating null byte. */ > + path = malloc(path_len + 1 /* For '\0' */); > + path[path_len] = '\0'; > + > + /* First field is the domain. */ > + snprintf(path, domain_len, "%04x", pci_find_domain(d->bus)); > + > + /* Leave space for slot numbers and fill in function number. */ > + p = path + domain_len + slot_len * slot_depth; > + snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn)); > + > + /* Fill in slot numbers. We walk up from device to root, so need to print > + * them in the reverse order, last to first. 
*/ > + for (t = d; t; t = t->bus->parent_dev) { > + p -= slot_len; > + snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn)); > + } > + > + return path; > } > -- Gleb.
On Mon, Nov 08, 2010 at 07:00:15PM +0200, Gleb Natapov wrote: > On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: > > On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote: > > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote: > > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote: > > > > > pcibus_dev_print() was erroneously retrieving the device bus > > > > > number from the secondary bus number offset of the device > > > > > instead of the bridge above the device. This ends of landing > > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully > > > > > is usually zero. pcibus_get_dev_path() copied this code, > > > > > inheriting the same bug. pcibus_get_dev_path() is used for > > > > > ramblock naming, so changing it can effect migration. However, > > > > > I've only seen this byte be non-zero for an assigned device, > > > > > which can't migrate anyway, so hopefully we won't run into > > > > > any issues. > > > > > > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > > > > > > Good catch. Applied. > > > > I don't really see why do we put the dev path > > > > in the bus object: why not let device supply its name? > > > > > > Because the device name is not unique. This came about from the > > > discussion about how to create a canonical device path that Gleb and > > > Markus are again trying to hash out. If we go up to the bus and get the > > > bus address, we have a VM unique name. Unfortunately, it's difficult to > > > define what the bus should print in all cases (ISA), but since they > > > don't do hotplug and typically don't allocate ramblocks, we can mostly > > > ignore it for this use case. > > > > > > > And I think this will affect nested bridges. However they are currently > > > > broken anyway: we really must convert to topological names as bus number > > > > is guest-assigned - they don't have to be unique, even. > > > > > > Yes, nested bridges are a problem. 
How can the seg/bus/devfn not be > > > unique? > > > > Bus numbers for nested bridges are guest assigned. We start with 0 after reset. > > > > > > What does fixing this involve? Just changing pcibus_get_dev_path? > > > > > > How do you plan to fix it? Don't forget that migration depends on these > > > names, so some kind of compatibility layer would be required. Thanks, > > > > > > Alex > > > > Replace bus number with slot numbers of parent bridges up to the root. > > This works for root bridge in a compatible way because bus number there > > is hard-coded to 0. > > IMO nested bridges are broken anyway, no way to be compatible there. > > > > > > Gleb, Markus, I think the following should be sufficient for PCI. What > > do you think? Also - do we need to update QMP/monitor to teach them to > > work with these paths? > > > > I am no longer use bus's get_dev_path callback for my purpose (I added > another one get_fw_dev_path) Why? Because get_dev_path is unstable? To avoid changing it for migration? So with this patch, you get to use it again. IMO two ways to address devices are a bad idea. > since it is used for migration What is used for migration? > it should > be done for all buses to work properly. And by properly I mean produce > full path from system root to the device itself recursively. This is what the code does. Recursion is not needed here, though. > But what I > learned is that by changing get_dev_path output you will break migration > from older guest to newer once (something we have to support). Well, I think migration for systems with nested buses is broken anyway: it's just a new feature where we simply did not figure out the migration angle before the last release. Without nesting, my code returns exactly the existing output, so no, it's not broken. > And of > course, as Alex said already, you need to traverse bridges recursively Well, this is an implementation detail. A loop is functionally equivalent, only cleaner and easier to understand.
> and > domain does not provide any meaningful information. I don't understand yet, sorry. The code I posted returns exactly what's there today. If the domain does not provide any meaningful information, how do things work then? And how do you address roots on a system with multiple roots? > > This is on top of Alex's patch, completely untested. > > > > > > pci: fix device path for devices behind nested bridges > > > > We were using bus number in the device path, which is clearly > > broken as this number is guest-assigned for all devices > > except the root. > > > > Fix by using hierarchical list of slots, walking the path > > from root down to device, instead. Add :00 as bus number > > so that if there are no nested bridges, this is compatible > > with what we have now. > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > > diff --git a/hw/pci.c b/hw/pci.c > > index 7d12473..fa98d94 100644 > > --- a/hw/pci.c > > +++ b/hw/pci.c > > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) > > > > static char *pcibus_get_dev_path(DeviceState *dev) > > { > > - PCIDevice *d = (PCIDevice *)dev; > > - char path[16]; > > - > > - snprintf(path, sizeof(path), "%04x:%02x:%02x.%x", > > - pci_find_domain(d->bus), pci_bus_num(d->bus), > > - PCI_SLOT(d->devfn), PCI_FUNC(d->devfn)); > > - > > - return strdup(path); > > + PCIDevice *d = container_of(dev, PCIDevice, qdev); > > + PCIDevice *t; > > + int slot_depth; > > + /* Path format: Domain:00:Slot:Slot....:Slot.Function. > > + * 00 is added here to make this format compatible with > > + * domain:Bus:Slot.Func for systems without nested PCI bridges. > > + * Slot list specifies the slot numbers for all devices on the > > + * path from root to the specific device. */ > > + int domain_len = strlen("DDDD:00"); > > + int func_len = strlen(".F"); > > + int slot_len = strlen(":SS"); > > + int path_len; > > + char *path, *p; > > + > > + /* Calculate # of slots on path between device and root.
*/; > > + slot_depth = 0; > > + for (t = d; t; t = t->bus->parent_dev) > > + ++slot_depth; > > + > > + path_len = domain_len + bus_len + slot_len * slot_depth + func_len; > > + > > + /* Allocate memory, fill in the terminating null byte. */ > > + path = malloc(path_len + 1 /* For '\0' */); > > + path[path_len] = '\0'; > > + > > + /* First field is the domain. */ > > + snprintf(path, domain_len, "%04x", pci_find_domain(d->bus)); > > + > > + /* Leave space for slot numbers and fill in function number. */ > > + p = path + domain_len + slot_len * slot_depth; > > + snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn)); > > + > > + /* Fill in slot numbers. We walk up from device to root, so need to print > > + * them in the reverse order, last to first. */ > > + for (t = d; t; t = t->bus->parent_dev) { > > + p -= slot_len; > > + snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn)); > > + } > > + > > + return path; > > } > > > > -- > Gleb.
On Mon, Nov 08, 2010 at 07:08:37PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 08, 2010 at 07:00:15PM +0200, Gleb Natapov wrote: > > On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: > > > On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote: > > > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote: > > > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote: > > > > > > pcibus_dev_print() was erroneously retrieving the device bus > > > > > > number from the secondary bus number offset of the device > > > > > > instead of the bridge above the device. This ends of landing > > > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully > > > > > > is usually zero. pcibus_get_dev_path() copied this code, > > > > > > inheriting the same bug. pcibus_get_dev_path() is used for > > > > > > ramblock naming, so changing it can effect migration. However, > > > > > > I've only seen this byte be non-zero for an assigned device, > > > > > > which can't migrate anyway, so hopefully we won't run into > > > > > > any issues. > > > > > > > > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > > > > > > > > Good catch. Applied. > > > > > I don't really see why do we put the dev path > > > > > in the bus object: why not let device supply its name? > > > > > > > > Because the device name is not unique. This came about from the > > > > discussion about how to create a canonical device path that Gleb and > > > > Markus are again trying to hash out. If we go up to the bus and get the > > > > bus address, we have a VM unique name. Unfortunately, it's difficult to > > > > define what the bus should print in all cases (ISA), but since they > > > > don't do hotplug and typically don't allocate ramblocks, we can mostly > > > > ignore it for this use case. > > > > > > > > > And I think this will affect nested bridges. 
However they are currently > > > > > broken anyway: we really must convert to topological names as bus number > > > > > is guest-assigned - they don't have to be unique, even. > > > > > > > > Yes, nested bridges are a problem. How can the seg/bus/devfn not be > > > > unique? > > > > > > Bus numbers for nested bridges are guest assigned. We start with 0 after reset. > > > > > > > > What does fixing this involve? Just changing pcibus_get_dev_path? > > > > > > > > How do you plan to fix it? Don't forget that migration depends on these > > > > names, so some kind of compatibility layer would be required. Thanks, > > > > > > > > Alex > > > > > > Replace bus number with slot numbers of parent bridges up to the root. > > > This works for root bridge in a compatible way because bus number there > > > is hard-coded to 0. > > > IMO nested bridges are broken anyway, no way to be compatible there. > > > > > > > > > Gleb, Markus, I think the following should be sufficient for PCI. What > > > do you think? Also - do we need to update QMP/monitor to teach them to > > > work with these paths? > > > > > > > I am no longer use bus's get_dev_path callback for my purpose (I added > > another one get_fw_dev_path) > > Why? Because get_dev_path is unstable? to avoid changing for migration? Because to get a functional get_dev_path() we need to implement it recursively (hint: the domain should not be part of the PCI bus's get_dev_path but of its parent bus instead). And since get_dev_path() is used in migration code, you have an all-or-nothing situation. Either you implement it properly for all devices or for none. > So with this patch, you get to use it again. Unfortunately not. > IMO two ways to address devices is a bad idea. > > > since it is used for migration > > what is used for migration? > The get_dev_path() callback is used to create a unique instance id. Actually, it is the only use of it, so a simple grep will show you this :) > > it should > > be done for all buses to work properly.
And by properly I mean produce > > full path from system root to the device itself recursively. > > This is what the code does. Recursion is not needed here though. > Recursion is needed to create the full device path. I am not interested in only PCI here. In fact, the PCI one is the easiest. > > But what I > > learned is that by changing get_dev_path output you will break migration > > from older guest to newer once (something we have to support). > > Well I think migration for sytems with nested buses are broken anyway: > it's just a new feature where we simply did not figure out the migration > andgle before the last release. > Without nesting my code returns exactly the existing output > so no, it's not broken. > Does it not drop the bus id part? If the output without bridges is exactly the same, then migration is OK with the patch. > > And of > > course, as Alex said already, you need to traverse bridges recursively > > Well this is an implementation detail. loop is functionally equivalent, > only cleaner and easier to understand. > With that I can agree. I am referring more to recursion down the device tree. > > and > > domain does not provide any meaningful information. > > I dont understand yet, sorry. Code I posted returns exactly > what's there today. If domain does not provide any meaningful > information, How do things work then? And how do you address > roots on a system with multiple roots? > The current code is broken. Look at my patch series. > > > This is on top of Alex's patch, completely untested. > > > > > > pci: fix device path for devices behind nested bridges > > > > > > We were using bus number in the device path, which is clearly > > > broken as this number is guest-assigned for all devices > > > except the root. > > > > > > Fix by using hierarchical list of slots, walking the path > > > from root down to device, instead. Add :00 as bus number > > > so that if there are no nested bridges, this is compatible > > > with what we have now.
> > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > > > > diff --git a/hw/pci.c b/hw/pci.c > > > index 7d12473..fa98d94 100644 > > > --- a/hw/pci.c > > > +++ b/hw/pci.c > > > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) > > > > > > static char *pcibus_get_dev_path(DeviceState *dev) > > > { > > > - PCIDevice *d = (PCIDevice *)dev; > > > - char path[16]; > > > - > > > - snprintf(path, sizeof(path), "%04x:%02x:%02x.%x", > > > - pci_find_domain(d->bus), pci_bus_num(d->bus), > > > - PCI_SLOT(d->devfn), PCI_FUNC(d->devfn)); > > > - > > > - return strdup(path); > > > + PCIDevice *d = container_of(dev, PCIDevice, qdev); > > > + PCIDevice *t; > > > + int slot_depth; > > > + /* Path format: Domain:00:Slot:Slot....:Slot.Function. > > > + * 00 is added here to make this format compatible with > > > + * domain:Bus:Slot.Func for systems without nested PCI bridges. > > > + * Slot list specifies the slot numbers for all devices on the > > > + * path from root to the specific device. */ > > > + int domain_len = strlen("DDDD:00"); > > > + int func_len = strlen(".F"); > > > + int slot_len = strlen(":SS"); > > > + int path_len; > > > + char *path, *p; > > > + > > > + /* Calculate # of slots on path between device and root. */; > > > + slot_depth = 0; > > > + for (t = d; t; t = t->bus->parent_dev) > > > + ++slot_depth; > > > + > > > + path_len = domain_len + bus_len + slot_len * slot_depth + func_len; > > > + > > > + /* Allocate memory, fill in the terminating null byte. */ > > > + path = malloc(path_len + 1 /* For '\0' */); > > > + path[path_len] = '\0'; > > > + > > > + /* First field is the domain. */ > > > + snprintf(path, domain_len, "%04x", pci_find_domain(d->bus)); > > > + > > > + /* Leave space for slot numbers and fill in function number. */ > > > + p = path + domain_len + slot_len * slot_depth; > > > + snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn)); > > > + > > > + /* Fill in slot numbers. 
We walk up from device to root, so need to print > > > + * them in the reverse order, last to first. */ > > > + for (t = d; t; t = t->bus->parent_dev) { > > > + p -= slot_len; > > > + snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn)); > > > + } > > > + > > > + return path; > > > } > > > > > > > -- > > Gleb. -- Gleb.
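Gleb's requirement above, a full path from the system root built recursively across heterogeneous buses, can be sketched generically: each node contributes its own path component and delegates to its parent first. The types and the `pci@slot.function` component format below are illustrative assumptions, not QEMU's qdev or get_fw_dev_path API:

```c
#include <stdio.h>
#include <string.h>

/* Generic device node: each node knows how to print its own path
 * component; the full path lists parent components first.
 * Illustrative stand-in, not QEMU's qdev types. */
struct node {
    const struct node *parent;
    void (*format)(const struct node *n, char *buf, size_t len);
    int addr; /* bus-specific address; meaning depends on format() */
};

/* Recurse to the root, then append each component on the way back,
 * separated by '/'. */
static void build_path(const struct node *n, char *buf, size_t len)
{
    if (n->parent) {
        build_path(n->parent, buf, len);
        strncat(buf, "/", len - strlen(buf) - 1);
    }
    char comp[32];
    n->format(n, comp, sizeof comp);
    strncat(buf, comp, len - strlen(buf) - 1);
}

/* Example per-bus formatters. */
static void root_fmt(const struct node *n, char *buf, size_t len)
{
    (void)n;
    snprintf(buf, len, "root");
}

static void pci_fmt(const struct node *n, char *buf, size_t len)
{
    snprintf(buf, len, "pci@%02x.%x", (n->addr >> 3) & 0x1f, n->addr & 7);
}
```

The point is the shape, not the names: every bus type implements one formatter, and recursion (or an equivalent loop) stitches the components together from the root down.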
On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: > Replace bus number with slot numbers of parent bridges up to the root. > This works for root bridge in a compatible way because bus number there > is hard-coded to 0. > IMO nested bridges are broken anyway, no way to be compatible there. > > > Gleb, Markus, I think the following should be sufficient for PCI. What > do you think? Also - do we need to update QMP/monitor to teach them to > work with these paths? > > This is on top of Alex's patch, completely untested. > > > pci: fix device path for devices behind nested bridges > > We were using bus number in the device path, which is clearly > broken as this number is guest-assigned for all devices > except the root. > > Fix by using hierarchical list of slots, walking the path > from root down to device, instead. Add :00 as bus number > so that if there are no nested bridges, this is compatible > with what we have now. This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work because a PCI-to-PCI bridge is itself a PCI function. So the format should be Domain:00:Slot.Function:Slot.Function....:Slot.Function. Thanks, > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > diff --git a/hw/pci.c b/hw/pci.c > index 7d12473..fa98d94 100644 > --- a/hw/pci.c > +++ b/hw/pci.c > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) > > static char *pcibus_get_dev_path(DeviceState *dev) > { > - PCIDevice *d = (PCIDevice *)dev; > - char path[16]; > - > - snprintf(path, sizeof(path), "%04x:%02x:%02x.%x", > - pci_find_domain(d->bus), pci_bus_num(d->bus), > - PCI_SLOT(d->devfn), PCI_FUNC(d->devfn)); > - > - return strdup(path); > + PCIDevice *d = container_of(dev, PCIDevice, qdev); > + PCIDevice *t; > + int slot_depth; > + /* Path format: Domain:00:Slot:Slot....:Slot.Function. > + * 00 is added here to make this format compatible with > + * domain:Bus:Slot.Func for systems without nested PCI bridges.
> + * Slot list specifies the slot numbers for all devices on the > + * path from root to the specific device. */ > + int domain_len = strlen("DDDD:00"); > + int func_len = strlen(".F"); > + int slot_len = strlen(":SS"); > + int path_len; > + char *path, *p; > + > + /* Calculate # of slots on path between device and root. */; > + slot_depth = 0; > + for (t = d; t; t = t->bus->parent_dev) > + ++slot_depth; > + > + path_len = domain_len + bus_len + slot_len * slot_depth + func_len; > + > + /* Allocate memory, fill in the terminating null byte. */ > + path = malloc(path_len + 1 /* For '\0' */); > + path[path_len] = '\0'; > + > + /* First field is the domain. */ > + snprintf(path, domain_len, "%04x", pci_find_domain(d->bus)); > + > + /* Leave space for slot numbers and fill in function number. */ > + p = path + domain_len + slot_len * slot_depth; > + snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn)); > + > + /* Fill in slot numbers. We walk up from device to root, so need to print > + * them in the reverse order, last to first. */ > + for (t = d; t; t = t->bus->parent_dev) { > + p -= slot_len; > + snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn)); > + } > + > + return path; > } > >
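The patch as posted is admittedly untested, and indeed it references an undeclared `bus_len` and passes truncating sizes to snprintf(). Purely as an illustration, here is a self-contained sketch of the same walk-up-and-print idea, folding in Isaku's point that each path component needs both slot and function; `TestDev` and `dev_path` are hypothetical stand-ins for the few PCIDevice fields involved, not QEMU code:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fields the path builder needs:
 * devfn packs slot (bits 7:3) and function (bits 2:0); parent points
 * at the bridge above this device, NULL on the root bus. */
typedef struct TestDev {
    struct TestDev *parent;
    unsigned devfn;
} TestDev;

#define SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define FUNC(devfn) ((devfn) & 0x07)

/* Build "DDDD:00:SS.F:SS.F...": domain, the hard-coded root bus
 * number, then one slot.function component per device on the
 * root-to-device path. */
static char *dev_path(unsigned domain, const TestDev *d)
{
    const TestDev *t;
    int depth = 0, i;

    for (t = d; t; t = t->parent)
        ++depth;

    /* Collect the chain root-first so we can print forward. */
    const TestDev **chain = malloc(depth * sizeof(*chain));
    i = depth;
    for (t = d; t; t = t->parent)
        chain[--i] = t;

    /* "DDDD:00" (7 chars) + depth * ":SS.F" (5 chars) + NUL */
    char *path = malloc(7 + depth * 5 + 1);
    char *p = path + sprintf(path, "%04x:00", domain);
    for (i = 0; i < depth; i++)
        p += sprintf(p, ":%02x.%x",
                     SLOT(chain[i]->devfn), FUNC(chain[i]->devfn));
    free(chain);
    return path;
}
```

For a device directly on the root bus this degenerates to `0000:00:SS.F`, which keeps the compatibility property the commit log describes.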
On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote: > On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: > > Replace bus number with slot numbers of parent bridges up to the root. > > This works for root bridge in a compatible way because bus number there > > is hard-coded to 0. > > IMO nested bridges are broken anyway, no way to be compatible there. > > > > > > Gleb, Markus, I think the following should be sufficient for PCI. What > > do you think? Also - do we need to update QMP/monitor to teach them to > > work with these paths? > > > > This is on top of Alex's patch, completely untested. > > > > > > pci: fix device path for devices behind nested bridges > > > > We were using bus number in the device path, which is clearly > > broken as this number is guest-assigned for all devices > > except the root. > > > > Fix by using hierarchical list of slots, walking the path > > from root down to device, instead. Add :00 as bus number > > so that if there are no nested bridges, this is compatible > > with what we have now. > > This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work > because pci-to-pci bridge is pci function. > So the format should be > Domain:00:Slot.Function:Slot.Function....:Slot.Function > > thanks, Hmm, interesting. If we do this we aren't backwards compatible though, so maybe we could try using openfirmware paths, just as well. > > > > Signed-off-by: Michael S. 
Tsirkin <mst@redhat.com> > > > > diff --git a/hw/pci.c b/hw/pci.c > > index 7d12473..fa98d94 100644 > > --- a/hw/pci.c > > +++ b/hw/pci.c > > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) > > > > static char *pcibus_get_dev_path(DeviceState *dev) > > { > > - PCIDevice *d = (PCIDevice *)dev; > > - char path[16]; > > - > > - snprintf(path, sizeof(path), "%04x:%02x:%02x.%x", > > - pci_find_domain(d->bus), pci_bus_num(d->bus), > > - PCI_SLOT(d->devfn), PCI_FUNC(d->devfn)); > > - > > - return strdup(path); > > + PCIDevice *d = container_of(dev, PCIDevice, qdev); > > + PCIDevice *t; > > + int slot_depth; > > + /* Path format: Domain:00:Slot:Slot....:Slot.Function. > > + * 00 is added here to make this format compatible with > > + * domain:Bus:Slot.Func for systems without nested PCI bridges. > > + * Slot list specifies the slot numbers for all devices on the > > + * path from root to the specific device. */ > > + int domain_len = strlen("DDDD:00"); > > + int func_len = strlen(".F"); > > + int slot_len = strlen(":SS"); > > + int path_len; > > + char *path, *p; > > + > > + /* Calculate # of slots on path between device and root. */; > > + slot_depth = 0; > > + for (t = d; t; t = t->bus->parent_dev) > > + ++slot_depth; > > + > > + path_len = domain_len + bus_len + slot_len * slot_depth + func_len; > > + > > + /* Allocate memory, fill in the terminating null byte. */ > > + path = malloc(path_len + 1 /* For '\0' */); > > + path[path_len] = '\0'; > > + > > + /* First field is the domain. */ > > + snprintf(path, domain_len, "%04x", pci_find_domain(d->bus)); > > + > > + /* Leave space for slot numbers and fill in function number. */ > > + p = path + domain_len + slot_len * slot_depth; > > + snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn)); > > + > > + /* Fill in slot numbers. We walk up from device to root, so need to print > > + * them in the reverse order, last to first. 
*/ > > + for (t = d; t; t = t->bus->parent_dev) { > > + p -= slot_len; > > + snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn)); > > + } > > + > > + return path; > > } > > > > > > -- > yamahata
"Michael S. Tsirkin" <mst@redhat.com> writes: > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote: >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: >> > Replace bus number with slot numbers of parent bridges up to the root. >> > This works for root bridge in a compatible way because bus number there >> > is hard-coded to 0. >> > IMO nested bridges are broken anyway, no way to be compatible there. >> > >> > >> > Gleb, Markus, I think the following should be sufficient for PCI. What >> > do you think? Also - do we need to update QMP/monitor to teach them to >> > work with these paths? >> > >> > This is on top of Alex's patch, completely untested. >> > >> > >> > pci: fix device path for devices behind nested bridges >> > >> > We were using bus number in the device path, which is clearly >> > broken as this number is guest-assigned for all devices >> > except the root. >> > >> > Fix by using hierarchical list of slots, walking the path >> > from root down to device, instead. Add :00 as bus number >> > so that if there are no nested bridges, this is compatible >> > with what we have now. >> >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work >> because pci-to-pci bridge is pci function. >> So the format should be >> Domain:00:Slot.Function:Slot.Function....:Slot.Function >> >> thanks, > > Hmm, interesting. If we do this we aren't backwards compatible > though, so maybe we could try using openfirmware paths, just as well. Whatever we do, we need to make it work for all (qdevified) devices and buses. It should also be possible to use canonical addressing with device_add & friends. I.e. permit naming a device by (a unique abbreviation of) its canonical address in addition to naming it by its user-defined ID. For instance, something like device_del /pci/@1,1 in addition to device_del ID Open Firmware is a useful source of inspiration there, but should it come into conflict with usability, we should let usability win.
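Markus's abbreviation idea can be sketched with a trivial resolver: accept any fragment of a canonical path, but only when it selects exactly one device. Everything below (the `resolve_path` helper and the `/pci/@1,1`-style strings) is hypothetical illustration, not QEMU's actual monitor code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Resolve a (possibly abbreviated) device path against the set of
 * canonical paths. A fragment is accepted only when it matches a
 * single device; ambiguous or unknown fragments return NULL. */
static const char *resolve_path(const char **paths, int n,
                                const char *abbrev)
{
    const char *match = NULL;
    int i;

    for (i = 0; i < n; i++) {
        if (strstr(paths[i], abbrev)) {
            if (match)
                return NULL;    /* ambiguous: refuse to guess */
            match = paths[i];
        }
    }
    return match;               /* NULL also when nothing matched */
}
```

device_del could then take either a user-defined ID or any fragment that is unique among canonical paths, which is roughly what Open Firmware clients do.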
On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote: > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: > >> > Replace bus number with slot numbers of parent bridges up to the root. > >> > This works for root bridge in a compatible way because bus number there > >> > is hard-coded to 0. > >> > IMO nested bridges are broken anyway, no way to be compatible there. > >> > > >> > > >> > Gleb, Markus, I think the following should be sufficient for PCI. What > >> > do you think? Also - do we need to update QMP/monitor to teach them to > >> > work with these paths? > >> > > >> > This is on top of Alex's patch, completely untested. > >> > > >> > > >> > pci: fix device path for devices behind nested bridges > >> > > >> > We were using bus number in the device path, which is clearly > >> > broken as this number is guest-assigned for all devices > >> > except the root. > >> > > >> > Fix by using hierarchical list of slots, walking the path > >> > from root down to device, instead. Add :00 as bus number > >> > so that if there are no nested bridges, this is compatible > >> > with what we have now. > >> > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work > >> because pci-to-pci bridge is pci function. > >> So the format should be > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function > >> > >> thanks, > > > > Hmm, interesting. If we do this we aren't backwards compatible > > though, so maybe we could try using openfirmware paths, just as well. > > Whatever we do, we need to make it work for all (qdevified) devices and > buses. > > It should also be possible to use canonical addressing with device_add & > friends. I.e. permit naming a device by (a unique abbreviation of) its > canonical address in addition to naming it by its user-defined ID. 
For > instance, something like > > device_del /pci/@1,1 > FWIW openbios allows this kind of abbreviation. > in addition to > > device_del ID > > Open Firmware is a useful source of inspiration there, but should it > come into conflict with usability, we should let usability win. -- Gleb.
On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote: > On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote: > > "Michael S. Tsirkin" <mst@redhat.com> writes: > > > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote: > > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: > > >> > Replace bus number with slot numbers of parent bridges up to the root. > > >> > This works for root bridge in a compatible way because bus number there > > >> > is hard-coded to 0. > > >> > IMO nested bridges are broken anyway, no way to be compatible there. > > >> > > > >> > > > >> > Gleb, Markus, I think the following should be sufficient for PCI. What > > >> > do you think? Also - do we need to update QMP/monitor to teach them to > > >> > work with these paths? > > >> > > > >> > This is on top of Alex's patch, completely untested. > > >> > > > >> > > > >> > pci: fix device path for devices behind nested bridges > > >> > > > >> > We were using bus number in the device path, which is clearly > > >> > broken as this number is guest-assigned for all devices > > >> > except the root. > > >> > > > >> > Fix by using hierarchical list of slots, walking the path > > >> > from root down to device, instead. Add :00 as bus number > > >> > so that if there are no nested bridges, this is compatible > > >> > with what we have now. > > >> > > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work > > >> because pci-to-pci bridge is pci function. > > >> So the format should be > > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function > > >> > > >> thanks, > > > > > > Hmm, interesting. If we do this we aren't backwards compatible > > > though, so maybe we could try using openfirmware paths, just as well. > > > > Whatever we do, we need to make it work for all (qdevified) devices and > > buses. > > > > It should also be possible to use canonical addressing with device_add & > > friends. I.e. 
permit naming a device by (a unique abbreviation of) its > > canonical address in addition to naming it by its user-defined ID. For > > instance, something like > > > > device_del /pci/@1,1 > > > FWIW openbios allows this kind of abbreviation. > > > in addition to > > > > device_del ID > > > > Open Firmware is a useful source of inspiration there, but should it > > come into conflict with usability, we should let usability win. > > -- > Gleb. I think that the domain (PCI segment group), bus, slot, function way to address pci devices is still the most familiar and the easiest to map to functionality in the guests. Qemu is buggy at the moment in that it uses the bus addresses assigned by the guest and not the ones in ACPI, but that can be fixed. That should be enough for e.g. device_del. We do have the need to describe the topology when we interface with firmware, e.g. to describe the ACPI tables themselves to qemu (this is what Gleb's patches deal with), but that's probably the only case.
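The domain:bus:slot.function notation Michael refers to is at least easy to accept on a command line; a minimal, hypothetical parser (plain C for illustration, not monitor code) is just:

```c
#include <assert.h>
#include <stdio.h>

/* Parse the conventional "DDDD:BB:SS.F" form into its four
 * components. Returns nonzero on success, 0 on malformed input. */
static int parse_bdf(const char *s, unsigned *dom, unsigned *bus,
                     unsigned *slot, unsigned *fn)
{
    return sscanf(s, "%4x:%2x:%2x.%1x", dom, bus, slot, fn) == 4;
}
```

The hard part, as the rest of the thread shows, is not parsing the notation but deciding whose bus numbers it refers to.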
On Sat, Nov 20, 2010 at 10:17:09PM +0200, Michael S. Tsirkin wrote: > On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote: > > On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote: > > > "Michael S. Tsirkin" <mst@redhat.com> writes: > > > > > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote: > > > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: > > > >> > Replace bus number with slot numbers of parent bridges up to the root. > > > >> > This works for root bridge in a compatible way because bus number there > > > >> > is hard-coded to 0. > > > >> > IMO nested bridges are broken anyway, no way to be compatible there. > > > >> > > > > >> > > > > >> > Gleb, Markus, I think the following should be sufficient for PCI. What > > > >> > do you think? Also - do we need to update QMP/monitor to teach them to > > > >> > work with these paths? > > > >> > > > > >> > This is on top of Alex's patch, completely untested. > > > >> > > > > >> > > > > >> > pci: fix device path for devices behind nested bridges > > > >> > > > > >> > We were using bus number in the device path, which is clearly > > > >> > broken as this number is guest-assigned for all devices > > > >> > except the root. > > > >> > > > > >> > Fix by using hierarchical list of slots, walking the path > > > >> > from root down to device, instead. Add :00 as bus number > > > >> > so that if there are no nested bridges, this is compatible > > > >> > with what we have now. > > > >> > > > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work > > > >> because pci-to-pci bridge is pci function. > > > >> So the format should be > > > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function > > > >> > > > >> thanks, > > > > > > > > Hmm, interesting. If we do this we aren't backwards compatible > > > > though, so maybe we could try using openfirmware paths, just as well. 
> > > > > > Whatever we do, we need to make it work for all (qdevified) devices and > > > buses. > > > > > > It should also be possible to use canonical addressing with device_add & > > > friends. I.e. permit naming a device by (a unique abbreviation of) its > > > canonical address in addition to naming it by its user-defined ID. For > > > instance, something like > > > > > > device_del /pci/@1,1 > > > > > FWIW openbios allows this kind of abbreviation. > > > > > in addition to > > > > > > device_del ID > > > > > > Open Firmware is a useful source of inspiration there, but should it > > > come into conflict with usability, we should let usability win. > > > > -- > > Gleb. > > > I think that the domain (PCI segment group), bus, slot, function way to > > address pci devices is still the most familiar and the easiest to map to > Most familiar to whom? It looks like you identify yourself with most of > qemu users, but if most qemu users are like you then qemu has not enough > users :) Most users that consider themselves to be "advanced" may know > what eth1 or /dev/sdb means. This doesn't mean we should provide > "device_del eth1" or "device_add /dev/sdb" commands though. > > More important is that "domain" (encoded as a number like you used to) > and "bus number" have no meaning from inside qemu. > So while I said many > times I don't care about the exact CLI syntax too much, it should make sense > at least. It can use an id to specify a PCI bus in CLI like this: > device_del pci.0:1.1. Or it can even use a device id too like this: > device_del pci.0:ide.0. Or it can use HW topology like in an OF device > path. But doing ad-hoc device enumeration inside qemu and then using it > for CLI is not it. 
In fact one segment may include multiple PCI host bridges. _SEG is not what OSPM uses to tie a HW resource to an ACPI resource. It uses _CRS (Current Resource Settings) for that, just like OF. No surprise there. > > That should be enough for e.g. device_del. We do have the need to > describe the topology when we interface with firmware, e.g. to describe > the ACPI tables themselves to qemu (this is what Gleb's patches deal > with), but that's probably the only case. > Describing HW topology is the only way to unambiguously describe a device to something or someone outside qemu and have persistent device naming between different HW configurations. -- Gleb.
On Sun, Nov 21, 2010 at 10:32:11AM +0200, Gleb Natapov wrote: > On Sat, Nov 20, 2010 at 10:17:09PM +0200, Michael S. Tsirkin wrote: > > On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote: > > > On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote: > > > > "Michael S. Tsirkin" <mst@redhat.com> writes: > > > > > > > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote: > > > > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: > > > > >> > Replace bus number with slot numbers of parent bridges up to the root. > > > > >> > This works for root bridge in a compatible way because bus number there > > > > >> > is hard-coded to 0. > > > > >> > IMO nested bridges are broken anyway, no way to be compatible there. > > > > >> > > > > > >> > > > > > >> > Gleb, Markus, I think the following should be sufficient for PCI. What > > > > >> > do you think? Also - do we need to update QMP/monitor to teach them to > > > > >> > work with these paths? > > > > >> > > > > > >> > This is on top of Alex's patch, completely untested. > > > > >> > > > > > >> > > > > > >> > pci: fix device path for devices behind nested bridges > > > > >> > > > > > >> > We were using bus number in the device path, which is clearly > > > > >> > broken as this number is guest-assigned for all devices > > > > >> > except the root. > > > > >> > > > > > >> > Fix by using hierarchical list of slots, walking the path > > > > >> > from root down to device, instead. Add :00 as bus number > > > > >> > so that if there are no nested bridges, this is compatible > > > > >> > with what we have now. > > > > >> > > > > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work > > > > >> because pci-to-pci bridge is pci function. > > > > >> So the format should be > > > > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function > > > > >> > > > > >> thanks, > > > > > > > > > > Hmm, interesting. 
If we do this we aren't backwards compatible > > > > > though, so maybe we could try using openfirmware paths, just as well. > > > > > > > > Whatever we do, we need to make it work for all (qdevified) devices and > > > > buses. > > > > > > > > It should also be possible to use canonical addressing with device_add & > > > > friends. I.e. permit naming a device by (a unique abbreviation of) its > > > > canonical address in addition to naming it by its user-defined ID. For > > > > instance, something like > > > > > > > > device_del /pci/@1,1 > > > > > > > FWIW openbios allows this kind of abbreviation. > > > > > > > in addition to > > > > > > > > device_del ID > > > > > > > > Open Firmware is a useful source of inspiration there, but should it > > > > come into conflict with usability, we should let usability win. > > > > > > -- > > > Gleb. > > > > > > I think that the domain (PCI segment group), bus, slot, function way to > > address pci devices is still the most familiar and the easiest to map to > Most familiar to whom? The guests. For CLI, we need an easy way to map a device in guest to the device in qemu and back. > It looks like you identify yourself with most of > qemu users, but if most qemu users are like you then qemu has not enough > users :) Most users that consider themselves to be "advanced" may know > what eth1 or /dev/sdb means. This doesn't mean we should provide > "device_del eth1" or "device_add /dev/sdb" command though. > > More important is that "domain" (encoded as number like you used to) > and "bus number" has no meaning from inside qemu. > So while I said many > times I don't care about exact CLI syntax to much it should make sense > at least. It can use id to specify PCI bus in CLI like this: > device_del pci.0:1.1. Or it can even use device id too like this: > device_del pci.0:ide.0. Or it can use HW topology like in FO device > path. But doing ah-hoc device enumeration inside qemu and then using it > for CLI is not it. 
> > > functionality in the guests. Qemu is buggy in the moment in that is > > uses the bus addresses assigned by guest and not the ones in ACPI, > > but that can be fixed. > It looks like you confused ACPI _SEG for something it isn't. Maybe I did. This is what linux does: struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root *root) { struct acpi_device *device = root->device; int domain = root->segment; int busnum = root->secondary.start; And I think this is consistent with the spec. > ACPI spec > says that PCI segment group is purely software concept managed by system > firmware. In fact one segment may include multiple PCI host bridges. It can't I think: Multiple Host Bridges A platform may have multiple PCI Express or PCI-X host bridges. The base address for the MMCONFIG space for these host bridges may need to be allocated at different locations. In such cases, using MCFG table and _CBA method as defined in this section means that each of these host bridges must be in its own PCI Segment Group. > _SEG > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS > (Current Resource Settings) for that just like OF. No surprise there. OSPM uses both I think. All I see linux do with CRS is get the bus number range. And the spec says, e.g.: the memory mapped configuration base address (always corresponds to bus number 0) for the PCI Segment Group of the host bridge is provided by _CBA and the bus range covered by the base address is indicated by the corresponding bus range specified in _CRS. > > > > That should be enough for e.g. device_del. We do have the need to > > describe the topology when we interface with firmware, e.g. to describe > > the ACPI tables themselves to qemu (this is what Gleb's patches deal > > with), but that's probably the only case. > > > Describing HW topology is the only way to unambiguously describe device to > something or someone outside qemu and have persistent device naming > between different HW configuration. 
Not really, since ACPI is a binary blob programmed by qemu. > -- > Gleb.
On Sun, Nov 21, 2010 at 11:50:18AM +0200, Michael S. Tsirkin wrote: > On Sun, Nov 21, 2010 at 10:32:11AM +0200, Gleb Natapov wrote: > > On Sat, Nov 20, 2010 at 10:17:09PM +0200, Michael S. Tsirkin wrote: > > > On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote: > > > > On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote: > > > > > "Michael S. Tsirkin" <mst@redhat.com> writes: > > > > > > > > > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote: > > > > > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: > > > > > >> > Replace bus number with slot numbers of parent bridges up to the root. > > > > > >> > This works for root bridge in a compatible way because bus number there > > > > > >> > is hard-coded to 0. > > > > > >> > IMO nested bridges are broken anyway, no way to be compatible there. > > > > > >> > > > > > > >> > > > > > > >> > Gleb, Markus, I think the following should be sufficient for PCI. What > > > > > >> > do you think? Also - do we need to update QMP/monitor to teach them to > > > > > >> > work with these paths? > > > > > >> > > > > > > >> > This is on top of Alex's patch, completely untested. > > > > > >> > > > > > > >> > > > > > > >> > pci: fix device path for devices behind nested bridges > > > > > >> > > > > > > >> > We were using bus number in the device path, which is clearly > > > > > >> > broken as this number is guest-assigned for all devices > > > > > >> > except the root. > > > > > >> > > > > > > >> > Fix by using hierarchical list of slots, walking the path > > > > > >> > from root down to device, instead. Add :00 as bus number > > > > > >> > so that if there are no nested bridges, this is compatible > > > > > >> > with what we have now. > > > > > >> > > > > > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work > > > > > >> because pci-to-pci bridge is pci function. 
> > > > > >> So the format should be > > > > > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function > > > > > >> > > > > > >> thanks, > > > > > > > > > > > > Hmm, interesting. If we do this we aren't backwards compatible > > > > > > though, so maybe we could try using openfirmware paths, just as well. > > > > > > > > > > Whatever we do, we need to make it work for all (qdevified) devices and > > > > > buses. > > > > > > > > > > It should also be possible to use canonical addressing with device_add & > > > > > friends. I.e. permit naming a device by (a unique abbreviation of) its > > > > > canonical address in addition to naming it by its user-defined ID. For > > > > > instance, something like > > > > > > > > > > device_del /pci/@1,1 > > > > > > > > > FWIW openbios allows this kind of abbreviation. > > > > > > > > > in addition to > > > > > > > > > > device_del ID > > > > > > > > > > Open Firmware is a useful source of inspiration there, but should it > > > > > come into conflict with usability, we should let usability win. > > > > > > > > -- > > > > Gleb. > > > > > > > > > I think that the domain (PCI segment group), bus, slot, function way to > > > address pci devices is still the most familiar and the easiest to map to > > Most familiar to whom? > > The guests. Which one? There are many guests. Your favorite? > For CLI, we need an easy way to map a device in guest to the > device in qemu and back. Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what you are saying is "lets use name that guest assigned to a device". > > > It looks like you identify yourself with most of > > qemu users, but if most qemu users are like you then qemu has not enough > > users :) Most users that consider themselves to be "advanced" may know > > what eth1 or /dev/sdb means. This doesn't mean we should provide > > "device_del eth1" or "device_add /dev/sdb" command though. 
> > > > More important is that "domain" (encoded as number like you used to) > > and "bus number" has no meaning from inside qemu. > > So while I said many > > times I don't care about exact CLI syntax to much it should make sense > > at least. It can use id to specify PCI bus in CLI like this: > > device_del pci.0:1.1. Or it can even use device id too like this: > > device_del pci.0:ide.0. Or it can use HW topology like in FO device > > path. But doing ah-hoc device enumeration inside qemu and then using it > > for CLI is not it. > > > > > functionality in the guests. Qemu is buggy in the moment in that is > > > uses the bus addresses assigned by guest and not the ones in ACPI, > > > but that can be fixed. > > It looks like you confused ACPI _SEG for something it isn't. > > Maybe I did. This is what linux does: > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root > *root) > { > struct acpi_device *device = root->device; > int domain = root->segment; > int busnum = root->secondary.start; > > And I think this is consistent with the spec. > It means that one domain may include several host bridges. At that level > a domain is defined as something that has a unique name for each device > inside it, thus no two buses in one segment/domain can have the same bus > number. This is what the PCI spec tells you. And this further shows that > using "domain" as defined by the guest is a very bad idea. > > ACPI spec > > says that PCI segment group is purely software concept managed by system > > firmware. In fact one segment may include multiple PCI host bridges. > > It can't I think: Read the _BBN definition: The _BBN object is located under a PCI host bridge and must be unique for every host bridge within a segment since it is the PCI bus number. Clearly the above speaks about multiple host bridges within a segment. > Multiple Host Bridges > > A platform may have multiple PCI Express or PCI-X host bridges. 
The base > address for the > MMCONFIG space for these host bridges may need to be allocated at > different locations. In such > cases, using MCFG table and _CBA method as defined in this section means > that each of these host > bridges must be in its own PCI Segment Group. > This is not from the ACPI spec, but without going too deep into it, the above paragraph talks about a particular case in which each host bridge must be in its own PCI Segment Group, which is definite proof that in other cases multiple host bridges can be in one segment group. > > > _SEG > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS > > (Current Resource Settings) for that just like OF. No surprise there. > > OSPM uses both I think. > > All I see linux do with CRS is get the bus number range. So let's assume that HW has two PCI host bridges and ACPI has: Device(PCI0) { Name (_HID, EisaId ("PNP0A03")) Name (_SEG, 0x00) } Device(PCI1) { Name (_HID, EisaId ("PNP0A03")) Name (_SEG, 0x01) } I.e. no _CRS to describe resources. How do you think OSPM knows which of the two PCI host bridges is PCI0 and which one is PCI1? > And the spec says, e.g.: > > the memory mapped configuration base > address (always corresponds to bus number 0) for the PCI Segment Group > of the host bridge is provided by _CBA and the bus range covered by the > base address is indicated by the corresponding bus range specified in > _CRS. > I don't see how it is relevant. And _CBA is defined only for PCI Express. Let's solve the problem for PCI first and then move to PCI Express. Jumping from one to the other distracts us from the main discussion. > > > > > > > That should be enough for e.g. device_del. We do have the need to > > > describe the topology when we interface with firmware, e.g. to describe > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal > > > with), but that's probably the only case. 
> > > > > Describing HW topology is the only way to unambiguously describe device to > > something or someone outside qemu and have persistent device naming > > between different HW configuration. > > Not really, since ACPI is a binary blob programmed by qemu. > ACPI is part of the guest, not qemu. Just saying "not really" doesn't prove much. I still haven't seen any proposal from you that actually solves the problem. No, "let's use guest naming" is not it. There is no such thing as "The Guest". -- Gleb.
On Sun, Nov 21, 2010 at 12:19:03PM +0200, Gleb Natapov wrote: > On Sun, Nov 21, 2010 at 11:50:18AM +0200, Michael S. Tsirkin wrote: > > On Sun, Nov 21, 2010 at 10:32:11AM +0200, Gleb Natapov wrote: > > > On Sat, Nov 20, 2010 at 10:17:09PM +0200, Michael S. Tsirkin wrote: > > > > On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote: > > > > > On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote: > > > > > > "Michael S. Tsirkin" <mst@redhat.com> writes: > > > > > > > > > > > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote: > > > > > > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote: > > > > > > >> > Replace bus number with slot numbers of parent bridges up to the root. > > > > > > >> > This works for root bridge in a compatible way because bus number there > > > > > > >> > is hard-coded to 0. > > > > > > >> > IMO nested bridges are broken anyway, no way to be compatible there. > > > > > > >> > > > > > > > >> > > > > > > > >> > Gleb, Markus, I think the following should be sufficient for PCI. What > > > > > > >> > do you think? Also - do we need to update QMP/monitor to teach them to > > > > > > >> > work with these paths? > > > > > > >> > > > > > > > >> > This is on top of Alex's patch, completely untested. > > > > > > >> > > > > > > > >> > > > > > > > >> > pci: fix device path for devices behind nested bridges > > > > > > >> > > > > > > > >> > We were using bus number in the device path, which is clearly > > > > > > >> > broken as this number is guest-assigned for all devices > > > > > > >> > except the root. > > > > > > >> > > > > > > > >> > Fix by using hierarchical list of slots, walking the path > > > > > > >> > from root down to device, instead. Add :00 as bus number > > > > > > >> > so that if there are no nested bridges, this is compatible > > > > > > >> > with what we have now. 
> > > > > > >> > > > > > > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work > > > > > > >> because pci-to-pci bridge is pci function. > > > > > > >> So the format should be > > > > > > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function > > > > > > >> > > > > > > >> thanks, > > > > > > > > > > > > > > Hmm, interesting. If we do this we aren't backwards compatible > > > > > > > though, so maybe we could try using openfirmware paths, just as well. > > > > > > > > > > > > Whatever we do, we need to make it work for all (qdevified) devices and > > > > > > buses. > > > > > > > > > > > > It should also be possible to use canonical addressing with device_add & > > > > > > friends. I.e. permit naming a device by (a unique abbreviation of) its > > > > > > canonical address in addition to naming it by its user-defined ID. For > > > > > > instance, something like > > > > > > > > > > > > device_del /pci/@1,1 > > > > > > > > > > > FWIW openbios allows this kind of abbreviation. > > > > > > > > > > > in addition to > > > > > > > > > > > > device_del ID > > > > > > > > > > > > Open Firmware is a useful source of inspiration there, but should it > > > > > > come into conflict with usability, we should let usability win. > > > > > > > > > > -- > > > > > Gleb. > > > > > > > > > > > > I think that the domain (PCI segment group), bus, slot, function way to > > > > address pci devices is still the most familiar and the easiest to map to > > > Most familiar to whom? > > > > The guests. > Which one? There are many guests. Your favorite? > > > For CLI, we need an easy way to map a device in guest to the > > device in qemu and back. > Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what > you are saying is "lets use name that guest assigned to a device". No I am saying let's use the name that our ACPI tables assigned. 
> > > > > It looks like you identify yourself with most of > > > qemu users, but if most qemu users are like you then qemu has not enough > > > users :) Most users that consider themselves to be "advanced" may know > > > what eth1 or /dev/sdb means. This doesn't mean we should provide > > > "device_del eth1" or "device_add /dev/sdb" command though. > > > > > > More important is that "domain" (encoded as number like you used to) > > > and "bus number" has no meaning from inside qemu. > > > So while I said many > > > times I don't care about exact CLI syntax to much it should make sense > > > at least. It can use id to specify PCI bus in CLI like this: > > > device_del pci.0:1.1. Or it can even use device id too like this: > > > device_del pci.0:ide.0. Or it can use HW topology like in FO device > > > path. But doing ah-hoc device enumeration inside qemu and then using it > > > for CLI is not it. > > > > > > > functionality in the guests. Qemu is buggy in the moment in that is > > > > uses the bus addresses assigned by guest and not the ones in ACPI, > > > > but that can be fixed. > > > It looks like you confused ACPI _SEG for something it isn't. > > > > Maybe I did. This is what linux does: > > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root > > *root) > > { > > struct acpi_device *device = root->device; > > int domain = root->segment; > > int busnum = root->secondary.start; > > > > And I think this is consistent with the spec. > > > It means that one domain may include several host bridges. > At that level > domain is defined as something that has a unique name for each device > inside it thus no two buses in one segment/domain can have same bus > number. This is what PCI spec tells you. And that really is enough for CLI because all we need is to locate the specific slot in a unique way. > And this further shows that using "domain" as defined by guest is a very > bad idea. As defined by ACPI, really.
> > > ACPI spec > > > says that PCI segment group is purely software concept managed by system > > > firmware. In fact one segment may include multiple PCI host bridges. > > > > It can't I think: > Read _BBN definition: > The _BBN object is located under a PCI host bridge and must be unique for > every host bridge within a segment since it is the PCI bus number. > > Clearly above speaks about multiple host bridges within a segment. Yes, it looks like the firmware spec allows that. > > Multiple Host Bridges > > > > A platform may have multiple PCI Express or PCI-X host bridges. The base > > address for the > > MMCONFIG space for these host bridges may need to be allocated at > > different locations. In such > > cases, using MCFG table and _CBA method as defined in this section means > > that each of these host > > bridges must be in its own PCI Segment Group. > > > This is not from ACPI spec, PCI Firmware Specification 3.0 > but without going too deep into it, the above > paragraph talks about some particular case when each host bridge must > be in its own PCI Segment Group, which is definite proof that in other > cases multiple host bridges can be in one segment group. I stand corrected. I think you are right. But note that if they are, they must have distinct bus numbers assigned by ACPI. > > > > > _SEG > > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS > > > (Current Resource Settings) for that just like OF. No surprise there. > > > > OSPM uses both I think. > > > > All I see linux do with CRS is get the bus number range. > So let's assume that HW has two PCI host bridges and ACPI has: > Device(PCI0) { > Name (_HID, EisaId ("PNP0A03")) > Name (_SEG, 0x00) > } > Device(PCI1) { > Name (_HID, EisaId ("PNP0A03")) > Name (_SEG, 0x01) > } > I.e. no _CRS to describe resources. How do you think OSPM knows which of > two pci host bridges is PCI0 and which one is PCI1? You must be able to uniquely address any bridge using the combination of _SEG and _BBN.
> > And the spec says, e.g.: > > > > the memory mapped configuration base > > address (always corresponds to bus number 0) for the PCI Segment Group > > of the host bridge is provided by _CBA and the bus range covered by the > > base address is indicated by the corresponding bus range specified in > > _CRS. > > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Let's > solve the problem for PCI first and then move to PCI Express. Jumping from one > to another distracts us from the main discussion. I think this is what confuses us. As long as you are using cf8/cfc there's no concept of a domain really. Thus: /pci@i0cf8 is probably enough for BIOS boot because we'll need to make root bus numbers unique for legacy guests/option ROMs. But this is not a hardware requirement and might become easier to ignore with EFI. > > > > > > > > > > That should be enough for e.g. device_del. We do have the need to > > > > describe the topology when we interface with firmware, e.g. to describe > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal > > > > with), but that's probably the only case. > > > > > > > Describing HW topology is the only way to unambiguously describe device to > > > something or someone outside qemu and have persistent device naming > > > between different HW configuration. > > > > Not really, since ACPI is a binary blob programmed by qemu. > > > ACPI is part of the guest, not qemu. Yes it runs in the guest but it's generated by qemu. On real hardware, it's supplied by the motherboard. > Just saying "not really" doesn't > prove much. I still haven't seen any proposition from you that actually > solves the problem. No, "let's use guest naming" is not it. There is no > such thing as "The Guest". > > -- > Gleb. I am sorry if I didn't make this clear. I think we should use the domain:bus pair to name the root device. As these are unique and
On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote: > > > The guests. > > Which one? There are many guests. Your favorite? > > > > > For CLI, we need an easy way to map a device in guest to the > > > device in qemu and back. > > Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what > > you are saying is "lets use name that guest assigned to a device". > > No I am saying let's use the name that our ACPI tables assigned. > ACPI does not assign any name. In the best case ACPI tables describe resources used by a device. And not all guests qemu supports have support for ACPI. Qemu even supports machine types that do not support ACPI. > > > > > > > > > > It looks like you identify yourself with most of > > > > qemu users, but if most qemu users are like you then qemu has not enough > > > > users :) Most users that consider themselves to be "advanced" may know > > > > what eth1 or /dev/sdb means. This doesn't mean we should provide > > > > "device_del eth1" or "device_add /dev/sdb" command though. > > > > > > > > More important is that "domain" (encoded as number like you used to) > > > > and "bus number" has no meaning from inside qemu. > > > > So while I said many > > > > times I don't care about exact CLI syntax to much it should make sense > > > > at least. It can use id to specify PCI bus in CLI like this: > > > > device_del pci.0:1.1. Or it can even use device id too like this: > > > > device_del pci.0:ide.0. Or it can use HW topology like in FO device > > > > path. But doing ah-hoc device enumeration inside qemu and then using it > > > > for CLI is not it. > > > > > > > > > functionality in the guests. Qemu is buggy in the moment in that is > > > > > uses the bus addresses assigned by guest and not the ones in ACPI, > > > > > but that can be fixed. > > > > It looks like you confused ACPI _SEG for something it isn't. > > > > > > Maybe I did.
This is what linux does: > > > > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root > > > *root) > > > { > > > struct acpi_device *device = root->device; > > > int domain = root->segment; > > > int busnum = root->secondary.start; > > > > > > And I think this is consistent with the spec. > > > > > It means that one domain may include several host bridges. > > At that level > > domain is defined as something that have unique name for each device > > inside it thus no two buses in one segment/domain can have same bus > > number. This is what PCI spec tells you. > > And that really is enough for CLI because all we need is locate the > specific slot in a unique way. > At qemu level we do not have bus numbers. They are assigned by a guest. So inside a guest domain:bus:slot.func points you to a device, but qemu does not enumerate buses. > > And this further shows that using "domain" as defined by guest is very > > bad idea. > > As defined by ACPI, really. > ACPI is a part of guest software that may not even be present in the guest. How is it relevant? > > > > ACPI spec > > > > says that PCI segment group is purely software concept managed by system > > > > firmware. In fact one segment may include multiple PCI host bridges. > > > > > > It can't I think: > > Read _BBN definition: > > The _BBN object is located under a PCI host bridge and must be unique for > > every host bridge within a segment since it is the PCI bus number. > > > > Clearly above speaks about multiple host bridge within a segment. > > Yes, it looks like the firmware spec allows that. It even has an explicit example that shows it. > > > > Multiple Host Bridges > > > > > > A platform may have multiple PCI Express or PCI-X host bridges. The base > > > address for the > > > MMCONFIG space for these host bridges may need to be allocated at > > > different locations.
In such > > > cases, using MCFG table and _CBA method as defined in this section means > > > that each of these host > > > bridges must be in its own PCI Segment Group. > > > > > This is not from ACPI spec, > > PCI Firmware Specification 3.0 > > > but without going too deep into it, the above > > paragraph talks about some particular case when each host bridge must > > be in its own PCI Segment Group, which is definite proof that in other > > cases multiple host bridges can be in one segment group. > > I stand corrected. I think you are right. But note that if they are, > they must have distinct bus numbers assigned by ACPI. ACPI does not assign any numbers. The BIOS enumerates buses and assigns numbers. ACPI, in the best case, describes what the BIOS did to the OSPM. Qemu sits one layer below all this and does not enumerate PCI buses. Even if we made it do so there is no way to guarantee that the guest will enumerate them in the same order, since there is more than one way to do enumeration. I repeated this numerous times to you already. > > > > > > > > _SEG > > > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS > > > > (Current Resource Settings) for that just like OF. No surprise there. > > > > > > OSPM uses both I think. > > > > > > All I see linux do with CRS is get the bus number range. > > So lets assume that HW has two PCI host bridges and ACPI has: > > Device(PCI0) { > > Name (_HID, EisaId ("PNP0A03")) > > Name (_SEG, 0x00) > > } > > Device(PCI1) { > > Name (_HID, EisaId ("PNP0A03")) > > Name (_SEG, 0x01) > > } > > I.e no _CRS to describe resources. How do you think OSPM knows which of > > two pci host bridges is PCI0 and which one is PCI1? > > You must be able to uniquely address any bridge using the combination of _SEG > and _BBN. Not at all. And saying "you must be able" without actually showing how doesn't prove anything. _SEG is relevant only for those host bridges that support MMCONFIG (not all of them do, and none that qemu supports does yet).
_SEG points to a specific entry in the MCFG table and the MCFG entry holds the base address of the MMCONFIG space for the bridge (this address is configured by a guest). This is all _SEG does really, no magic at all. _BBN returns the bus number assigned by the BIOS to the host bridge. Nothing qemu-visible again. So _SEG and _BBN give you two numbers assigned by a guest FW. Nothing qemu can use to identify a device. > > > > And the spec says, e.g.: > > > > > > the memory mapped configuration base > > > address (always corresponds to bus number 0) for the PCI Segment Group > > > of the host bridge is provided by _CBA and the bus range covered by the > > > base address is indicated by the corresponding bus range specified in > > > _CRS. > > > > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Let's > > solve the problem for PCI first and then move to PCI Express. Jumping from one > > to another distracts us from the main discussion. > > I think this is what confuses us. As long as you are using cf8/cfc there's no > concept of a domain really. > Thus: > /pci@i0cf8 > > is probably enough for BIOS boot because we'll need to make root bus numbers > unique for legacy guests/option ROMs. But this is not a hardware requirement > and might become easier to ignore with EFI. > You do not need MMCONFIG to have multiple PCI domains. You can have one configured via standard cf8/cfc and another one on ef8/efc and one more at mmio fce00000 and you can address all of them: /pci@i0cf8 /pci@i0ef8 /pci@fce00000 And each one of those PCI domains can have 256 subbridges. > > > > > > > > > > > > > That should be enough for e.g. device_del. We do have the need to > > > > > describe the topology when we interface with firmware, e.g. to describe > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal > > > > > with), but that's probably the only case.
> > > > > > > > > Describing HW topology is the only way to unambiguously describe device to > > > > something or someone outside qemu and have persistent device naming > > > > between different HW configuration. > > > > > > Not really, since ACPI is a binary blob programmed by qemu. > > > > > ACPI is part of the guest, not qemu. > > Yes it runs in the guest but it's generated by qemu. On real hardware, > > it's supplied by the motherboard. > It is not generated by qemu. Parts of it depend on the HW and other parts depend on how the BIOS configures the HW. _BBN for instance is clearly defined to return the address assigned by the BIOS. > > Just saying "not really" doesn't > > prove much. I still haven't seen any proposition from you that actually > > solves the problem. No, "let's use guest naming" is not it. There is no > > such thing as "The Guest". > > > > -- > > Gleb. > > I am sorry if I didn't make this clear. I think we should use the domain:bus > pair to name the root device. As these are unique and > You forgot to complete the sentence :) But you made it clear enough and it is incorrect. The domain:bus pair is not only not unique, it does not exist in qemu at all and as such can't be used to address a device. It is a product of HW enumeration done by a guest OS, just like eth0 or C:. -- Gleb.
On Sun, Nov 21, 2010 at 02:50:14PM +0200, Gleb Natapov wrote: > On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote: > > > > The guests. > > > Which one? There are many guests. Your favorite? > > > > > > > For CLI, we need an easy way to map a device in guest to the > > > > device in qemu and back. > > > Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what > > > you are saying is "lets use name that guest assigned to a device". > > > > No I am saying let's use the name that our ACPI tables assigned. > > > ACPI does not assign any name. In a best case ACPI tables describe resources > used by a device. Not only that. bus number and segment aren't resources as such. They describe addressing. > And not all guests qemu supports has support for ACPI. Qemu > even support machines types that do not support ACPI. So? Different machines -> different names. > > > > > > > > > > > > > It looks like you identify yourself with most of > > > > > qemu users, but if most qemu users are like you then qemu has not enough > > > > > users :) Most users that consider themselves to be "advanced" may know > > > > > what eth1 or /dev/sdb means. This doesn't mean we should provide > > > > > "device_del eth1" or "device_add /dev/sdb" command though. > > > > > > > > > > More important is that "domain" (encoded as number like you used to) > > > > > and "bus number" has no meaning from inside qemu. > > > > > So while I said many > > > > > times I don't care about exact CLI syntax to much it should make sense > > > > > at least. It can use id to specify PCI bus in CLI like this: > > > > > device_del pci.0:1.1. Or it can even use device id too like this: > > > > > device_del pci.0:ide.0. Or it can use HW topology like in FO device > > > > > path. But doing ah-hoc device enumeration inside qemu and then using it > > > > > for CLI is not it. > > > > > > > > > > > functionality in the guests. 
Qemu is buggy in the moment in that is > > > > > > uses the bus addresses assigned by guest and not the ones in ACPI, > > > > > > but that can be fixed. > > > > > It looks like you confused ACPI _SEG for something it isn't. > > > > > > > > Maybe I did. This is what linux does: > > > > > > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root > > > > *root) > > > > { > > > > struct acpi_device *device = root->device; > > > > int domain = root->segment; > > > > int busnum = root->secondary.start; > > > > > > > > And I think this is consistent with the spec. > > > > > > > It means that one domain may include several host bridges. > > > At that level > > > domain is defined as something that have unique name for each device > > > inside it thus no two buses in one segment/domain can have same bus > > > number. This is what PCI spec tells you. > > > > And that really is enough for CLI because all we need is locate the > > specific slot in a unique way. > > > At qemu level we do not have bus numbers. They are assigned by a guest. > So inside a guest domain:bus:slot.func points you to a device, but in > qemu does not enumerate buses. > > > > And this further shows that using "domain" as defined by guest is very > > > bad idea. > > > > As defined by ACPI, really. > > > ACPI is a part of a guest software that may not event present in the > guest. How is it relevant? It's relevant because this is what guests use. To access the root device with cf8/cfc you need to know the bus number assigned to it by firmware. How that was assigned is of interest to BIOS/ACPI but not really interesting to the user or, I suspect, guest OS. > > > > > ACPI spec > > > > > says that PCI segment group is purely software concept managed by system > > > > > firmware. In fact one segment may include multiple PCI host bridges. 
> > > > > > > > It can't I think: > > > Read _BBN definition: > > > The _BBN object is located under a PCI host bridge and must be unique for > > > every host bridge within a segment since it is the PCI bus number. > > > > > > Clearly above speaks about multiple host bridge within a segment. > > > > Yes, it looks like the firmware spec allows that. > It even have explicit example that shows it. > > > > > > > Multiple Host Bridges > > > > > > > > A platform may have multiple PCI Express or PCI-X host bridges. The base > > > > address for the > > > > MMCONFIG space for these host bridges may need to be allocated at > > > > different locations. In such > > > > cases, using MCFG table and _CBA method as defined in this section means > > > > that each of these host > > > > bridges must be in its own PCI Segment Group. > > > > > > > This is not from ACPI spec, > > > > PCI Firmware Specification 3.0 > > > > > but without going to deep into it above > > > paragraph talks about some particular case when each host bridge must > > > be in its own PCI Segment Group with is a definite prove that in other > > > cases multiple host bridges can be in on segment group. > > > > I stand corrected. I think you are right. But note that if they are, > > they must have distinct bus numbers assigned by ACPI. > ACPI does not assign any numbers. For all root pci devices firmware must supply BBN number. This is the bus number, isn't it? For nested buses, this is optional. > Bios enumerates buses and assign > numbers. There's no standard way to enumerate pci root devices in guest AFAIK. The spec says: Firmware must configure all Host Bridges in the systems, even if they are not connected to a console or boot device. Firmware must configure Host Bridges in order to allow operating systems to use the devices below the Host Bridges. This is because the Host Bridges programming model is not defined by the PCI Specifications. > ACPI, in a base case, describes what BIOS did to OSPM. 
Qemu sits > one layer below all this and does not enumerate PC buses. Even if we make > it to do so there is not way to guaranty that guest will enumerate them > in the same order since there is more then one way to do enumeration. I > repeated this numerous times to you already. ACPI is really part of the motherboard. Calling it the guest just confuses things. Guest OS can override bus numbering for nested buses but not for root buses. > > > > > > > > > > > _SEG > > > > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS > > > > > (Current Resource Settings) for that just like OF. No surprise there. > > > > > > > > OSPM uses both I think. > > > > > > > > All I see linux do with CRS is get the bus number range. > > > So lets assume that HW has two PCI host bridges and ACPI has: > > > Device(PCI0) { > > > Name (_HID, EisaId ("PNP0A03")) > > > Name (_SEG, 0x00) > > > } > > > Device(PCI1) { > > > Name (_HID, EisaId ("PNP0A03")) > > > Name (_SEG, 0x01) > > > } > > > I.e no _CRS to describe resources. How do you think OSPM knows which of > > > two pci host bridges is PCI0 and which one is PCI1? > > > > You must be able to uniquely address any bridge using the combination of _SEG > > and _BBN. > > No at all. And saying "you must be able" without actually show how > doesn't prove anything. _SEG is relevant only for those host bridges > that support MMCONFIG (not all of them do, and none that qemu support > does yet). _SEG points to specific entry in MCFG table and MCFG entry > holds base address for MMCONFIG space for the bridge (this address > is configured by a guest). This is all _SEG does really, no magic at > all. _BBN returns bus number assigned by the BIOS to host bridge. Nothing > qemu visible again. > So _SEG and _BBN gives you two numbers assigned by > a guest FW. Nothing qemu can use to identify a device. This FW is given to guest by qemu. It only assigns bus numbers because qemu told it to do so. 
> > > > > > And the spec says, e.g.: > > > > > > > > the memory mapped configuration base > > > > address (always corresponds to bus number 0) for the PCI Segment Group > > > > of the host bridge is provided by _CBA and the bus range covered by the > > > > base address is indicated by the corresponding bus range specified in > > > > _CRS. > > > > > > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets > > > solve the problem for PCI first and then move to PCI Express. Jumping from one > > > to another destruct us from main discussion. > > > > I think this is what confuses us. As long as you are using cf8/cfc there's no > > concept of a domain really. > > Thus: > > /pci@i0cf8 > > > > is probably enough for BIOS boot because we'll need to make root bus numbers > > unique for legacy guests/option ROMs. But this is not a hardware requirement > > and might become easier to ignore eith EFI. > > > You do not need MMCONFIG to have multiple PCI domains. You can have one > configured via standard cf8/cfc and another one on ef8/efc and one more > at mmio fce00000 and you can address all of them: > /pci@i0cf8 > /pci@i0ef8 > /pci@fce00000 > > And each one of those PCI domains can have 256 subbridges. Will common guests such as windows or linux be able to use them? This seems to be outside the scope of the PCI Firmware specification, which says that bus numbers must be unique. > > > > > > > > > > > > > > > > That should be enough for e.g. device_del. We do have the need to > > > > > > describe the topology when we interface with firmware, e.g. to describe > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal > > > > > > with), but that's probably the only case. > > > > > > > > > > > Describing HW topology is the only way to unambiguously describe device to > > > > > something or someone outside qemu and have persistent device naming > > > > > between different HW configuration. 
> > > > > > > > Not really, since ACPI is a binary blob programmed by qemu. > > > > > > > APCI is part of the guest, not qemu. > > > > Yes it runs in the guest but it's generated by qemu. On real hardware, > > it's supplied by the motherboard. > > > It is not generated by qemu. Parts of it depend on HW and other part depend > on how BIOS configure HW. _BBN for instance is clearly defined to return > address assigned bu the BIOS. BIOS is supplied on the motherboard and in our case by qemu as well. There's no standard way for BIOS to assign bus number to the pci root, so it does it in device-specific way. Why should a management tool or a CLI user care about these? As far as they are concerned we could use some PV scheme to find root devices and assign bus numbers, and it would be exactly the same. > > > Just saying "not really" doesn't > > > prove much. I still haven't seen any proposition from you that actually > > > solve the problem. No, "lets use guest naming" is not it. There is no > > > such thing as "The Guest". > > > > > > -- > > > Gleb. > > > > I am sorry if I didn't make this clear. I think we should use the domain:bus > > pair to name the root device. As these are unique and > > > You forgot to complete the sentence :) But you made it clear enough and > it is incorrect. domain:bus pair not only not unique they do not exist > in qemu at all Sure they do. domain maps to mcfg address for express. bus is used for cf8/cfc addressing. They are assigned by BIOS but since BIOS is supplied with hardware the point is moot. > and as such can't be used to address device. They are > product of HW enumeration done by a guest OS just like eth0 or C:. > > -- > Gleb. There's a huge difference between BIOS and guest OS, and between bus numbers of pci root and of nested bridges. Describing hardware io ports makes sense if you are trying to communicate data from qemu to the BIOS. But the rest of the world might not care.
On Sun, Nov 21, 2010 at 04:48:44PM +0200, Michael S. Tsirkin wrote: > On Sun, Nov 21, 2010 at 02:50:14PM +0200, Gleb Natapov wrote: > > On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote: > > > > > The guests. > > > > Which one? There are many guests. Your favorite? > > > > > > > > > For CLI, we need an easy way to map a device in guest to the > > > > > device in qemu and back. > > > > Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what > > > > you are saying is "lets use name that guest assigned to a device". > > > > > > No I am saying let's use the name that our ACPI tables assigned. > > > > > ACPI does not assign any name. In a best case ACPI tables describe resources > > used by a device. > > Not only that. bus number and segment aren't resources as such. > They describe addressing. > > > And not all guests qemu supports has support for ACPI. Qemu > > even support machines types that do not support ACPI. > > So? Different machines -> different names. > You want to have a different CLI for different types of machines qemu supports? > > > > > > > > > > > > > > > > It looks like you identify yourself with most of > > > > > > qemu users, but if most qemu users are like you then qemu has not enough > > > > > > users :) Most users that consider themselves to be "advanced" may know > > > > > > what eth1 or /dev/sdb means. This doesn't mean we should provide > > > > > > "device_del eth1" or "device_add /dev/sdb" command though. > > > > > > > > > > > > More important is that "domain" (encoded as number like you used to) > > > > > > and "bus number" has no meaning from inside qemu. > > > > > > So while I said many > > > > > > times I don't care about exact CLI syntax to much it should make sense > > > > > > at least. It can use id to specify PCI bus in CLI like this: > > > > > > device_del pci.0:1.1. Or it can even use device id too like this: > > > > > > device_del pci.0:ide.0.
Or it can use HW topology like in FO device > > > > > > path. But doing ah-hoc device enumeration inside qemu and then using it > > > > > > for CLI is not it. > > > > > > > > > > > > > functionality in the guests. Qemu is buggy in the moment in that is > > > > > > > uses the bus addresses assigned by guest and not the ones in ACPI, > > > > > > > but that can be fixed. > > > > > > It looks like you confused ACPI _SEG for something it isn't. > > > > > > > > > > Maybe I did. This is what linux does: > > > > > > > > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root > > > > > *root) > > > > > { > > > > > struct acpi_device *device = root->device; > > > > > int domain = root->segment; > > > > > int busnum = root->secondary.start; > > > > > > > > > > And I think this is consistent with the spec. > > > > > > > > > It means that one domain may include several host bridges. > > > > At that level > > > > domain is defined as something that have unique name for each device > > > > inside it thus no two buses in one segment/domain can have same bus > > > > number. This is what PCI spec tells you. > > > > > > And that really is enough for CLI because all we need is locate the > > > specific slot in a unique way. > > > > > At qemu level we do not have bus numbers. They are assigned by a guest. > > So inside a guest domain:bus:slot.func points you to a device, but in > > qemu does not enumerate buses. > > > > > > And this further shows that using "domain" as defined by guest is very > > > > bad idea. > > > > > > As defined by ACPI, really. > > > > > ACPI is a part of a guest software that may not event present in the > > guest. How is it relevant? > > It's relevant because this is what guests use. To access the root > device with cf8/cfc you need to know the bus number assigned to it > by firmware. How that was assigned is of interest to BIOS/ACPI but not > really interesting to the user or, I suspect, guest OS. > Of course this is incorrect. 
OS can re-enumerate PCI if it wishes. Linux has a command line option just for that. And saying that ACPI is relevant because this is what guest software uses, in a reply to a sentence that states that not all guests even use ACPI, is, well, strange. And ACPI describes only HW that is present at boot time. What if you hot-plug a root pci bridge? How does nonexistent PCI naming help you? > > > > > > ACPI spec > > > > > > says that PCI segment group is a purely software concept managed by system > > > > > > firmware. In fact one segment may include multiple PCI host bridges. > > > > > > > > > > It can't I think: > > > > Read the _BBN definition: > > > > The _BBN object is located under a PCI host bridge and must be unique for > > > > every host bridge within a segment since it is the PCI bus number. > > > > > > > > Clearly the above speaks about multiple host bridges within a segment. > > > > > > Yes, it looks like the firmware spec allows that. > > It even has an explicit example that shows it. > > > > > > > > > > Multiple Host Bridges > > > > > > > > > > A platform may have multiple PCI Express or PCI-X host bridges. The base > > > > > address for the > > > > > MMCONFIG space for these host bridges may need to be allocated at > > > > > different locations. In such > > > > > cases, using the MCFG table and _CBA method as defined in this section means > > > > > that each of these host > > > > > bridges must be in its own PCI Segment Group. > > > > > > > > > This is not from the ACPI spec, > > > > > > PCI Firmware Specification 3.0 > > > > > > > but without going too deep into it, the above > > > > paragraph talks about some particular case when each host bridge must > > > > be in its own PCI Segment Group, which is definite proof that in other > > > > cases multiple host bridges can be in one segment group. > > > > > > I stand corrected. I think you are right. But note that if they are, > > > they must have distinct bus numbers assigned by ACPI. > > ACPI does not assign any numbers. 
> > For all root pci devices firmware must supply a BBN number. This is the > bus number, isn't it? For nested buses, this is optional. Nonsense. _BBN is optional and is not present in the Seabios DSDT. As far as I can tell it is only needed if a PCI segment group has more than one pci host bridge. > > > Bios enumerates buses and assigns > > numbers. > > There's no standard way to enumerate pci root devices in a guest AFAIK. > The spec says: > Firmware must configure all Host Bridges in the systems, even if > they are not connected to a console or boot device. Firmware must > configure Host Bridges in order to allow operating systems to use the > devices below the Host Bridges. This is because the Host Bridges > programming model is not defined by the PCI Specifications. > > A guest should be aware of HW to use it. Be it through bios or driver. > > ACPI, in the best case, describes what the BIOS did to the OSPM. Qemu sits > > one layer below all this and does not enumerate PCI buses. Even if we make > > it do so there is no way to guarantee that a guest will enumerate them > > in the same order since there is more than one way to do enumeration. I > > repeated this numerous times to you already. > > ACPI is really part of the motherboard. Calling it the guest just > confuses things. Guest OS can override bus numbering for nested buses > but not for root buses. > If calling ACPI part of a guest confuses you then you are already confused. A guest OS can do whatever it wishes with any enumeration FW did if it knows better. > > > > > > > > > > > > > > > > _SEG > > > > > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS > > > > > > (Current Resource Settings) for that just like OF. No surprise there. > > > > > > > > > > OSPM uses both I think. > > > > > > > > > > All I see linux do with CRS is get the bus number range. 
> > > > So let's assume that HW has two PCI host bridges and ACPI has: > > > > Device(PCI0) { > > > > Name (_HID, EisaId ("PNP0A03")) > > > > Name (_SEG, 0x00) > > > > } > > > > Device(PCI1) { > > > > Name (_HID, EisaId ("PNP0A03")) > > > > Name (_SEG, 0x01) > > > > } > > > > I.e. no _CRS to describe resources. How do you think the OSPM knows which of > > > > the two pci host bridges is PCI0 and which one is PCI1? > > > You must be able to uniquely address any bridge using the combination of _SEG > > > and _BBN. > > Not at all. And saying "you must be able" without actually showing how > > doesn't prove anything. _SEG is relevant only for those host bridges > > that support MMCONFIG (not all of them do, and none that qemu supports > > do yet). _SEG points to a specific entry in the MCFG table and the MCFG entry > > holds the base address of the MMCONFIG space for the bridge (this address > > is configured by a guest). This is all _SEG does really, no magic at > > all. _BBN returns the bus number assigned by the BIOS to the host bridge. Nothing > > qemu visible again. > > So _SEG and _BBN give you two numbers assigned by > > a guest FW. Nothing qemu can use to identify a device. > > This FW is given to the guest by qemu. It only assigns bus numbers > because qemu told it to do so. Seabios is just a guest qemu ships. There are other FW for qemu: Bochs bios, openfirmware, efi. All of them were developed outside of the qemu project and all of them are usable without qemu. You can't consider them to be part of qemu any more than Linux/Windows with virtio drivers. > > > > > > > > > > > > And the spec says, e.g.: > > > > > > > > > > the memory mapped configuration base > > > > > address (always corresponds to bus number 0) for the PCI Segment Group > > > > > of the host bridge is provided by _CBA and the bus range covered by the > > > > > base address is indicated by the corresponding bus range specified in > > > > > _CRS. > > > > > > > > > I don't see how it is relevant. And _CBA is defined only for PCI Express. 
Let's > > > > solve the problem for PCI first and then move to PCI Express. Jumping from one > > > > to the other distracts us from the main discussion. > > > I think this is what confuses us. As long as you are using cf8/cfc there's no > > > concept of a domain really. > > > Thus: > > > /pci@i0cf8 > > > > > > is probably enough for BIOS boot because we'll need to make root bus numbers > > > unique for legacy guests/option ROMs. But this is not a hardware requirement > > > and might become easier to ignore with EFI. > > > > > You do not need MMCONFIG to have multiple PCI domains. You can have one > > configured via standard cf8/cfc and another one on ef8/efc and one more > > at mmio fce00000 and you can address all of them: > > /pci@i0cf8 > > /pci@i0ef8 > > /pci@fce00000 > > > > And each one of those PCI domains can have 256 subbridges. > > Will common guests such as windows or linux be able to use them? This With proper drivers, yes. There is HW with more than one PCI bus and I think qemu emulates some of it (PPC MAC for instance). > seems to be outside the scope of the PCI Firmware specification, which > says that bus numbers must be unique. They must be unique per PCI segment group. > > > > > > > > > > > > > > > > > > > > > > > > > That should be enough for e.g. device_del. We do have the need to > > > > > > > > describe the topology when we interface with firmware, e.g. to describe > > > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal > > > > > > > > with), but that's probably the only case. > > > > > > > > > > > > > > > Describing HW topology is the only way to unambiguously describe a device to > > > > > > > something or someone outside qemu and have persistent device naming > > > > > > > between different HW configurations. > > > > > > > > > > > > Not really, since ACPI is a binary blob programmed by qemu. > > > > > > > > > > > ACPI is part of the guest, not qemu. > > > > > > > > Yes it runs in the guest but it's generated by qemu. 
On real hardware, > > > it's supplied by the motherboard. > > > > > It is not generated by qemu. Parts of it depend on HW and other parts depend > > on how the BIOS configures HW. _BBN for instance is clearly defined to return > > the address assigned by the BIOS. > > BIOS is supplied on the motherboard and in our case by qemu as well. You can replace the MB bios by coreboot+seabios on some of them. Manufacturers don't want you to do it and make it hard to do, but otherwise this is just software, not some magic dust. > There's no standard way for BIOS to assign a bus number to the pci root, > so it does it in a device-specific way. Why should a management tool > or a CLI user care about these? As far as they are concerned > we could use some PV scheme to find root devices and assign bus > numbers, and it would be exactly the same. > Go write KVM userspace that does that. AFAIK there is a project out there that tries to do that. No luck so far. Your world view is very x86/Linux centric. You need to broaden it a little bit. Next time you propose something, ask yourself whether it will work with qemu-sparc, qemu-ppc, qemu-amd. > > > > Just saying "not really" doesn't > > > > prove much. I still haven't seen any proposition from you that actually > > > > solves the problem. No, "let's use guest naming" is not it. There is no > > > > such thing as "The Guest". > > > > > > > > -- > > > > Gleb. > > > I am sorry if I didn't make this clear. I think we should use the domain:bus > > > pair to name the root device. As these are unique and > > > > > You forgot to complete the sentence :) But you made it clear enough and > > it is incorrect. The domain:bus pair is not only not unique, it does not exist > > in qemu at all. > > Sure they do. domain maps to the mcfg address for express. bus is used for mcfg is optional as far as I can see. You can compile out MMCONFIG support on Linux. > cf8/cfc addressing. They are assigned by BIOS but since BIOS > is supplied with hardware the point is moot. 
Most PC hardware is supplied with Windows, so what? BIOS is code that runs in a guest. It is part of a guest. Every line of code executed by a vcpu belongs to a guest. No need to redefine things to prove your point. > > > and as such can't be used to address a device. They are > > a product of HW enumeration done by a guest OS just like eth0 or C:. > > > > -- > > Gleb. > > There's a huge difference between BIOS and guest OS, Not true. > and between bus > numbers of pci root and of nested bridges. Really? What is it? > > Describing hardware io ports makes sense if you are trying to > communicate data from qemu to the BIOS. But the rest of the world might > not care. > The part of the world that manages HW cares. You may need to add a device from the monitor before the first line of BIOS is even executed. How can you rely on BIOS enumeration of devices in this case? -- Gleb.
On Sun, Nov 21, 2010 at 06:01:11PM +0200, Gleb Natapov wrote: > On Sun, Nov 21, 2010 at 04:48:44PM +0200, Michael S. Tsirkin wrote: > > On Sun, Nov 21, 2010 at 02:50:14PM +0200, Gleb Natapov wrote: > > > On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote: > > > > > > The guests. > > > > > Which one? There are many guests. Your favorite? > > > > > > > > > > > For CLI, we need an easy way to map a device in guest to the > > > > > > device in qemu and back. > > > > > Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what > > > > > you are saying is "lets use name that guest assigned to a device". > > > > > > > > No I am saying let's use the name that our ACPI tables assigned. > > > > > > > ACPI does not assign any name. In a best case ACPI tables describe resources > > > used by a device. > > > > Not only that. bus number and segment aren't resources as such. > > They describe addressing. > > > > > And not all guests qemu supports has support for ACPI. Qemu > > > even support machines types that do not support ACPI. > > > > So? Different machines -> different names. > > > You want to have different cli for different type of machines qemu > supports? Different device names. > > > > > > > > > > > > > > > > > > > It looks like you identify yourself with most of > > > > > > > qemu users, but if most qemu users are like you then qemu has not enough > > > > > > > users :) Most users that consider themselves to be "advanced" may know > > > > > > > what eth1 or /dev/sdb means. This doesn't mean we should provide > > > > > > > "device_del eth1" or "device_add /dev/sdb" command though. > > > > > > > > > > > > > > More important is that "domain" (encoded as number like you used to) > > > > > > > and "bus number" has no meaning from inside qemu. > > > > > > > So while I said many > > > > > > > times I don't care about exact CLI syntax to much it should make sense > > > > > > > at least. 
It can use id to specify PCI bus in CLI like this: > > > > > > > device_del pci.0:1.1. Or it can even use device id too like this: > > > > > > > device_del pci.0:ide.0. Or it can use HW topology like in FO device > > > > > > > path. But doing ah-hoc device enumeration inside qemu and then using it > > > > > > > for CLI is not it. > > > > > > > > > > > > > > > functionality in the guests. Qemu is buggy in the moment in that is > > > > > > > > uses the bus addresses assigned by guest and not the ones in ACPI, > > > > > > > > but that can be fixed. > > > > > > > It looks like you confused ACPI _SEG for something it isn't. > > > > > > > > > > > > Maybe I did. This is what linux does: > > > > > > > > > > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root > > > > > > *root) > > > > > > { > > > > > > struct acpi_device *device = root->device; > > > > > > int domain = root->segment; > > > > > > int busnum = root->secondary.start; > > > > > > > > > > > > And I think this is consistent with the spec. > > > > > > > > > > > It means that one domain may include several host bridges. > > > > > At that level > > > > > domain is defined as something that have unique name for each device > > > > > inside it thus no two buses in one segment/domain can have same bus > > > > > number. This is what PCI spec tells you. > > > > > > > > And that really is enough for CLI because all we need is locate the > > > > specific slot in a unique way. > > > > > > > At qemu level we do not have bus numbers. They are assigned by a guest. > > > So inside a guest domain:bus:slot.func points you to a device, but in > > > qemu does not enumerate buses. > > > > > > > > And this further shows that using "domain" as defined by guest is very > > > > > bad idea. > > > > > > > > As defined by ACPI, really. > > > > > > > ACPI is a part of a guest software that may not event present in the > > > guest. How is it relevant? > > > > It's relevant because this is what guests use. 
To access the root > > device with cf8/cfc you need to know the bus number assigned to it > > by firmware. How that was assigned is of interest to BIOS/ACPI but not > > really interesting to the user or, I suspect, guest OS. > > > Of course this is incorrect. OS can re-enumerate PCI if it wishes. Linux > have cmd just for that. I haven't looked but I suspect linux will simply assume cf8/cfc and start doing it from there. If that doesn't get you the root device you wanted, tough. > And saying that ACPI is relevant because this is > what guest software use in a reply to sentence that states that not all > guest even use ACPI is, well, strange. > > And ACPI describes only HW that present at boot time. What if you > hot-plugged root pci bridge? How non existent PCI naming helps you? That's described by ACPI as well. > > > > > > > ACPI spec > > > > > > > says that PCI segment group is purely software concept managed by system > > > > > > > firmware. In fact one segment may include multiple PCI host bridges. 
> > > > > > > > > > > This is not from ACPI spec, > > > > > > > > PCI Firmware Specification 3.0 > > > > > > > > > but without going to deep into it above > > > > > paragraph talks about some particular case when each host bridge must > > > > > be in its own PCI Segment Group with is a definite prove that in other > > > > > cases multiple host bridges can be in on segment group. > > > > > > > > I stand corrected. I think you are right. But note that if they are, > > > > they must have distinct bus numbers assigned by ACPI. > > > ACPI does not assign any numbers. > > > > For all root pci devices firmware must supply BBN number. This is the > > bus number, isn't it? For nested buses, this is optional. > Nonsense. _BBN is optional and does not present in Seabios DSDT. The spec says it's not optional for host bridges: Firmware must report Host Bridges in the ACPI name space. Each Host Bridge object must contain the following objects: _HID and _CID _CRS to determine all resources consumed and produced (passed through to the secondary bus) by the host bridge. Firmware allocates resources (Memory Addresses, I/O Port, etc.) to Host Bridges. The _CRS descriptor informs the operating system of the resources it may use for configuring devices below the Host Bridge. ● _TRA, _TTP, and _TRS translation offsets to inform the operating system of the mapping between the primary bus and the secondary bus. _PRT and the interrupt descriptor to determine interrupt routing. _BBN to obtain a bus number. so seabios seems to be out of spec. > As far > as I can tell it is only needed if PCI segment group has more then one > pci host bridges. No. Because cfc/cf8 are not aware of _SEG. > > > > > Bios enumerates buses and assign > > > numbers. > > > > There's no standard way to enumerate pci root devices in guest AFAIK. > > The spec says: > > Firmware must configure all Host Bridges in the systems, even if > > they are not connected to a console or boot device. 
Firmware must > > configure Host Bridges in order to allow operating systems to use the > > devices below the Host Bridges. This is because the Host Bridges > > programming model is not defined by the PCI Specifications. > > > > > Guest should be aware of HW to use it. Be it through bios or driver. Why should it? You take the bus number and stick it in cf8/cfc, and you get a config cycle. No magic HW awareness needed. > > > ACPI, in a base case, describes what BIOS did to OSPM. Qemu sits > > > one layer below all this and does not enumerate PC buses. Even if we make > > > it to do so there is not way to guaranty that guest will enumerate them > > > in the same order since there is more then one way to do enumeration. I > > > repeated this numerous times to you already. > > > > ACPI is really part of the motherboard. Calling it the guest just > > confuses things. Guest OS can override bus numbering for nested buses > > but not for root buses. > > > If calling ACPI part of a guest confuses you then you are already > confused. Guest OS can do whatever it wishes with any enumeration FW did > if it knows better. > > > > > > > > > > > > > > > > > > _SEG > > > > > > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS > > > > > > > (Current Resource Settings) for that just like OF. No surprise there. > > > > > > > > > > > > OSPM uses both I think. > > > > > > > > > > > > All I see linux do with CRS is get the bus number range. > > > > > So lets assume that HW has two PCI host bridges and ACPI has: > > > > > Device(PCI0) { > > > > > Name (_HID, EisaId ("PNP0A03")) > > > > > Name (_SEG, 0x00) > > > > > } > > > > > Device(PCI1) { > > > > > Name (_HID, EisaId ("PNP0A03")) > > > > > Name (_SEG, 0x01) > > > > > } > > > > > I.e no _CRS to describe resources. How do you think OSPM knows which of > > > > > two pci host bridges is PCI0 and which one is PCI1? > > > > You must be able to uniquely address any bridge using the combination of _SEG > > > > and _BBN. 
> > > > > > No at all. And saying "you must be able" without actually show how > > > doesn't prove anything. _SEG is relevant only for those host bridges > > > that support MMCONFIG (not all of them do, and none that qemu support > > > does yet). _SEG points to specific entry in MCFG table and MCFG entry > > > holds base address for MMCONFIG space for the bridge (this address > > > is configured by a guest). This is all _SEG does really, no magic at > > > all. _BBN returns bus number assigned by the BIOS to host bridge. Nothing > > > qemu visible again. > > > So _SEG and _BBN gives you two numbers assigned by > > > a guest FW. Nothing qemu can use to identify a device. > > > > This FW is given to guest by qemu. It only assigns bus numbers > > because qemu told it to do so. > Seabios is just a guest qemu ships. There are other FW for qemu. Bochs > bios, openfirmware, efi. All of them where developed outside of qemu > project and all of them are usable without qemu. You can't consider them > be part of qemu any more then Linux/Windows with virtio drivers. > > > > > > > > > > > > > And the spec says, e.g.: > > > > > > > > > > > > the memory mapped configuration base > > > > > > address (always corresponds to bus number 0) for the PCI Segment Group > > > > > > of the host bridge is provided by _CBA and the bus range covered by the > > > > > > base address is indicated by the corresponding bus range specified in > > > > > > _CRS. > > > > > > > > > > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets > > > > > solve the problem for PCI first and then move to PCI Express. Jumping from one > > > > > to another destruct us from main discussion. > > > > > > > > I think this is what confuses us. As long as you are using cf8/cfc there's no > > > > concept of a domain really. > > > > Thus: > > > > /pci@i0cf8 > > > > > > > > is probably enough for BIOS boot because we'll need to make root bus numbers > > > > unique for legacy guests/option ROMs. 
But this is not a hardware requirement > > > > and might become easier to ignore eith EFI. > > > > > > > You do not need MMCONFIG to have multiple PCI domains. You can have one > > > configured via standard cf8/cfc and another one on ef8/efc and one more > > > at mmio fce00000 and you can address all of them: > > > /pci@i0cf8 > > > /pci@i0ef8 > > > /pci@fce00000 > > > > > > And each one of those PCI domains can have 256 subbridges. > > > > Will common guests such as windows or linux be able to use them? This > With proper drivers yes. There is HW with more then one PCI bus and I > think qemu emulates some of it (PPC MAC for instance). > > > seems to be outside the scope of the PCI Firmware specification, which > > says that bus numbers must be unique. > They must be unique per PCI segment group. > > > > > > > > > > > > > > > > > > > > > > > > > That should be enough for e.g. device_del. We do have the need to > > > > > > > > describe the topology when we interface with firmware, e.g. to describe > > > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal > > > > > > > > with), but that's probably the only case. > > > > > > > > > > > > > > > Describing HW topology is the only way to unambiguously describe device to > > > > > > > something or someone outside qemu and have persistent device naming > > > > > > > between different HW configuration. > > > > > > > > > > > > Not really, since ACPI is a binary blob programmed by qemu. > > > > > > > > > > > APCI is part of the guest, not qemu. > > > > > > > > Yes it runs in the guest but it's generated by qemu. On real hardware, > > > > it's supplied by the motherboard. > > > > > > > It is not generated by qemu. Parts of it depend on HW and other part depend > > > on how BIOS configure HW. _BBN for instance is clearly defined to return > > > address assigned bu the BIOS. > > > > BIOS is supplied on the motherboard and in our case by qemu as well. 
> You can replace MB bios by coreboot+seabios on some of them. > Manufacturer don't want you to do it and make it hard to do, but > otherwise this is just software, not some magic dust. > > > There's no standard way for BIOS to assign bus number to the pci root, > > so it does it in device-specific way. Why should a management tool > > or a CLI user care about these? As far as they are concerned > > we could use some PV scheme to find root devices and assign bus > > numbers, and it would be exactly the same. > > > Go write KVM userspace that does that. AFAIK there is project out there > that tries to do that. No luck so far. Your world view is very x86/Linux > centric. You need to broaden it a little bit. Next time you propose > something ask yourself will it work with qemu-sparc, qemu-ppc, qemu-amd. > > > > > > > Just saying "not really" doesn't > > > > > prove much. I still haven't seen any proposition from you that actually > > > > > solve the problem. No, "lets use guest naming" is not it. There is no > > > > > such thing as "The Guest". > > > > > > > > > > -- > > > > > Gleb. > > > > > > > > I am sorry if I didn't make this clear. I think we should use the domain:bus > > > > pair to name the root device. As these are unique and > > > > > > > You forgot to complete the sentence :) But you made it clear enough and > > > it is incorrect. domain:bus pair not only not unique they do not exist > > > in qemu at all > > > > Sure they do. domain maps to mcfg address for express. bus is used for > mcfg is optional as far as I can see. You can compile out MMCONFIG > support on Linux. > > > cf8/cfc addressing. They are assigned by BIOS but since BIOS > > is supplied with hardware the point is moot. > Most PC hardware is supplied with Windows, so what? BIOS is a code that > runs in a guest. It is part of a guest. Every line of code executed by > vcpu belongs to a guest. No need to redefine things to prove you point. > > > > > > and as such can't be used to address device. 
They are > > > product of HW enumeration done by a guest OS just like eth0 or C:. > > > > > > -- > > > Gleb. > > > > There's a huge difference between BIOS and guest OS, > Not true. > > > and between bus > > numbers of pci root and of nested bridges. > Really? What is it? > > > > > Describing hardware io ports makes sense if you are trying to > > communicate data from qemu to the BIOS. But the rest of the world might > > not care. > > > The part of the world that manage HW cares. You may need to add device > from monitor before first line of BIOS is event executed. How can you > rely on BIOS enumerate of devices in this case? > > > -- > Gleb.
On Sun, Nov 21, 2010 at 06:38:31PM +0200, Michael S. Tsirkin wrote: > On Sun, Nov 21, 2010 at 06:01:11PM +0200, Gleb Natapov wrote: > > On Sun, Nov 21, 2010 at 04:48:44PM +0200, Michael S. Tsirkin wrote: > > > On Sun, Nov 21, 2010 at 02:50:14PM +0200, Gleb Natapov wrote: > > > > On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote: > > > > > > > The guests. > > > > > > Which one? There are many guests. Your favorite? > > > > > > > > > > > > > For CLI, we need an easy way to map a device in guest to the > > > > > > > device in qemu and back. > > > > > > Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what > > > > > > you are saying is "lets use name that guest assigned to a device". > > > > > > > > > > No I am saying let's use the name that our ACPI tables assigned. > > > > > > > > > ACPI does not assign any name. In a best case ACPI tables describe resources > > > > used by a device. > > > > > > Not only that. bus number and segment aren't resources as such. > > > They describe addressing. > > > > > > > And not all guests qemu supports has support for ACPI. Qemu > > > > even support machines types that do not support ACPI. > > > > > > So? Different machines -> different names. > > > > > You want to have different cli for different type of machines qemu > > supports? > > Different device names. > You mean that on qemu-sparc, for deleting a PCI device, you will use a different device naming scheme?! > > > > > > > > > > > > > > > > > > > > > > It looks like you identify yourself with most of > > > > > > > > qemu users, but if most qemu users are like you then qemu has not enough > > > > > > > > users :) Most users that consider themselves to be "advanced" may know > > > > > > > > what eth1 or /dev/sdb means. This doesn't mean we should provide > > > > > > > > "device_del eth1" or "device_add /dev/sdb" command though. 
> > > > > > > > > > > > > > > > More important is that "domain" (encoded as number like you used to) > > > > > > > > and "bus number" has no meaning from inside qemu. > > > > > > > > So while I said many > > > > > > > > times I don't care about exact CLI syntax to much it should make sense > > > > > > > > at least. It can use id to specify PCI bus in CLI like this: > > > > > > > > device_del pci.0:1.1. Or it can even use device id too like this: > > > > > > > > device_del pci.0:ide.0. Or it can use HW topology like in FO device > > > > > > > > path. But doing ah-hoc device enumeration inside qemu and then using it > > > > > > > > for CLI is not it. > > > > > > > > > > > > > > > > > functionality in the guests. Qemu is buggy in the moment in that is > > > > > > > > > uses the bus addresses assigned by guest and not the ones in ACPI, > > > > > > > > > but that can be fixed. > > > > > > > > It looks like you confused ACPI _SEG for something it isn't. > > > > > > > > > > > > > > Maybe I did. This is what linux does: > > > > > > > > > > > > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root > > > > > > > *root) > > > > > > > { > > > > > > > struct acpi_device *device = root->device; > > > > > > > int domain = root->segment; > > > > > > > int busnum = root->secondary.start; > > > > > > > > > > > > > > And I think this is consistent with the spec. > > > > > > > > > > > > > It means that one domain may include several host bridges. > > > > > > At that level > > > > > > domain is defined as something that have unique name for each device > > > > > > inside it thus no two buses in one segment/domain can have same bus > > > > > > number. This is what PCI spec tells you. > > > > > > > > > > And that really is enough for CLI because all we need is locate the > > > > > specific slot in a unique way. > > > > > > > > > At qemu level we do not have bus numbers. They are assigned by a guest. 
> > > > So inside a guest domain:bus:slot.func points you to a device, but in > > > > qemu does not enumerate buses. > > > > > > > > > > And this further shows that using "domain" as defined by guest is very > > > > > > bad idea. > > > > > > > > > > As defined by ACPI, really. > > > > > > > > > ACPI is a part of a guest software that may not event present in the > > > > guest. How is it relevant? > > > > > > It's relevant because this is what guests use. To access the root > > > device with cf8/cfc you need to know the bus number assigned to it > > > by firmware. How that was assigned is of interest to BIOS/ACPI but not > > > really interesting to the user or, I suspect, guest OS. > > > > > Of course this is incorrect. OS can re-enumerate PCI if it wishes. Linux > > have cmd just for that. > > I haven't looked but I suspect linux will simply assume cf8/cfc and > > and start doing it from there. If that doesn't get you the root > > device you wanted, tough. > Linux runs on many platforms and on most of them there is no such thing as IO space. Also I believe Linux on x86 supported multi-root without mmconfig in the past. It may be enough to describe in ACPI the IO resources the second pci host bridge uses for Linux to use it. > > And saying that ACPI is relevant because this is > > what guest software use in a reply to sentence that states that not all > > guest even use ACPI is, well, strange. > > > > And ACPI describes only HW that present at boot time. What if you > > hot-plugged root pci bridge? How non existent PCI naming helps you? > > that's described by ACPI as well. > > > > > > > > > ACPI spec > > > > > > > > says that PCI segment group is purely software concept managed by system > > > > > > > > firmware. In fact one segment may include multiple PCI host bridges. 
> > > > > > > > > > > > > > It can't I think: > > > > > > Read the _BBN definition: > > > > > > The _BBN object is located under a PCI host bridge and must be unique for > > > > > > every host bridge within a segment since it is the PCI bus number. > > > > > > > > > > > > Clearly the above speaks about multiple host bridges within a segment. > > > > > > > > > > Yes, it looks like the firmware spec allows that. > > > > It even has an explicit example that shows it. > > > > > > > > > > > > > > > > Multiple Host Bridges > > > > > > > > > > > > > > A platform may have multiple PCI Express or PCI-X host bridges. The base > > > > > > > address for the > > > > > > > MMCONFIG space for these host bridges may need to be allocated at > > > > > > > different locations. In such > > > > > > > cases, using MCFG table and _CBA method as defined in this section means > > > > > > > that each of these host > > > > > > > bridges must be in its own PCI Segment Group. > > > > > > > > > > > > > This is not from the ACPI spec, > > > > > > > > > > PCI Firmware Specification 3.0 > > > > > > > > > > > but without going too deep into it, the above > > > > > > paragraph talks about a particular case when each host bridge must > > > > > > be in its own PCI Segment Group, which is definite proof that in other > > > > > > cases multiple host bridges can be in one segment group. > > > > > > > > > > I stand corrected. I think you are right. But note that if they are, > > > > > they must have distinct bus numbers assigned by ACPI. > > > > ACPI does not assign any numbers. > > > > > > For all root pci devices firmware must supply a _BBN number. This is the > > > bus number, isn't it? For nested buses, this is optional. > > Nonsense. _BBN is optional and is not present in the Seabios DSDT. > > The spec says it's not optional for host bridges: > > Firmware must report Host Bridges in the ACPI name space. 
Each Host > Bridge object must contain the following objects: > _HID and _CID > _CRS to determine all resources consumed and produced (passed through > to the secondary bus) > by the host bridge. Firmware allocates resources (Memory Addresses, > I/O Port, etc.) to Host > Bridges. The _CRS descriptor informs the operating system of the > resources it may use for > configuring devices below the Host Bridge. > ● _TRA, _TTP, and _TRS translation offsets to inform the operating > system of the mapping > between the primary bus and the secondary bus. > _PRT and the interrupt descriptor to determine interrupt routing. > > _BBN to obtain a bus number. > > so seabios seems to be out of spec. > Looks like it is. Well, the implementation will be easy: just return 0. Also the same spec says that firmware doesn't have to fully enumerate all PCI buses: "Firmware is not required to configure devices other than boot and console devices." > > As far > > as I can tell it is only needed if a PCI segment group has more than one > > PCI host bridge. > > No. Because cfc/cf8 are not aware of _SEG. More importantly, qemu is not aware of _SEG. As the spec says: "it is purely a software concept managed by system firmware". > > > > > > > > Bios enumerates buses and assigns > > > > numbers. > > > > > > There's no standard way to enumerate pci root devices in a guest AFAIK. > > > The spec says: > > > Firmware must configure all Host Bridges in the systems, even if > > > they are not connected to a console or boot device. Firmware must > > > configure Host Bridges in order to allow operating systems to use the > > > devices below the Host Bridges. This is because the Host Bridges > > > programming model is not defined by the PCI Specifications. > > > > > > > > A guest should be aware of HW to use it. Be it through the BIOS or a driver. > > Why should it? You get a bus number and stick it in cf8/cfc, > you get a config cycle. No magic HW awareness needed. > SW should be aware of cf8/cfc. 
Where will it stick the bus number otherwise? Another PCI host bridge may sit on another IO or MMIO address, and SW should be aware of it so it will be able to stick a bus number there too. You ignored all the other points below. Why? > > > > ACPI, in the base case, describes what the BIOS did to the OSPM. Qemu sits > > > > one layer below all this and does not enumerate PCI buses. Even if we make > > > > it do so there is no way to guarantee that a guest will enumerate them > > > > in the same order, since there is more than one way to do enumeration. I > > > > have repeated this numerous times to you already. > > > > > > ACPI is really part of the motherboard. Calling it the guest just > > > confuses things. The guest OS can override bus numbering for nested buses > > > but not for root buses. > > > > > If calling ACPI part of a guest confuses you then you are already > > confused. The guest OS can do whatever it wishes with any enumeration FW did > > if it knows better. > > > > > > > > > > > > > > > > > > > > > > _SEG > > > > > > > > is not what the OSPM uses to tie a HW resource to an ACPI resource. It uses _CRS > > > > > > > > (Current Resource Settings) for that, just like OF. No surprise there. > > > > > > > > > > > > > > The OSPM uses both I think. > > > > > > > > > > > > > > All I see linux do with _CRS is get the bus number range. > > > > > > So let's assume that the HW has two PCI host bridges and ACPI has: > > > > > > Device(PCI0) { > > > > > > Name (_HID, EisaId ("PNP0A03")) > > > > > > Name (_SEG, 0x00) > > > > > > } > > > > > > Device(PCI1) { > > > > > > Name (_HID, EisaId ("PNP0A03")) > > > > > > Name (_SEG, 0x01) > > > > > > } > > > > > > I.e. no _CRS to describe resources. How do you think the OSPM knows which of > > > > > > the two PCI host bridges is PCI0 and which one is PCI1? > > > > > > > > > > You must be able to uniquely address any bridge using the combination of _SEG > > > > > and _BBN. > > > > Not at all. And saying "you must be able" without actually showing how > > > > doesn't prove anything. 
_SEG is relevant only for those host bridges > > > > that support MMCONFIG (not all of them do, and none that qemu supports > > > > do yet). _SEG points to a specific entry in the MCFG table, and the MCFG entry > > > > holds the base address of the MMCONFIG space for the bridge (this address > > > > is configured by a guest). This is all _SEG does really, no magic at > > > > all. _BBN returns the bus number assigned by the BIOS to the host bridge. Nothing > > > > qemu-visible again. > > > > So _SEG and _BBN give you two numbers assigned by > > > > guest FW. Nothing qemu can use to identify a device. > > > > > > This FW is given to the guest by qemu. It only assigns bus numbers > > > because qemu told it to do so. > > Seabios is just a guest qemu ships. There are other FW for qemu: Bochs > > bios, openfirmware, efi. All of them were developed outside of the qemu > > project and all of them are usable without qemu. You can't consider them > > to be part of qemu any more than Linux/Windows with virtio drivers. > > > > > > > > > > > > > > > > > And the spec says, e.g.: > > > > > > > > > > > > > > the memory mapped configuration base > > > > > > > address (always corresponds to bus number 0) for the PCI Segment Group > > > > > > > of the host bridge is provided by _CBA and the bus range covered by the > > > > > > > base address is indicated by the corresponding bus range specified in > > > > > > > _CRS. > > > > > > > > > > > > > I don't see how it is relevant. And _CBA is defined only for PCI Express. Let's > > > > > > solve the problem for PCI first and then move to PCI Express. Jumping from one > > > > > > to the other distracts us from the main discussion. > > > > > > > > > > I think this is what confuses us. As long as you are using cf8/cfc there's no > > > > > concept of a domain really. > > > > > Thus: > > > > > /pci@i0cf8 > > > > > > > > > > is probably enough for BIOS boot because we'll need to make root bus numbers > > > > > unique for legacy guests/option ROMs. 
But this is not a hardware requirement > > > > > and might become easier to ignore with EFI. > > > > > > > > > You do not need MMCONFIG to have multiple PCI domains. You can have one > > > > configured via standard cf8/cfc and another one on ef8/efc and one more > > > > at mmio fce00000 and you can address all of them: > > > > /pci@i0cf8 > > > > /pci@i0ef8 > > > > /pci@fce00000 > > > > > > > > And each one of those PCI domains can have 256 subbridges. > > > > > > Will common guests such as windows or linux be able to use them? This > > With proper drivers, yes. There is HW with more than one PCI bus and I > > think qemu emulates some of it (PPC MAC for instance). > > > > > seems to be outside the scope of the PCI Firmware specification, which > > > says that bus numbers must be unique. > > They must be unique per PCI segment group. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > That should be enough for e.g. device_del. We do have the need to > > > > > > > > > describe the topology when we interface with firmware, e.g. to describe > > > > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal > > > > > > > > > with), but that's probably the only case. > > > > > > > > > > > > > > > > > Describing HW topology is the only way to unambiguously describe a device to > > > > > > > > something or someone outside qemu and have persistent device naming > > > > > > > > between different HW configurations. > > > > > > > > > > > > > > Not really, since ACPI is a binary blob programmed by qemu. > > > > > > > > > > > > > ACPI is part of the guest, not qemu. > > > > > > > > > > Yes, it runs in the guest but it's generated by qemu. On real hardware, > > > > > it's supplied by the motherboard. > > > > > > > > > It is not generated by qemu. Parts of it depend on HW and other parts depend > > > > on how the BIOS configures HW. _BBN for instance is clearly defined to return > > > > the address assigned by the BIOS. 
> > > > > > BIOS is supplied on the motherboard and in our case by qemu as well. > > You can replace the MB BIOS with coreboot+seabios on some of them. > > Manufacturers don't want you to do it and make it hard to do, but > > otherwise this is just software, not some magic dust. > > > > > There's no standard way for the BIOS to assign a bus number to the pci root, > > > so it does it in a device-specific way. Why should a management tool > > > or a CLI user care about these? As far as they are concerned > > > we could use some PV scheme to find root devices and assign bus > > > numbers, and it would be exactly the same. > > > > > Go write a KVM userspace that does that. AFAIK there is a project out there > > that tries to do that. No luck so far. Your world view is very x86/Linux > > centric. You need to broaden it a little bit. Next time you propose > > something, ask yourself whether it will work with qemu-sparc, qemu-ppc, qemu-amd. > > > > > > > > > > Just saying "not really" doesn't > > > > > > prove much. I still haven't seen any proposition from you that actually > > > > > > solves the problem. No, "let's use guest naming" is not it. There is no > > > > > > such thing as "The Guest". > > > > > > > > > > > > -- > > > > > > Gleb. > > > > > > > > > > I am sorry if I didn't make this clear. I think we should use the domain:bus > > > > > pair to name the root device. As these are unique and > > > > > > > > > You forgot to complete the sentence :) But you made it clear enough and > > > > it is incorrect. The domain:bus pair is not only not unique, it does not exist > > > > in qemu at all. > > > > > > Sure they do. domain maps to the mcfg address for express. bus is used for > > mcfg is optional as far as I can see. You can compile out MMCONFIG > > support on Linux. > > > > > cf8/cfc addressing. They are assigned by the BIOS but since the BIOS > > > is supplied with the hardware the point is moot. > > Most PC hardware is supplied with Windows, so what? BIOS is code that > > runs in a guest. It is part of the guest. 
Every line of code executed by > > the vcpu belongs to the guest. No need to redefine things to prove your point. > > > > > > > > > and as such can't be used to address a device. They are > > > > a product of HW enumeration done by a guest OS, just like eth0 or C:. > > > > > > > > -- > > > > Gleb. > > > > > > There's a huge difference between BIOS and guest OS, > > Not true. > > > > > and between bus > > > numbers of the pci root and of nested bridges. > > Really? What is it? > > > > > > > > Describing hardware io ports makes sense if you are trying to > > > communicate data from qemu to the BIOS. But the rest of the world might > > > not care. > > > > > The part of the world that manages HW cares. You may need to add a device > > from the monitor before the first line of BIOS is even executed. How can you > > rely on BIOS enumeration of devices in this case? > > > > > > -- > > Gleb. -- Gleb.
On Sun, Nov 21, 2010 at 06:01:11PM +0200, Gleb Natapov wrote: > > This FW is given to guest by qemu. It only assigns bus numbers > > because qemu told it to do so. > Seabios is just a guest qemu ships. There are other FW for qemu. We don't support them though, do we? > Bochs > bios, openfirmware, efi. All of them where developed outside of qemu > project and all of them are usable without qemu. You can't consider them > be part of qemu any more then Linux/Windows with virtio drivers. You can also burn linuxbios onto your motherboard. If you do you void your warranty. > > > > > > > > > > > > And the spec says, e.g.: > > > > > > > > > > > > the memory mapped configuration base > > > > > > address (always corresponds to bus number 0) for the PCI Segment Group > > > > > > of the host bridge is provided by _CBA and the bus range covered by the > > > > > > base address is indicated by the corresponding bus range specified in > > > > > > _CRS. > > > > > > > > > > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets > > > > > solve the problem for PCI first and then move to PCI Express. Jumping from one > > > > > to another destruct us from main discussion. > > > > > > > > I think this is what confuses us. As long as you are using cf8/cfc there's no > > > > concept of a domain really. > > > > Thus: > > > > /pci@i0cf8 > > > > > > > > is probably enough for BIOS boot because we'll need to make root bus numbers > > > > unique for legacy guests/option ROMs. But this is not a hardware requirement > > > > and might become easier to ignore eith EFI. > > > > > > > You do not need MMCONFIG to have multiple PCI domains. You can have one > > > configured via standard cf8/cfc and another one on ef8/efc and one more > > > at mmio fce00000 and you can address all of them: > > > /pci@i0cf8 > > > /pci@i0ef8 > > > /pci@fce00000 Isn't the mmio one relocatable? > > > > > > And each one of those PCI domains can have 256 subbridges. 
> > > > Will common guests such as windows or linux be able to use them? This > With proper drivers yes. There is HW with more then one PCI bus and I > think qemu emulates some of it (PPC MAC for instance). MAC is probably just poking in a couple of predefined places. That's not enumeration. > > seems to be outside the scope of the PCI Firmware specification, which > > says that bus numbers must be unique. > They must be unique per PCI segment group. We've come full circle, didn't we? i am saying we should let users specify PCI Segment group+bus as opposed to the io port, which they don't use. > > > > > > > > > > > > > > > > > > > > > > > > That should be enough for e.g. device_del. We do have the need to > > > > > > > > describe the topology when we interface with firmware, e.g. to describe > > > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal > > > > > > > > with), but that's probably the only case. > > > > > > > > > > > > > > > Describing HW topology is the only way to unambiguously describe device to > > > > > > > something or someone outside qemu and have persistent device naming > > > > > > > between different HW configuration. > > > > > > > > > > > > Not really, since ACPI is a binary blob programmed by qemu. > > > > > > > > > > > APCI is part of the guest, not qemu. > > > > > > > > Yes it runs in the guest but it's generated by qemu. On real hardware, > > > > it's supplied by the motherboard. > > > > > > > It is not generated by qemu. Parts of it depend on HW and other part depend > > > on how BIOS configure HW. _BBN for instance is clearly defined to return > > > address assigned bu the BIOS. > > > > BIOS is supplied on the motherboard and in our case by qemu as well. > You can replace MB bios by coreboot+seabios on some of them. > Manufacturer don't want you to do it and make it hard to do, but > otherwise this is just software, not some magic dust. They support common hardware but not all features will automatically work. 
> > There's no standard way for BIOS to assign bus number to the pci root, > > so it does it in device-specific way. Why should a management tool > > or a CLI user care about these? As far as they are concerned > > we could use some PV scheme to find root devices and assign bus > > numbers, and it would be exactly the same. > > > Go write KVM userspace that does that. AFAIK there is project out there > that tries to do that. No luck so far. Your world view is very x86/Linux > centric. You need to broaden it a little bit. Next time you propose > something ask yourself will it work with qemu-sparc, qemu-ppc, qemu-amd. Your view is very qemu centric. Ask yourself whether what you propose will work with libvirt. Better yet, ask libvirt developers. > > > > > Just saying "not really" doesn't > > > > > prove much. I still haven't seen any proposition from you that actually > > > > > solve the problem. No, "lets use guest naming" is not it. There is no > > > > > such thing as "The Guest". > > > > > > > > > > -- > > > > > Gleb. > > > > > > > > I am sorry if I didn't make this clear. I think we should use the domain:bus > > > > pair to name the root device. As these are unique and > > > > > > > You forgot to complete the sentence :) But you made it clear enough and > > > it is incorrect. domain:bus pair not only not unique they do not exist > > > in qemu at all > > > > Sure they do. domain maps to mcfg address for express. bus is used for > mcfg is optional as far as I can see. You can compile out MMCONFIG > support on Linux. This doesn't mean that you will be able to address all devices on all systems if you do. > > cf8/cfc addressing. They are assigned by BIOS but since BIOS > > is supplied with hardware the point is moot. > Most PC hardware is supplied with Windows, so what? BIOS is a code that > runs in a guest. It is part of a guest. Every line of code executed by > vcpu belongs to a guest. No need to redefine things to prove you point. Call it what you want. 
I care about the payload that the user supplies. BIOS and ACPI are implementation details of qemu needed to run said payload smoothly. Said payload might not know or care about IO address used to program host pci bridge. > > > > > and as such can't be used to address device. They are > > > product of HW enumeration done by a guest OS just like eth0 or C:. > > > > > > -- > > > Gleb. > > > > There's a huge difference between BIOS and guest OS, > Not true. > > > and between bus > > numbers of pci root and of nested bridges. > Really? What is it? Look at the spec citations I provided. bus number for root must be set by firmware (or default to 0). The rest can be done by the OS. > > > > Describing hardware io ports makes sense if you are trying to > > communicate data from qemu to the BIOS. But the rest of the world might > > not care. > > > The part of the world that manage HW cares. You really want libvirt to deal with ioports? Why only that? Let them emulate PCI while we are at it? > You may need to add device > from monitor before first line of BIOS is event executed. How can you > rely on BIOS enumerate of devices in this case? We can control the bus numbers that the BIOS will assign to pci root. It's probably required to make them stable anyway. > > -- > Gleb.
On Sun, Nov 21, 2010 at 08:22:03PM +0200, Michael S. Tsirkin wrote: > On Sun, Nov 21, 2010 at 06:01:11PM +0200, Gleb Natapov wrote: > > > This FW is given to guest by qemu. It only assigns bus numbers > > > because qemu told it to do so. > > Seabios is just a guest qemu ships. There are other FW for qemu. > > We don't support them though, do we? > Who is "we"? As qemu community we support any guest code. > > Bochs > > bios, openfirmware, efi. All of them where developed outside of qemu > > project and all of them are usable without qemu. You can't consider them > > be part of qemu any more then Linux/Windows with virtio drivers. > > You can also burn linuxbios onto your motherboard. If you do you > void your warranty. And? What is your point? > > > > > > > > > > > > > > > > > And the spec says, e.g.: > > > > > > > > > > > > > > the memory mapped configuration base > > > > > > > address (always corresponds to bus number 0) for the PCI Segment Group > > > > > > > of the host bridge is provided by _CBA and the bus range covered by the > > > > > > > base address is indicated by the corresponding bus range specified in > > > > > > > _CRS. > > > > > > > > > > > > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets > > > > > > solve the problem for PCI first and then move to PCI Express. Jumping from one > > > > > > to another destruct us from main discussion. > > > > > > > > > > I think this is what confuses us. As long as you are using cf8/cfc there's no > > > > > concept of a domain really. > > > > > Thus: > > > > > /pci@i0cf8 > > > > > > > > > > is probably enough for BIOS boot because we'll need to make root bus numbers > > > > > unique for legacy guests/option ROMs. But this is not a hardware requirement > > > > > and might become easier to ignore eith EFI. > > > > > > > > > You do not need MMCONFIG to have multiple PCI domains. 
You can have one > > > > configured via standard cf8/cfc and another one on ef8/efc and one more > > > > at mmio fce00000 and you can address all of them: > > > > /pci@i0cf8 > > > > /pci@i0ef8 > > > > /pci@fce00000 > > Isn't the mmio one relocatable? > No. If it is, there is a way to specify so. > > > > > > > > And each one of those PCI domains can have 256 subbridges. > > > > > > Will common guests such as windows or linux be able to use them? This > > With proper drivers, yes. There is HW with more than one PCI bus and I > > think qemu emulates some of it (PPC MAC for instance). > > MAC is probably just poking in a couple of predefined places. That's > not enumeration. > The system bus is not enumerable on a PC either. That is why you need ACPI to describe resources, or you just poke into predefined places (0cf8-0cff in the case of PCI). > > > seems to be outside the scope of the PCI Firmware specification, which > > > says that bus numbers must be unique. > > They must be unique per PCI segment group. > > We've come full circle, haven't we? I am saying we should let users > specify PCI Segment group+bus as opposed to the io port, which they > don't use. > When qemu creates the device model there is no "bus" number assigned to PCI bridges, and thus a "bus" number is meaningless for qemu. A PCI Segment group has no HW meaning at all (you seem to ignore this completely). So let's say you run qemu with -S and want to delete or add a device. Where do you get the bus number and PCI Segment group number to specify it? > > > > > > > > > > > > > > > > > > > > > > > > > > > > That should be enough for e.g. device_del. We do have the need to > > > > > > > > > describe the topology when we interface with firmware, e.g. to describe > > > > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal > > > > > > > > > with), but that's probably the only case. 
> > > > > > > > > > > > > > > > > Describing HW topology is the only way to unambiguously describe device to > > > > > > > > something or someone outside qemu and have persistent device naming > > > > > > > > between different HW configuration. > > > > > > > > > > > > > > Not really, since ACPI is a binary blob programmed by qemu. > > > > > > > > > > > > > APCI is part of the guest, not qemu. > > > > > > > > > > Yes it runs in the guest but it's generated by qemu. On real hardware, > > > > > it's supplied by the motherboard. > > > > > > > > > It is not generated by qemu. Parts of it depend on HW and other part depend > > > > on how BIOS configure HW. _BBN for instance is clearly defined to return > > > > address assigned bu the BIOS. > > > > > > BIOS is supplied on the motherboard and in our case by qemu as well. > > You can replace MB bios by coreboot+seabios on some of them. > > Manufacturer don't want you to do it and make it hard to do, but > > otherwise this is just software, not some magic dust. > > They support common hardware but not all features will automatically work. So what? The same can be said about Linux on a lot of HW. > > > > There's no standard way for BIOS to assign bus number to the pci root, > > > so it does it in device-specific way. Why should a management tool > > > or a CLI user care about these? As far as they are concerned > > > we could use some PV scheme to find root devices and assign bus > > > numbers, and it would be exactly the same. > > > > > Go write KVM userspace that does that. AFAIK there is project out there > > that tries to do that. No luck so far. Your world view is very x86/Linux > > centric. You need to broaden it a little bit. Next time you propose > > something ask yourself will it work with qemu-sparc, qemu-ppc, qemu-amd. > > Your view is very qemu centric. Ask yourself whether what you propose > will work with libvirt. Better yet, ask libvirt developers. My view is qemu centric because we are developing qemu here. 
Anything that will work in the qemu monitor will work with libvirt or any other management software. > > > > > > > Just saying "not really" doesn't > > > > > > prove much. I still haven't seen any proposition from you that actually > > > > > > solves the problem. No, "let's use guest naming" is not it. There is no > > > > > > such thing as "The Guest". > > > > > > > > > > > > -- > > > > > > Gleb. > > > > > > > > > > I am sorry if I didn't make this clear. I think we should use the domain:bus > > > > > pair to name the root device. As these are unique and > > > > > > > > > You forgot to complete the sentence :) But you made it clear enough and > > > > it is incorrect. The domain:bus pair is not only not unique, it does not exist > > > > in qemu at all > > > > > > Sure they do. domain maps to the mcfg address for express. bus is used for > > mcfg is optional as far as I can see. You can compile out MMCONFIG > > support on Linux. > > This doesn't mean that you will be able to address all devices > on all systems if you do. > Why should qemu care? That's a guest problem. > > > cf8/cfc addressing. They are assigned by the BIOS but since the BIOS > > > is supplied with the hardware the point is moot. > > Most PC hardware is supplied with Windows, so what? BIOS is code that > > runs in a guest. It is part of the guest. Every line of code executed by > > the vcpu belongs to the guest. No need to redefine things to prove your point. > > Call it what you want. I care about the payload that the user supplies. > BIOS and ACPI are implementation details of qemu needed to run said > payload smoothly. Said payload might not know or care about > the IO address used to program the host pci bridge. And what is your point? What does "said payload" have to do with the qemu CLI? Qemu CLI behaviour should not depend on guest code. That is the path to madness. > > > > > > > > and as such can't be used to address a device. They are > > > > a product of HW enumeration done by a guest OS, just like eth0 or C:. > > > > > > > > -- > > > > Gleb. 
> > > > > > There's a huge difference between BIOS and guest OS, > > Not true. > > > > > and between bus > > > numbers of the pci root and of nested bridges. > > Really? What is it? > > Look at the spec citations I provided. The bus number for a root must be set by > firmware (or default to 0). The rest can be done by the OS. > So there is no "huge difference" at all. This is just a good old bus number. Two host bridges may have the same bus number. Some buses may not be enumerated by a guest (neither FW nor OS) at all and should still be addressable from the qemu CLI. > > > > > > Describing hardware io ports makes sense if you are trying to > > > communicate data from qemu to the BIOS. But the rest of the world might > > > not care. > > > > > The part of the world that manages HW cares. > > You really want libvirt to deal with ioports? Why only that? No, I want them to deal with data that makes sense. And I stated in the first mail, after which you started this flame war, that OF paths are not the only way that makes sense for the qemu CLI. Using a qdev device id makes perfect sense too. So the CLI may look like "device_add pci.0:pci-bridge.1:ide.0" or similar. They may even prefer it since they already deal with those on the command line. Using bus addresses, on the other hand, that may not even be assigned makes no sense. > Let them emulate PCI while we are at it? Don't be ridiculous. Actually it is you who wants to teach management how bus numbers are assigned to PCI buses. I want to teach them about device topology, which they should know already since they control how the machine is created. > > > You may need to add a device > > from the monitor before the first line of BIOS is even executed. How can you > > rely on BIOS enumeration of devices in this case? > > We can control the bus numbers that the BIOS will assign to the pci root. And if you need to insert a device that sits behind a pci-to-pci bridge? Do you want to control that bus address too? 
And bus numbers do not unambiguously describe a pci root, since two roots can have the same bus number just fine. > It's probably required to make them stable anyway. > > Why? -- Gleb.
On Sun, Nov 21, 2010 at 09:29:59PM +0200, Gleb Natapov wrote: > > We can control the bus numbers that the BIOS will assign to pci root. > And if you need to insert device that sits behind pci-to-pci bridge? Do > you want to control that bus address too? And bus number do not unambiguously > describe pci root since two roots can have same bus number just fine. They have to have different segment numbers. > > It's probably required to make them stable anyway. > > > Why? To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
On Sun, Nov 21, 2010 at 10:39:31PM +0200, Michael S. Tsirkin wrote: > On Sun, Nov 21, 2010 at 09:29:59PM +0200, Gleb Natapov wrote: > > > We can control the bus numbers that the BIOS will assign to pci root. > > And if you need to insert device that sits behind pci-to-pci bridge? Do > > you want to control that bus address too? And bus number do not unambiguously > > describe pci root since two roots can have same bus number just fine. > > They have to have different segment numbers. > AKA PCI domains AKA one more number assigned by a guest. > > > It's probably required to make them stable anyway. > > > > > Why? > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge. > Why should qemu care? -- Gleb.
On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote: > > > > It's probably required to make them stable anyway. > > > > > > > Why? > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge. > > > Why should qemu care? Stable bus numbering is a feature *users* care about, because some Guest OSes get confused when a card gets moved to another bus. > -- > Gleb.
On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote: > > > > > It's probably required to make them stable anyway. > > > > > > > > > Why? > > > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge. > > > > > Why should qemu care? > > Stable bus numbering is a feature *users* care about, because > some Guest OSes get confused when a card gets moved to another > bus. > So if the user cares about it, they should not change the HW configuration of QEMU. I guess those OSes know how to handle hot-pluggable equipment, otherwise they would get confused on real HW too. Why should QEMU care to preserve something in the face of a configuration change? -- Gleb.
On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > It's probably required to make them stable anyway.
> > > > > >
> > > > > Why?
> > > >
> > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > >
> > > Why should qemu care?
> >
> > Stable bus numbering is a feature *users* care about, because
> > some Guest OSes get confused when a card gets moved to another
> > bus.
> >
> So if user cares about it it should not change HW configuration of QEMU.
> I guess those OSes knows how to handle hot-pluggable equipment otherwise
> they will get confused on real HW too. Why QEMU should care to preserve
> something in a face of configuration change?
>
> --
> 			Gleb.

We've been there, weren't we?  See
http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
On Mon, Nov 22, 2010 at 04:50:29PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> > On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > > It's probably required to make them stable anyway.
> > > > > > >
> > > > > > Why?
> > > > >
> > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > > >
> > > > Why should qemu care?
> > >
> > > Stable bus numbering is a feature *users* care about, because
> > > some Guest OSes get confused when a card gets moved to another
> > > bus.
> > >
> > So if user cares about it it should not change HW configuration of QEMU.
> > I guess those OSes knows how to handle hot-pluggable equipment otherwise
> > they will get confused on real HW too. Why QEMU should care to preserve
> > something in a face of configuration change?
> >
> > --
> > 			Gleb.
>
> We've been there, weren't we?  See
> http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
>
This is about stable HW configuration.

--
			Gleb.
On Mon, Nov 22, 2010 at 04:52:32PM +0200, Gleb Natapov wrote:
> On Mon, Nov 22, 2010 at 04:50:29PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> > > On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > > > It's probably required to make them stable anyway.
> > > > > > > >
> > > > > > > Why?
> > > > > >
> > > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > > > >
> > > > > Why should qemu care?
> > > >
> > > > Stable bus numbering is a feature *users* care about, because
> > > > some Guest OSes get confused when a card gets moved to another
> > > > bus.
> > > >
> > > So if user cares about it it should not change HW configuration of QEMU.
> > > I guess those OSes knows how to handle hot-pluggable equipment otherwise
> > > they will get confused on real HW too. Why QEMU should care to preserve
> > > something in a face of configuration change?
> > >
> > > --
> > > 			Gleb.
> >
> > We've been there, weren't we?  See
> > http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
> >
> This is about stable HW configuration.

Exactly. We have the same need for nested bridges and devices behind
them.

> --
> 			Gleb.
On Mon, Nov 22, 2010 at 04:56:16PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 22, 2010 at 04:52:32PM +0200, Gleb Natapov wrote:
> > On Mon, Nov 22, 2010 at 04:50:29PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> > > > On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > > > > It's probably required to make them stable anyway.
> > > > > > > > >
> > > > > > > > Why?
> > > > > > >
> > > > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > > > > >
> > > > > > Why should qemu care?
> > > > >
> > > > > Stable bus numbering is a feature *users* care about, because
> > > > > some Guest OSes get confused when a card gets moved to another
> > > > > bus.
> > > > >
> > > > So if user cares about it it should not change HW configuration of QEMU.
> > > > I guess those OSes knows how to handle hot-pluggable equipment otherwise
> > > > they will get confused on real HW too. Why QEMU should care to preserve
> > > > something in a face of configuration change?
> > > >
> > > > --
> > > > 			Gleb.
> > >
> > > We've been there, weren't we?  See
> > > http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
> > >
> > This is about stable HW configuration.
>
> Exactly. We have the same need for nested bridges and devices behind
> them.
>
And now you are talking about topology, not guest assigned bus numbers.
So suddenly you are on my side? I don't even get what are you arguing
about at this point.

--
			Gleb.
On Mon, Nov 22, 2010 at 04:58:11PM +0200, Gleb Natapov wrote:
> On Mon, Nov 22, 2010 at 04:56:16PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 22, 2010 at 04:52:32PM +0200, Gleb Natapov wrote:
> > > On Mon, Nov 22, 2010 at 04:50:29PM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> > > > > On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > > > > > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > > > > > It's probably required to make them stable anyway.
> > > > > > > > > >
> > > > > > > > > Why?
> > > > > > > >
> > > > > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > > > > > >
> > > > > > > Why should qemu care?
> > > > > >
> > > > > > Stable bus numbering is a feature *users* care about, because
> > > > > > some Guest OSes get confused when a card gets moved to another
> > > > > > bus.
> > > > > >
> > > > > So if user cares about it it should not change HW configuration of QEMU.
> > > > > I guess those OSes knows how to handle hot-pluggable equipment otherwise
> > > > > they will get confused on real HW too. Why QEMU should care to preserve
> > > > > something in a face of configuration change?
> > > > >
> > > > > --
> > > > > 			Gleb.
> > > >
> > > > We've been there, weren't we?  See
> > > > http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
> > > >
> > > This is about stable HW configuration.
> >
> > Exactly. We have the same need for nested bridges and devices behind
> > them.
> >
> And now you are talking about topology, not guest assigned bus numbers.

I suspect that if bus numbers change, it has the same effect as topology
change on the guest. Need to test though, I'm not sure.

> So suddenly you are on my side? I don't even get what are you arguing
> about at this point.

By this time, I forgot, too :).

> --
> 			Gleb.
On Mon, Nov 22, 2010 at 06:41:28PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 22, 2010 at 04:58:11PM +0200, Gleb Natapov wrote:
> > On Mon, Nov 22, 2010 at 04:56:16PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 22, 2010 at 04:52:32PM +0200, Gleb Natapov wrote:
> > > > On Mon, Nov 22, 2010 at 04:50:29PM +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> > > > > > On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > > > > > > It's probably required to make them stable anyway.
> > > > > > > > > > >
> > > > > > > > > > Why?
> > > > > > > > >
> > > > > > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > > > > > > >
> > > > > > > > Why should qemu care?
> > > > > > >
> > > > > > > Stable bus numbering is a feature *users* care about, because
> > > > > > > some Guest OSes get confused when a card gets moved to another
> > > > > > > bus.
> > > > > > >
> > > > > > So if user cares about it it should not change HW configuration of QEMU.
> > > > > > I guess those OSes knows how to handle hot-pluggable equipment otherwise
> > > > > > they will get confused on real HW too. Why QEMU should care to preserve
> > > > > > something in a face of configuration change?
> > > > > >
> > > > > > --
> > > > > > 			Gleb.
> > > > >
> > > > > We've been there, weren't we?  See
> > > > > http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
> > > > >
> > > > This is about stable HW configuration.
> > >
> > > Exactly. We have the same need for nested bridges and devices behind
> > > them.
> > >
> > And now you are talking about topology, not guest assigned bus numbers.
>
> I suspect that if bus numbers change, it has the same effect as topology
> change on the guest. Need to test though, I'm not sure.
>
Hard to believe. Unplugging card with internal pci-pci bridge may change
bus numbering.

> > So suddenly you are on my side? I don't even get what are you arguing
> > about at this point.
>
> By this time, I forgot, too :).
>
:)

--
			Gleb.
diff --git a/hw/pci.c b/hw/pci.c
index 7d12473..fa98d94 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
 
 static char *pcibus_get_dev_path(DeviceState *dev)
 {
-    PCIDevice *d = (PCIDevice *)dev;
-    char path[16];
-
-    snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
-             pci_find_domain(d->bus), pci_bus_num(d->bus),
-             PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
-
-    return strdup(path);
+    PCIDevice *d = container_of(dev, PCIDevice, qdev);
+    PCIDevice *t;
+    int slot_depth;
+    /* Path format: Domain:00:Slot:Slot....:Slot.Function.
+     * 00 is added here to make this format compatible with
+     * domain:Bus:Slot.Func for systems without nested PCI bridges.
+     * Slot list specifies the slot numbers for all devices on the
+     * path from root to the specific device. */
+    int domain_len = strlen("DDDD:00");
+    int func_len = strlen(".F");
+    int slot_len = strlen(":SS");
+    int path_len;
+    char *path, *p;
+
+    /* Calculate # of slots on path between device and root. */
+    slot_depth = 0;
+    for (t = d; t; t = t->bus->parent_dev) {
+        ++slot_depth;
+    }
+
+    path_len = domain_len + slot_len * slot_depth + func_len;
+
+    /* Allocate memory, fill in the terminating null byte. */
+    path = malloc(path_len + 1 /* For '\0' */);
+    path[path_len] = '\0';
+
+    /* First field is the domain, followed by the fixed "00" pseudo-bus. */
+    snprintf(path, domain_len + 1, "%04x:00", pci_find_domain(d->bus));
+
+    /* Leave space for slot numbers and fill in function number. */
+    p = path + domain_len + slot_len * slot_depth;
+    snprintf(p, func_len + 1, ".%x", PCI_FUNC(d->devfn));
+
+    /* Fill in slot numbers. We walk up from device to root, so need to print
+     * them in the reverse order, last to first. Go through a scratch buffer
+     * so the terminating null byte of each snprintf does not clobber the
+     * first character of the following, already-written field. */
+    for (t = d; t; t = t->bus->parent_dev) {
+        char slot[sizeof(":SS")];
+        p -= slot_len;
+        snprintf(slot, sizeof(slot), ":%02x", PCI_SLOT(t->devfn));
+        memcpy(p, slot, slot_len);
+    }
+
+    return path;
 }