diff mbox

Use firmware provided index to register a network interface

Message ID 20100922183137.GA7607@auslistsprd01.us.dell.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Narendra K Sept. 22, 2010, 6:31 p.m. UTC
Hello,

Here is another approach to address the issue of "eth0 does not always
map to the Integrated NIC Port 1 as denoted on server chassis label".
For more details please refer to the thread -
http://marc.info/?l=linux-netdev&m=128163454631618&w=3.

Patch makes use of the firmware provided index to derive ethN names.
That way the naming scheme adheres to the existing requirements of
ethN namespace and with IFNAMSIZ length.

Please find the patch here -

From: Narendra K <narendra_k@dell.com>
Subject: [PATCH] Use firmware provided index to register a network device

This patch uses the firmware provided index to derive the ethN name.
If the firmware provides an index for the corresponding pdev, the N
is derived from the index.

As an example, consider a PowerEdge R710 which has 4 BCM5709
Lan-On-Motherboard ports,1 Intel 82572EI port and 4 82575GB ports.
The system firmware communicates the order of the 4 Lan-On-Motherboard
ports by assigning indexes to each one of them. This is available to
the OS as the SMBIOS type 41 record(for onboard devices), in the field
'device type index'. It looks like below -

Handle 0x2900, DMI type 41, 11 bytes
Onboard Device
	Reference Designation: Embedded NIC 1
	Type: Ethernet
	Status: Enabled
	Type Instance: 1
	Bus Address: 0000:01:00.0

Handle 0x2901, DMI type 41, 11 bytes
Onboard Device
	Reference Designation: Embedded NIC 2
	Type: Ethernet
	Status: Enabled
	Type Instance: 2
	Bus Address: 0000:01:00.1

Handle 0x2902, DMI type 41, 11 bytes
Onboard Device
	Reference Designation: Embedded NIC 3
	Type: Ethernet
	Status: Enabled
	Type Instance: 3
	Bus Address: 0000:02:00.0

Handle 0x2903, DMI type 41, 11 bytes
Onboard Device
	Reference Designation: Embedded NIC 4
	Type: Ethernet
	Status: Enabled
	Type Instance: 4
	Bus Address: 0000:02:00.1

The OS can use this index to name the network interfaces as below.

Onboard devices -

Interface		Fwindex		Driver
Name
eth[fwindex - 1] =eth0	1		bnx2
eth[fwindex - 1] =eth1	2		bnx2
eth[fwindex - 1] =eth2	3		bnx2
eth[fwindex - 1] =eth3	4		bnx2

The add-in devices do not get any index and they will get names from
eth4 onwards.

Add-in interfaces -

eth4			e1000e
eth5			igb
eth6			igb
eth7			igb
eth8			igb

With this patch,

1. This patch adheres to the established ABI of ethN namespace with
IFNAMSIZ length and ensures that onboard network interfaces get
expected names at the first instance itself and avoids any renaming
later.

2. The 'eth0' of the OS always corresponds to the 'Gb1' as labeled on
the system chassis. There is determinism in the way Lan-On-Motherboard
ports get named.

3. The add-in devices will always be named from beyond what the
Lan-On-Motherboard names as show above. But there is no determinism
as to which add-in interface gets what ethN name.

Signed-off-by: Narendra K <narendra_k@dell.com>
---
 drivers/pci/pci-label.c   |    1 +
 drivers/pci/pci-sysfs.c   |    5 +++++
 include/linux/netdevice.h |    2 ++
 include/linux/pci.h       |    1 +
 net/core/dev.c            |   28 ++++++++++++++++++++++------
 5 files changed, 31 insertions(+), 6 deletions(-)

Comments

Greg KH Sept. 22, 2010, 7:22 p.m. UTC | #1
On Wed, Sep 22, 2010 at 01:31:38PM -0500, Narendra K wrote:
> Hello,
> 
> Here is another approach to address the issue of "eth0 does not always
> map to the Integrated NIC Port 1 as denoted on server chassis label".
> For more details please refer to the thread -
> http://marc.info/?l=linux-netdev&m=128163454631618&w=3.
> 
> Patch makes use of the firmware provided index to derive ethN names.
> That way the naming scheme adheres to the existing requirements of
> ethN namespace and with IFNAMSIZ length.

Ick, again, what's wrong with using udev for this as it is designed to?
That way no kernel changes are needed, and no one has to rely on the
BIOS getting the firmware number right (meaning it will work on all
types of systems.)

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tim Small Sept. 22, 2010, 10:07 p.m. UTC | #2
Narendra K wrote:
> Hello,
>
> Here is another approach to address the issue of "eth0 does not always
> map to the Integrated NIC Port 1 as denoted on server chassis label".
> For more details please refer to the thread -
> http://marc.info/?l=linux-netdev&m=128163454631618&w=3.
>   


Hi,

Out of interest, that link says that doing it in usespace was rejected,
but doesn't give any references... I'd be interested to know why this
wasn't viable - since this seemed like the best fit at first glance -
most people will never use this, so no need to grow their kernel size
and complexity?

Cheers,

Tim.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger Sept. 22, 2010, 10:16 p.m. UTC | #3
On Wed, 22 Sep 2010 23:07:53 +0100
Tim Small <tim@buttersideup.com> wrote:

> Narendra K wrote:
> > Hello,
> >
> > Here is another approach to address the issue of "eth0 does not always
> > map to the Integrated NIC Port 1 as denoted on server chassis label".
> > For more details please refer to the thread -
> > http://marc.info/?l=linux-netdev&m=128163454631618&w=3.
> >   
> 
> 
> Hi,
> 
> Out of interest, that link says that doing it in usespace was rejected,
> but doesn't give any references... I'd be interested to know why this
> wasn't viable - since this seemed like the best fit at first glance -
> most people will never use this, so no need to grow their kernel size
> and complexity?
> 

This proposal was to ad changes into every application that
knows about network names (iproute, iptables, snmp, quagga, openswan, ...)
to do aliasing at the application layer.

I rejected it as an unmanageable since it would require changes to so
many packages (many of which are more BSD focused). Also doing aliasing
would lead to security and other issues. For example, if you write a
iptables rule based on the "Embedded NIC 1" rule would it work and know
when the packet name lookup returned eth0, or what about device names
in the Quagga RIB, ...

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tim Small Sept. 23, 2010, 6:34 a.m. UTC | #4
Stephen Hemminger wrote:
>>> http://marc.info/?l=linux-netdev&m=125510301513312&w=2
>> Out of interest, that link says that doing it in usespace was rejected,
>> but doesn't give any references... I'd be interested to know why this
>> wasn't viable - since this seemed like the best fit at first glance -
>> most people will never use this, so no need to grow their kernel size
>> and complexity?
>>
>>     
>
> This proposal was to ad changes into every application that
> knows about network names (iproute, iptables, snmp, quagga, openswan, ...)
> to do aliasing at the application layer.
>   


OK, that's bonkers, but what I was refering to was the line in the
linked post which said "Achieve the above in userspace only using udev"
- which I assumed meant to do it once in a udev rename rule by adapting
/etc/udev/rules.d/70-persistent-net.rules , /lib/udev/write_net_rules 
etc. - which is what I've used to enforce this sort of convention myself
from time to time.

Tim.
Narendra K Sept. 23, 2010, 3:10 p.m. UTC | #5
> -----Original Message-----
> From: Greg KH [mailto:greg@kroah.com]
> Sent: Thursday, September 23, 2010 12:52 AM
> To: K, Narendra
> Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org; linux-
> pci@vger.kernel.org; Domsch, Matt; Rose, Charles; Hargrave, Jordan;
> Nijhawan, Vijay
> Subject: Re: [PATCH] Use firmware provided index to register a network
> interface
> 
> On Wed, Sep 22, 2010 at 01:31:38PM -0500, Narendra K wrote:
> > Hello,
> >
> > Here is another approach to address the issue of "eth0 does not
> always
> > map to the Integrated NIC Port 1 as denoted on server chassis
label".
> > For more details please refer to the thread -
> > http://marc.info/?l=linux-netdev&m=128163454631618&w=3.
> >
> > Patch makes use of the firmware provided index to derive ethN names.
> > That way the naming scheme adheres to the existing requirements of
> > ethN namespace and with IFNAMSIZ length.
> 
> Ick, again, what's wrong with using udev for this as it is designed
to?
> That way no kernel changes are needed, and no one has to rely on the
> BIOS getting the firmware number right (meaning it will work on all
> types of systems.)
> 

1. We tried addressing this issue using udev only and without any kernel
changes, during Nov-Dec 2008 timeframe using Biosdevname udev helper
utility. Biosdevname utility has the ability to suggest BIOS intended
name of an interface given its OS name.

/sbin/biosdevname -I eth2 - (OS name)
eth0 - (Name according to the BIOS)

KERNEL!="eth*", GOTO="biosdevname_end"
ACTION!="add",  GOTO="biosdevname_end"
NAME=="?*",     GOTO="biosdevname_end"

PROGRAM="/sbin/biosdevname --policy=all_ethN -i %k",
ENV{INTERFACE_NAME}="%c"

LABEL="biosdevname_end"

We observed that renames in the same namespace, which is ethN namespace,
resulted in interface names like eth_rename_ren. The solution was
susceptible to driver load order. Please refer to this bug report-
https://bugzilla.novell.com/show_bug.cgi?id=441079.

This solution was not favored.

2. Based on the suggestions, I am trying to see if we can make use of
the attribute 'index' that is available to udev as a result of the patch
-
http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h
=911e1c9b05a8e3559a7aa89083930700a0b9e7ee (PCI: export SMBIOS provided
firmware instance and label to sysfs).(I am not using any udev helper in
this experiment).

On a PowerEdge R805, with 4 BCM5708 Lan-On-Motherboard ports and a dual
port Intel 82576 NIC. In this system the 82576 ports become eth0 and
eth1 and the Lan-On-Motherboard ports are eth2-eth5. Below is how the
70-persistent-net.rules file looks before I modified it -

# PCI device 0x14e4:0x164c (bnx2) (custom name provided by external
tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:1e:4f:fc:1b:32", ATTR{type}=="1", KERNEL=="eth*",
NAME="eth3"

# PCI device 0x14e4:0x164c (bnx2) (custom name provided by external
tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:1e:4f:fc:1b:30", ATTR{type}=="1", KERNEL=="eth*",
NAME="eth2"

# PCI device 0x14e4:0x164c (bnx2) (custom name provided by external
tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:1e:4f:fc:00:40", ATTR{type}=="1", KERNEL=="eth*",
NAME="eth5"

# PCI device 0x8086:0x10c9 (igb) (custom name provided by external tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:1b:21:54:33:3c", ATTR{type}=="1", KERNEL=="eth*",
NAME="eth0"

# PCI device 0x8086:0x10c9 (igb) (custom name provided by external tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:1b:21:54:33:3d", ATTR{type}=="1", KERNEL=="eth*",
NAME="eth1"

# PCI device 0x14e4:0x164c (bnx2) (custom name provided by external
tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:1e:4f:fc:00:3e", ATTR{type}=="1", KERNEL=="eth*",
NAME="eth4"

I modified the rules to use the ATTR{index} as below to get expected
names -

# PCI device 0x14e4:0x164c (bnx2) (custom name provided by external
tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{index}=="1",
ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

# PCI device 0x14e4:0x164c (bnx2) (custom name provided by external
tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{index}=="2",
ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

# PCI device 0x14e4:0x164c (bnx2) (custom name provided by external
tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{index}=="3",
ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"

# PCI device 0x8086:0x10c9 (igb) (custom name provided by external tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:1b:21:54:33:3c", ATTR{type}=="1", KERNEL=="eth*",
NAME="eth4"

# PCI device 0x8086:0x10c9 (igb) (custom name provided by external tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:1b:21:54:33:3d", ATTR{type}=="1", KERNEL=="eth*",
NAME="eth5"

# PCI device 0x14e4:0x164c (bnx2) (custom name provided by external
tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{index}=="4",
ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"

I observe that the Lan-On-Motherboard ports get named beyond the add-in
interfaces. That is, in the original file the add-in interfaces were
eth4 and eth5. They remain same as expected. But Lan-On-Motherboard
ports get eth6,eth7,eth8,eth9. Dmesg shows the renames that are done 

igb 0000:21:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
udev: renamed network interface eth1 to eth5
udev: renamed network interface eth0 to eth4

bnx2 0000:0a:00.0: eth3: Broadcom NetXtreme II BCM5708 1000Base-T (B2)
PCI-X 64-bit 133MHz found at mem ec000000, IRQ 19, node addr
00:1e:4f:fc:00:40
udev: renamed network interface eth0 to eth6
udev: renamed network interface eth2 to eth7
udev: renamed network interface eth3 to eth8
udev: renamed network interface eth1 to eth9

I am trying to figure out what might be going on.

With regards,
Narendra K





--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Narendra K Sept. 23, 2010, 3:13 p.m. UTC | #6
> -----Original Message-----
> From: Tim Small [mailto:tim@seoss.co.uk]
> Sent: Thursday, September 23, 2010 12:04 PM
> To: Stephen Hemminger
> Cc: K, Narendra; netdev@vger.kernel.org;
linux-hotplug@vger.kernel.org;
> linux-pci@vger.kernel.org; Domsch, Matt; Rose, Charles; Hargrave,
> Jordan; Nijhawan, Vijay
> Subject: Re: [PATCH] Use firmware provided index to register a network
> interface
> 
> Stephen Hemminger wrote:
> >>> http://marc.info/?l=linux-netdev&m=125510301513312&w=2
> >> Out of interest, that link says that doing it in usespace was
> rejected,
> >> but doesn't give any references... I'd be interested to know why
> this
> >> wasn't viable - since this seemed like the best fit at first glance
> -
> >> most people will never use this, so no need to grow their kernel
> size
> >> and complexity?
> >>
> >>
> >
> > This proposal was to ad changes into every application that
> > knows about network names (iproute, iptables, snmp, quagga,
openswan,
> ...)
> > to do aliasing at the application layer.
> >
> 
> 
> OK, that's bonkers, but what I was referring to was the line in the
> linked post which said "Achieve the above in userspace only using
udev"
> - which I assumed meant to do it once in a udev rename rule by
adapting
> /etc/udev/rules.d/70-persistent-net.rules , /lib/udev/write_net_rules
> etc. - which is what I've used to enforce this sort of convention
> myself
> from time to time.
> 

Hi,

I was referring to the solution proposed in this thread -
http://marc.info/?l=linux-netdev&m=125619338904322&w=3 ([PATCH] udev:
create empty regular files to represent net). 

With regards,
Narendra K

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Sept. 23, 2010, 3:27 p.m. UTC | #7
On Thu, Sep 23, 2010 at 08:40:44PM +0530, Narendra_K@Dell.com wrote:
> > -----Original Message-----
> > From: Greg KH [mailto:greg@kroah.com]
> > Sent: Thursday, September 23, 2010 12:52 AM
> > To: K, Narendra
> > Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org; linux-
> > pci@vger.kernel.org; Domsch, Matt; Rose, Charles; Hargrave, Jordan;
> > Nijhawan, Vijay
> > Subject: Re: [PATCH] Use firmware provided index to register a network
> > interface
> > 
> > On Wed, Sep 22, 2010 at 01:31:38PM -0500, Narendra K wrote:
> > > Hello,
> > >
> > > Here is another approach to address the issue of "eth0 does not
> > always
> > > map to the Integrated NIC Port 1 as denoted on server chassis
> label".
> > > For more details please refer to the thread -
> > > http://marc.info/?l=linux-netdev&m=128163454631618&w=3.
> > >
> > > Patch makes use of the firmware provided index to derive ethN names.
> > > That way the naming scheme adheres to the existing requirements of
> > > ethN namespace and with IFNAMSIZ length.
> > 
> > Ick, again, what's wrong with using udev for this as it is designed
> to?
> > That way no kernel changes are needed, and no one has to rely on the
> > BIOS getting the firmware number right (meaning it will work on all
> > types of systems.)
> > 
> 
> 1. We tried addressing this issue using udev only and without any kernel
> changes, during Nov-Dec 2008 timeframe using Biosdevname udev helper
> utility. Biosdevname utility has the ability to suggest BIOS intended
> name of an interface given its OS name.
> 
> /sbin/biosdevname -I eth2 - (OS name)
> eth0 - (Name according to the BIOS)
> 
> KERNEL!="eth*", GOTO="biosdevname_end"
> ACTION!="add",  GOTO="biosdevname_end"
> NAME=="?*",     GOTO="biosdevname_end"
> 
> PROGRAM="/sbin/biosdevname --policy=all_ethN -i %k",
> ENV{INTERFACE_NAME}="%c"
> 
> LABEL="biosdevname_end"
> 
> We observed that renames in the same namespace, which is ethN namespace,
> resulted in interface names like eth_rename_ren. The solution was
> susceptible to driver load order. Please refer to this bug report-
> https://bugzilla.novell.com/show_bug.cgi?id=441079.
> 
> This solution was not favored.

That's because for some reason you don't want to accept the fact that if
you want to rename things in a persistant manner, you have to do so in a
namespace that is outside of the kernel's normal one.

Now trying to change the kernel namespace itself seems like a bad hack
around this fact.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Narendra K Sept. 23, 2010, 3:50 p.m. UTC | #8
> -----Original Message-----
> From: Greg KH [mailto:greg@kroah.com]
> Sent: Thursday, September 23, 2010 8:58 PM
> To: K, Narendra
> Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org; linux-
> pci@vger.kernel.org; Domsch, Matt; Rose, Charles; Hargrave, Jordan;
> Nijhawan, Vijay
> Subject: Re: [PATCH] Use firmware provided index to register a network
> interface
> 
> > >
> > > Ick, again, what's wrong with using udev for this as it is
designed
> > to?
> > > That way no kernel changes are needed, and no one has to rely on
> the
> > > BIOS getting the firmware number right (meaning it will work on
all
> > > types of systems.)
> > >
> >
> > 1. We tried addressing this issue using udev only and without any
> kernel
> > changes, during Nov-Dec 2008 timeframe using Biosdevname udev helper
> > utility. Biosdevname utility has the ability to suggest BIOS
intended
> > name of an interface given its OS name.
> >
> > /sbin/biosdevname -I eth2 - (OS name)
> > eth0 - (Name according to the BIOS)
> >
> > KERNEL!="eth*", GOTO="biosdevname_end"
> > ACTION!="add",  GOTO="biosdevname_end"
> > NAME=="?*",     GOTO="biosdevname_end"
> >
> > PROGRAM="/sbin/biosdevname --policy=all_ethN -i %k",
> > ENV{INTERFACE_NAME}="%c"
> >
> > LABEL="biosdevname_end"
> >
> > We observed that renames in the same namespace, which is ethN
> namespace,
> > resulted in interface names like eth_rename_ren. The solution was
> > susceptible to driver load order. Please refer to this bug report-
> > https://bugzilla.novell.com/show_bug.cgi?id=441079.
> >
> > This solution was not favored.
> 
> That's because for some reason you don't want to accept the fact that
> if
> you want to rename things in a persistant manner, you have to do so in
> a
> namespace that is outside of the kernel's normal one.
> 
> Now trying to change the kernel namespace itself seems like a bad hack
> around this fact.
> 

The patch does not change the existing kernel name space. It adheres to
the existing ethN name space with IFNAMSIZ length requirements. The
patch only makes sure that eth0 always corresponds to what is labeled as
'Gb1' on server chassis, on systems where SMBIOS type 41 record is
available. And removes the need for any renames for the interfaces which
have a firmware index assigned by system firmware, as the very first
name assigned by the kernel will be as expected and deterministic.

With regards,
Narendra K






--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Sept. 23, 2010, 4:33 p.m. UTC | #9
On Thu, Sep 23, 2010 at 09:20:57PM +0530, Narendra_K@Dell.com wrote:
> > Now trying to change the kernel namespace itself seems like a bad hack
> > around this fact.
> > 
> 
> The patch does not change the existing kernel name space.

You are "reordering it", right?

> It adheres to the existing ethN name space with IFNAMSIZ length
> requirements. The patch only makes sure that eth0 always corresponds
> to what is labeled as 'Gb1' on server chassis, on systems where SMBIOS
> type 41 record is available. And removes the need for any renames for
> the interfaces which have a firmware index assigned by system
> firmware, as the very first name assigned by the kernel will be as
> expected and deterministic.

And on systems with buggy firmware, what is going to happen here?  And
yes, there will be buggy BIOS tables, we can guarantee that, as this is
not something that Windows supports, right?

You are also complicating the logic for 99% of the world that will never
use this feature.  But I'll leave that up to the netdev developers to
decide, it's their code to maintain over time, not mine :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Narendra K Oct. 7, 2010, 2:14 p.m. UTC | #10
On Thu, Sep 23, 2010 at 10:03:36PM +0530, Greg KH wrote:
>    On Thu, Sep 23, 2010 at 09:20:57PM +0530, Narendra_K@Dell.com wrote:
>    > > Now trying to change the kernel namespace itself seems like a bad hack
>    > > around this fact.
>    > >
>    >
>    > The patch does not change the existing kernel name space.
> 
>    You are "reordering it", right?
> 
>    > It adheres to the existing ethN name space with IFNAMSIZ length
>    > requirements. The patch only makes sure that eth0 always corresponds
>    > to what is labeled as 'Gb1' on server chassis, on systems where SMBIOS
>    > type 41 record is available. And removes the need for any renames for
>    > the interfaces which have a firmware index assigned by system
>    > firmware, as the very first name assigned by the kernel will be as
>    > expected and deterministic.
> 
>    And on systems with buggy firmware, what is going to happen here?  And
>    yes, there will be buggy BIOS tables, we can guarantee that, as this is
>    not something that Windows supports, right?
> 

It took some time to find out the details asked above. Right, windows
does not use SMBIOS type 41 record to derive names. But as a datapoint,
windows also has the same problem.

Yes, firmware and BIOS tables can be buggy. How about a command line
parameter 'no_netfwindex', passing which firmware index will not be
used to derive ethN names ? That would handle the scenario of buggy
firmware and names will be derived in the now existing way.

I will submit a patch shortly implementing this.
Tim Small Oct. 7, 2010, 2:27 p.m. UTC | #11
On 07/10/10 15:14, Narendra_K@Dell.com wrote:
> Yes, firmware and BIOS tables can be buggy. How about a command line
> parameter 'no_netfwindex', passing which firmware index will not be
> used to derive ethN names ? That would handle the scenario of buggy
> firmware and names will be derived in the now existing way.
>
> I will submit a patch shortly implementing this.
>    

What was the reason for not doing this in user space again?  You stated 
that you got races when doing renames like eth0 -> eth2, but the 
solution looked like renaming into a different name space such as eth0 
-> ethlom2  etc. so that it wouldn't race with the names handed out by 
the kernel?

Tim.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Oct. 7, 2010, 2:31 p.m. UTC | #12
On Thu, Oct 07, 2010 at 07:44:57PM +0530, Narendra_K@Dell.com wrote:
> On Thu, Sep 23, 2010 at 10:03:36PM +0530, Greg KH wrote:
> >    On Thu, Sep 23, 2010 at 09:20:57PM +0530, Narendra_K@Dell.com wrote:
> >    > > Now trying to change the kernel namespace itself seems like a bad hack
> >    > > around this fact.
> >    > >
> >    >
> >    > The patch does not change the existing kernel name space.
> > 
> >    You are "reordering it", right?
> > 
> >    > It adheres to the existing ethN name space with IFNAMSIZ length
> >    > requirements. The patch only makes sure that eth0 always corresponds
> >    > to what is labeled as 'Gb1' on server chassis, on systems where SMBIOS
> >    > type 41 record is available. And removes the need for any renames for
> >    > the interfaces which have a firmware index assigned by system
> >    > firmware, as the very first name assigned by the kernel will be as
> >    > expected and deterministic.
> > 
> >    And on systems with buggy firmware, what is going to happen here?  And
> >    yes, there will be buggy BIOS tables, we can guarantee that, as this is
> >    not something that Windows supports, right?
> > 
> 
> It took some time to find out the details asked above. Right, windows
> does not use SMBIOS type 41 record to derive names. But as a datapoint,
> windows also has the same problem.

If windows does not use this field, then you can guarantee it will show
up incorrectly in BIOSes and never be tested by manufacturers before
they ship their machines.

> Yes, firmware and BIOS tables can be buggy. How about a command line
> parameter 'no_netfwindex', passing which firmware index will not be
> used to derive ethN names ? That would handle the scenario of buggy
> firmware and names will be derived in the now existing way.

I'd prefer the option never be added in the first place as you are
placing a big burden on people whose machines were working properly on
old kernels, to suddenly have to add a command line option to get their
box to now work again.

And again, as this is a "simple" rename type operation, I still don't
see why you can't just do this from a udev rule, especially if the
firmware information is exported to userspace.  That is where such
"policy" should be added, not within the kernel itself.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/pci-label.c b/drivers/pci/pci-label.c
index 90c0a72..8086268 100644
--- a/drivers/pci/pci-label.c
+++ b/drivers/pci/pci-label.c
@@ -55,6 +55,7 @@  find_smbios_instance_string(struct pci_dev *pdev, char *buf,
 							 "%s\n",
 							 dmi->name);
 			}
+			pdev->firmware_index = donboard->instance;
 			return strlen(dmi->name);
 		}
 	}
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index b5a7d9b..448ed9d 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -28,6 +28,7 @@ 
 #include "pci.h"
 
 static int sysfs_initialized;	/* = 0 */
+int pci_netdevs_with_fwindex;
 
 /* show configuration fields */
 #define pci_config_attr(field, format_string)				\
@@ -1167,6 +1168,10 @@  int __must_check pci_create_sysfs_dev_files (struct pci_dev *pdev)
 
 	pci_create_firmware_label_files(pdev);
 
+	if (pdev->firmware_index && (pdev->class >> 16) ==
+						PCI_BASE_CLASS_NETWORK)
+		pci_netdevs_with_fwindex++;
+
 	return 0;
 
 err_vga_file:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 46c36ff..4398dcf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1080,6 +1080,8 @@  struct net_device {
 
 #define	NETDEV_ALIGN		32
 
+extern int pci_netdevs_with_fwindex;
+
 static inline
 struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev,
 					 unsigned int index)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b1d1795..90113bb 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -243,6 +243,7 @@  struct pci_dev {
 	unsigned short	subsystem_vendor;
 	unsigned short	subsystem_device;
 	unsigned int	class;		/* 3 bytes: (base,sub,prog-if) */
+	unsigned int	firmware_index; /* Firmware provided index */
 	u8		revision;	/* PCI revision, low byte of class word */
 	u8		hdr_type;	/* PCI header type (`multi' flag masked out) */
 	u8		pcie_cap;	/* PCI-E capability offset */
diff --git a/net/core/dev.c b/net/core/dev.c
index 1ae6543..b177ccc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -855,7 +855,7 @@  EXPORT_SYMBOL(dev_valid_name);
 
 /**
  *	__dev_alloc_name - allocate a name for a device
- *	@net: network namespace to allocate the device name in
+ *	@dev:  device
  *	@name: name format string
  *	@buf:  scratch buffer and result name string
  *
@@ -868,13 +868,15 @@  EXPORT_SYMBOL(dev_valid_name);
  *	Returns the number of the unit assigned or a negative errno code.
  */
 
-static int __dev_alloc_name(struct net *net, const char *name, char *buf)
+static int __dev_alloc_name(struct net_device *dev, const char *name, char *buf)
 {
 	int i = 0;
 	const char *p;
 	const int max_netdevices = 8*PAGE_SIZE;
 	unsigned long *inuse;
 	struct net_device *d;
+	struct net *net;
+	struct pci_dev *pdev;
 
 	p = strnchr(name, IFNAMSIZ-1, '%');
 	if (p) {
@@ -886,15 +888,31 @@  static int __dev_alloc_name(struct net *net, const char *name, char *buf)
 		if (p[1] != 'd' || strchr(p + 2, '%'))
 			return -EINVAL;
 
+		pdev = to_pci_dev(dev->dev.parent);
+		if (pdev && pdev->firmware_index) {
+			snprintf(buf, IFNAMSIZ, name,
+				 pdev->firmware_index - 1);
+			return pdev->firmware_index - 1;
+		}
+
 		/* Use one page as a bit array of possible slots */
 		inuse = (unsigned long *) get_zeroed_page(GFP_ATOMIC);
 		if (!inuse)
 			return -ENOMEM;
 
+		/* Reserve 0 to < pci_netdevs_with_fwindex for integrated
+		 * ports with fwindex and allocate from pci_netdevs_with_fwindex
+		 * onwards for add-in devices
+		 */
+		for (i = 0; i < pci_netdevs_with_fwindex; i++)
+			set_bit(i, inuse);
+
+		net = dev_net(dev);
+
 		for_each_netdev(net, d) {
 			if (!sscanf(d->name, name, &i))
 				continue;
-			if (i < 0 || i >= max_netdevices)
+			if (i < pci_netdevs_with_fwindex || i >= max_netdevices)
 				continue;
 
 			/*  avoid cases where sscanf is not exact inverse of printf */
@@ -936,12 +954,10 @@  static int __dev_alloc_name(struct net *net, const char *name, char *buf)
 int dev_alloc_name(struct net_device *dev, const char *name)
 {
 	char buf[IFNAMSIZ];
-	struct net *net;
 	int ret;
 
 	BUG_ON(!dev_net(dev));
-	net = dev_net(dev);
-	ret = __dev_alloc_name(net, name, buf);
+	ret = __dev_alloc_name(dev, name, buf);
 	if (ret >= 0)
 		strlcpy(dev->name, buf, IFNAMSIZ);
 	return ret;