Message ID | 20190603144329.16366-1-sameehj@amazon.com |
---|---|
Headers | show |
Series | Extending the ena driver to support new features and enhance performance | expand |
From: <sameehj@amazon.com> Date: Mon, 3 Jun 2019 17:43:18 +0300 > From: Sameeh Jubran <sameehj@amazon.com> > > This patchset introduces the following: > > * add support for changing the inline header size (max_header_size) for applications > with overlay and nested headers > * enable automatic fallback to polling mode for admin queue when interrupt is not > available or missed > * add good checksum counter for Rx ethtool statistics > * update ena.txt > * some minor code clean-up > * some performance enhancements with doorbell calculations > > Differences from V1: > > * net: ena: add handling of llq max tx burst size (1/11): > * fixed christmas tree issue > > * net: ena: ethtool: add extra properties retrieval via get_priv_flags (2/11): > * replaced snprintf with strlcpy > * dropped confusing error message > * added more details to the commit message Series applied.
On Mon, 3 Jun 2019 17:43:18 +0300, sameehj@amazon.com wrote: > * net: ena: ethtool: add extra properties retrieval via get_priv_flags (2/11): > * replaced snprintf with strlcpy > * dropped confusing error message > * added more details to the commit message I asked you to clearly state that you are using the blindly passing this info from the FW to user space. Stating that you "retrieve" it is misleading. IMHO it's a dangerous precedent, you're extending kernel's uAPI down to the proprietary FW of the device. Now we have no idea what the flags are, so we can't refactor and create common APIs among drivers, or even use the same names. We have no idea what you're exposing.
On Mon, 3 Jun 2019 22:27:28 +0000, Woodhouse, David wrote: > On Mon, 2019-06-03 at 14:32 -0700, Jakub Kicinski wrote: > > On Mon, 3 Jun 2019 17:43:18 +0300, sameehj@amazon.com wrote: > > > * net: ena: ethtool: add extra properties retrieval via get_priv_flags (2/11): > > > * replaced snprintf with strlcpy > > > * dropped confusing error message > > > * added more details to the commit message > > > > I asked you to clearly state that you are using the blindly passing > > this info from the FW to user space. Stating that you "retrieve" it > > is misleading. > > Yes, we should be very clear about that. > > > IMHO it's a dangerous precedent, you're extending kernel's uAPI down to > > the proprietary FW of the device. Now we have no idea what the flags > > are, so we can't refactor and create common APIs among drivers, or even > > use the same names. We have no idea what you're exposing. > > Yes, that should be documented very clearly too, and we should > absolutely make sure that anything that makes sense for other devices > is considered for making a common API. > > However, the deployment environment for ENA is kind of weird — we get > to upgrade the *firmware*, while guest kernels might get updated only > once a decade. So the passthrough you're objecting to, actually gives > us fairly much the only viable way to offer many users the tuning > options they need — or at least to experiment and establish whether > they *do* need them or not, and thus if we should turn them into a > "real" generic property. > > You do have a valid concern, but I think we can handle it, and I > suspect that the approach taken here does make sense. Let's just make > it explicitly clear, and also document the properties that we expect to > be exposing, for visibility. Thank you. It's generally easier to push FW updates, also to enterprises, rather than rolling out full new kernels. People are less concerned with FW updates, because the only thing those can break is the NIC. Some customers also run their own modified kernels. I'd dispute the fact that Amazon is so very special. I don't dispute that this is convenient for Amazon and your FW team. However, what's best for the vendors differs here from what's good for Open Source (the FW is proprietary) and IMHO health of Linux ecosystem in general. Any "SmartNIC" vendor has temptation of uAPI-level hand off to the firmware (including my employer), we all run pretty beefy processors inside "the NIC" after all. The device centric ethtool configuration can be implemented by just forwarding the uAPI structures as they are to the FW. I'm sure Andrew and others who would like to see Linux takes more control over PHYs etc. would not like this scenario, either. To address your specific points, let's be realistic, after initial submission it's unlikely the features will be updated in a timely manner anywhere else than perhaps some Amazon's own docs. Can you be more specific on what those tuning options are today? There are users who don't care to update their kernels for 10 years, yet they care about some minor tuning option of the NIC?
> Any "SmartNIC" vendor has temptation of uAPI-level hand off to the > firmware (including my employer), we all run pretty beefy processors > inside "the NIC" after all. The device centric ethtool configuration > can be implemented by just forwarding the uAPI structures as they are > to the FW. I'm sure Andrew and others who would like to see Linux > takes more control over PHYs etc. would not like this scenario, either. No, i would not. There are a few good examples of both firmware and open drivers being used to control the same PHY, on different boards. The PHY driver was developed by the community, and has more features than the firmware driver. And it keeps gaining features. The firmware i stuck, no updates. The community driver can be debugged, the firmware is a black box, no chance of the community fixing any bugs in it. And PHYs are commodity devices. I doubt there is any value add in the firmware for a PHY, any real IPR which makes the product better, magic sauce related to the PHY. So just save the cost of writing and maintaining firmware, export the MDIO bus, and let Linux control it. Concentrate the engineers on the interesting parts of the NIC, the Smart parts, where there can be real IPR. And i would say this is true for any NIC. Let Linux control the PHY. Andrew
Andrew, Sent from my iPhone On Jun 3, 2019, at 6:52 PM, Andrew Lunn <andrew@lunn.ch> wrote: >> Any "SmartNIC" vendor has temptation of uAPI-level hand off to the >> firmware (including my employer), we all run pretty beefy processors >> inside "the NIC" after all. The device centric ethtool configuration >> can be implemented by just forwarding the uAPI structures as they are >> to the FW. I'm sure Andrew and others who would like to see Linux >> takes more control over PHYs etc. would not like this scenario, either. > > No, i would not. There are a few good examples of both firmware and > open drivers being used to control the same PHY, on different > boards. The PHY driver was developed by the community, and has more > features than the firmware driver. And it keeps gaining features. The > firmware i stuck, no updates. The community driver can be debugged, > the firmware is a black box, no chance of the community fixing any > bugs in it. > > And PHYs are commodity devices. I doubt there is any value add in the > firmware for a PHY, any real IPR which makes the product better, magic > sauce related to the PHY. So just save the cost of writing and > maintaining firmware, export the MDIO bus, and let Linux control it. > Concentrate the engineers on the interesting parts of the NIC, the > Smart parts, where there can be real IPR. > > And i would say this is true for any NIC. Let Linux control the PHY. > > Andrew > It may be true for old GbE PHYs where it’s a discrete chip from the likes of Marvell or broadcom But at 25/50/100G, the PHy is actually part of the nic. It’s a very complex SERDES. Cloud providers like us spend enormous amount of time testing the PHY across process and voltage variations, all cable types, length and manufacturing variations, and against all switches we use. Community drivers won’t be able to validate and tune all this. Plus we would need exact same setting for Linux, including all distributions even 10year old like RHEL6, for all Windows, ESX, DPDK, FreeBSD, and support millions of different customers with different sets of Machine images. In this case, there is no practical choice by have the firmware to manage the PHY
On Tue, 2019-06-04 at 02:15 +0000, Bshara, Nafea wrote: > Andrew, > > Sent from my iPhone > > On Jun 3, 2019, at 6:52 PM, Andrew Lunn <andrew@lunn.ch> wrote: > > > > Any "SmartNIC" vendor has temptation of uAPI-level hand off to the > > > firmware (including my employer), we all run pretty beefy processors > > > inside "the NIC" after all. The device centric ethtool configuration > > > can be implemented by just forwarding the uAPI structures as they are > > > to the FW. I'm sure Andrew and others who would like to see Linux > > > takes more control over PHYs etc. would not like this scenario, either. > > > > No, i would not. There are a few good examples of both firmware and > > open drivers being used to control the same PHY, on different > > boards. The PHY driver was developed by the community, and has more > > features than the firmware driver. And it keeps gaining features. The > > firmware i stuck, no updates. The community driver can be debugged, > > the firmware is a black box, no chance of the community fixing any > > bugs in it. > > > > And PHYs are commodity devices. I doubt there is any value add in the > > firmware for a PHY, any real IPR which makes the product better, magic > > sauce related to the PHY. So just save the cost of writing and > > maintaining firmware, export the MDIO bus, and let Linux control it. > > Concentrate the engineers on the interesting parts of the NIC, the > > Smart parts, where there can be real IPR. > > > > And i would say this is true for any NIC. Let Linux control the PHY. > > > > Andrew > > > > It may be true for old GbE PHYs where it’s a discrete chip from the > likes of Marvell or broadcom > > But at 25/50/100G, the PHy is actually part of the nic. It’s a very > complex SERDES. Cloud providers like us spend enormous amount of > time testing the PHY across process and voltage variations, all cable > types, length and manufacturing variations, and against all switches > we use. Community drivers won’t be able to validate and tune all > this. > > Plus we would need exact same setting for Linux, including all > distributions even 10year old like RHEL6, for all Windows, ESX, DPDK, > FreeBSD, and support millions of different customers with different > sets of Machine images. > > In this case, there is no practical choice by have the firmware to > manage the PHY I don't quite know why we're talking about PHYs in this context. ENA is basically a virtio NIC. It has no PHY.
On Tue, 04 Jun 2019 07:57:48 +0100, David Woodhouse wrote: > On Tue, 2019-06-04 at 02:15 +0000, Bshara, Nafea wrote: > > On Jun 3, 2019, at 6:52 PM, Andrew Lunn <andrew@lunn.ch> wrote: > > > > Any "SmartNIC" vendor has temptation of uAPI-level hand off to the > > > > firmware (including my employer), we all run pretty beefy processors > > > > inside "the NIC" after all. The device centric ethtool configuration > > > > can be implemented by just forwarding the uAPI structures as they are > > > > to the FW. I'm sure Andrew and others who would like to see Linux > > > > takes more control over PHYs etc. would not like this scenario, either. > > > > > > No, i would not. There are a few good examples of both firmware and > > > open drivers being used to control the same PHY, on different > > > boards. The PHY driver was developed by the community, and has more > > > features than the firmware driver. And it keeps gaining features. The > > > firmware i stuck, no updates. The community driver can be debugged, > > > the firmware is a black box, no chance of the community fixing any > > > bugs in it. > > > > > > And PHYs are commodity devices. I doubt there is any value add in the > > > firmware for a PHY, any real IPR which makes the product better, magic > > > sauce related to the PHY. So just save the cost of writing and > > > maintaining firmware, export the MDIO bus, and let Linux control it. > > > Concentrate the engineers on the interesting parts of the NIC, the > > > Smart parts, where there can be real IPR. > > > > > > And i would say this is true for any NIC. Let Linux control the PHY. > > > > It may be true for old GbE PHYs where it’s a discrete chip from the > > likes of Marvell or broadcom > > > > But at 25/50/100G, the PHy is actually part of the nic. It’s a very > > complex SERDES. Cloud providers like us spend enormous amount of > > time testing the PHY across process and voltage variations, all cable > > types, length and manufacturing variations, and against all switches > > we use. Community drivers won’t be able to validate and tune all > > this. > > > > Plus we would need exact same setting for Linux, including all > > distributions even 10year old like RHEL6, for all Windows, ESX, DPDK, > > FreeBSD, and support millions of different customers with different > > sets of Machine images. > > > > In this case, there is no practical choice by have the firmware to > > manage the PHY > > I don't quite know why we're talking about PHYs in this context. > ENA is basically a virtio NIC. It has no PHY. I brought it up as an example, to illustrate that we'd rather see less trust and control being blindly handed over to the firmware. Would you mind answering what are the examples of very important flags you need to expose to users with 10yo kernels? Or any examples for that matter..
As of today there are no flags exposed by ENA NIC device, however, we are planning to use them in the near future. We want to provide customers with extra methods to identify (and differentiate) multiple network interfaces that can be attached to a single VM. Currently, customers can identify a specific network interface (ENI) only by MAC address, and later look up this MAC among other multiple ENIs that they have. In some cases it might not be convenient. Using these flags will let us inform the customer about ENI`s specific properties. It's not finalized, but tentatively it can look like this: primary-eni: on /* Differentiate between primary and secondary ENIs */ has-associated-efa: on /* Will specify ENA device has another associated EFA device */ > -----Original Message----- > From: Jakub Kicinski <jakub.kicinski@netronome.com> > Sent: Tuesday, June 4, 2019 8:24 PM > To: David Woodhouse <dwmw2@infradead.org> > Cc: Bshara, Nafea <nafea@amazon.com>; Andrew Lunn <andrew@lunn.ch>; > Jubran, Samih <sameehj@amazon.com>; Kiyanovski, Arthur > <akiyano@amazon.com>; Bshara, Saeed <saeedb@amazon.com>; Tzalik, > Guy <gtzalik@amazon.com>; Matushevsky, Alexander > <matua@amazon.com>; Liguori, Anthony <aliguori@amazon.com>; Saidi, Ali > <alisaidi@amazon.com>; Machulsky, Zorik <zorik@amazon.com>; > netdev@vger.kernel.org; Wilson, Matt <msw@amazon.com>; > davem@davemloft.net; Belgazal, Netanel <netanel@amazon.com>; > Herrenschmidt, Benjamin <benh@amazon.com> > Subject: Re: [PATCH V2 net 00/11] Extending the ena driver to support new > features and enhance performance > > On Tue, 04 Jun 2019 07:57:48 +0100, David Woodhouse wrote: > > On Tue, 2019-06-04 at 02:15 +0000, Bshara, Nafea wrote: > > > On Jun 3, 2019, at 6:52 PM, Andrew Lunn <andrew@lunn.ch> wrote: > > > > > Any "SmartNIC" vendor has temptation of uAPI-level hand off to > > > > > the firmware (including my employer), we all run pretty beefy > > > > > processors inside "the NIC" after all. The device centric > > > > > ethtool configuration can be implemented by just forwarding the > > > > > uAPI structures as they are to the FW. I'm sure Andrew and > > > > > others who would like to see Linux takes more control over PHYs etc. > would not like this scenario, either. > > > > > > > > No, i would not. There are a few good examples of both firmware > > > > and open drivers being used to control the same PHY, on different > > > > boards. The PHY driver was developed by the community, and has > > > > more features than the firmware driver. And it keeps gaining > > > > features. The firmware i stuck, no updates. The community driver > > > > can be debugged, the firmware is a black box, no chance of the > > > > community fixing any bugs in it. > > > > > > > > And PHYs are commodity devices. I doubt there is any value add in > > > > the firmware for a PHY, any real IPR which makes the product > > > > better, magic sauce related to the PHY. So just save the cost of > > > > writing and maintaining firmware, export the MDIO bus, and let Linux > control it. > > > > Concentrate the engineers on the interesting parts of the NIC, the > > > > Smart parts, where there can be real IPR. > > > > > > > > And i would say this is true for any NIC. Let Linux control the PHY. > > > > > > It may be true for old GbE PHYs where it’s a discrete chip from the > > > likes of Marvell or broadcom > > > > > > But at 25/50/100G, the PHy is actually part of the nic. It’s a very > > > complex SERDES. Cloud providers like us spend enormous amount of > > > time testing the PHY across process and voltage variations, all > > > cable types, length and manufacturing variations, and against all > > > switches we use. Community drivers won’t be able to validate and > > > tune all this. > > > > > > Plus we would need exact same setting for Linux, including all > > > distributions even 10year old like RHEL6, for all Windows, ESX, > > > DPDK, FreeBSD, and support millions of different customers with > > > different sets of Machine images. > > > > > > In this case, there is no practical choice by have the firmware to > > > manage the PHY > > > > I don't quite know why we're talking about PHYs in this context. > > ENA is basically a virtio NIC. It has no PHY. > > I brought it up as an example, to illustrate that we'd rather see less trust and > control being blindly handed over to the firmware. > > Would you mind answering what are the examples of very important flags > you need to expose to users with 10yo kernels? Or any examples for that > matter..
Hi Samih! Please don't top post on Linux kernel mailing lists. On Thu, 6 Jun 2019 10:23:40 +0000, Jubran, Samih wrote: > As of today there are no flags exposed by ENA NIC device, however, we > are planning to use them in the near future. We want to provide > customers with extra methods to identify (and differentiate) multiple > network interfaces that can be attached to a single VM. Currently, > customers can identify a specific network interface (ENI) only by MAC > address, and later look up this MAC among other multiple ENIs that > they have. In some cases it might not be convenient. Using these > flags will let us inform the customer about ENI`s specific > properties. Oh no :( You're using private _feature_ flags as labels or tags. > It's not finalized, but tentatively it can look like this: > primary-eni: on /* Differentiate between primary and secondary ENIs */ Did you consider using phys_port_name for this use case? If the intent is to have those interfaces bonded, even better would be to extend the net_failover module or work on user space ABI for VM failover. That might be a bigger effort, though. > has-associated-efa: on /* Will specify ENA device has another associated EFA device */ IDK how your ENA/EFA thing works, but sounds like something that should be solved in the device model.
On 6/6/19, 10:16 AM, "Jakub Kicinski" <jakub.kicinski@netronome.com> wrote: Hi Samih! Please don't top post on Linux kernel mailing lists. On Thu, 6 Jun 2019 10:23:40 +0000, Jubran, Samih wrote: > As of today there are no flags exposed by ENA NIC device, however, we > are planning to use them in the near future. We want to provide > customers with extra methods to identify (and differentiate) multiple > network interfaces that can be attached to a single VM. Currently, > customers can identify a specific network interface (ENI) only by MAC > address, and later look up this MAC among other multiple ENIs that > they have. In some cases it might not be convenient. Using these > flags will let us inform the customer about ENI`s specific > properties. Oh no :( You're using private _feature_ flags as labels or tags. > It's not finalized, but tentatively it can look like this: > primary-eni: on /* Differentiate between primary and secondary ENIs */ Did you consider using phys_port_name for this use case? If the intent is to have those interfaces bonded, even better would be to extend the net_failover module or work on user space ABI for VM failover. That might be a bigger effort, though. > has-associated-efa: on /* Will specify ENA device has another associated EFA device */ IDK how your ENA/EFA thing works, but sounds like something that should be solved in the device model. [NB] ENA is a netDev, EFA is an ib_dev. Both share the same physical network All previous attempt to make them share at device level leads to a lot of complexity at the OS level, with inter-dependencies that are pronged to error Not to mention that OS distribution that backport from mainline would backport different subset of each driver, let alone they belong to two subtrees with two different maintainers and we don’t want to be in a place where we have to coordinate releases between two subgroups As such, we selected a much safer path, that operational, development and deployment of two different drivers (netdev vs ib_dev) much easier but exposing the nic as two different Physical functions/devices Only cost we have is that we need this extra propriety to help user identify which two devices are related hence Samih's comments
On Thu, 6 Jun 2019 21:40:19 +0000, Bshara, Nafea wrote: > On 6/6/19, 10:16 AM, "Jakub Kicinski" <jakub.kicinski@netronome.com> wrote: > > Hi Samih! > > Please don't top post on Linux kernel mailing lists. > > On Thu, 6 Jun 2019 10:23:40 +0000, Jubran, Samih wrote: > > As of today there are no flags exposed by ENA NIC device, however, we > > are planning to use them in the near future. We want to provide > > customers with extra methods to identify (and differentiate) multiple > > network interfaces that can be attached to a single VM. Currently, > > customers can identify a specific network interface (ENI) only by MAC > > address, and later look up this MAC among other multiple ENIs that > > they have. In some cases it might not be convenient. Using these > > flags will let us inform the customer about ENI`s specific > > properties. > > Oh no :( You're using private _feature_ flags as labels or tags. > > > It's not finalized, but tentatively it can look like this: > > primary-eni: on /* Differentiate between primary and secondary ENIs */ > > Did you consider using phys_port_name for this use case? > > If the intent is to have those interfaces bonded, even better would > be to extend the net_failover module or work on user space ABI for VM > failover. That might be a bigger effort, though. Someone else will address this point? > > has-associated-efa: on /* Will specify ENA device has another associated EFA device */ > > IDK how your ENA/EFA thing works, but sounds like something that should > be solved in the device model. > > [NB] ENA is a netDev, EFA is an ib_dev. Both share the same physical network > All previous attempt to make them share at device level leads to a > lot of complexity at the OS level, with inter-dependencies that are > pronged to error Not to mention that OS distribution that backport > from mainline would backport different subset of each driver, let > alone they belong to two subtrees with two different maintainers and > we don’t want to be in a place where we have to coordinate releases > between two subgroups > > As such, we selected a much safer path, that operational, development and deployment of two different drivers (netdev vs ib_dev) much easier but exposing the nic as two different Physical functions/devices > > Only cost we have is that we need this extra propriety to help user identify which two devices are related hence Samih's comments I think you're arguing with a different point than the one I made :) Do you think I said they should use the same PCI device? I haven't. IIUC you are exposing a "tag" through the feature API on the netdev to indicate that there is another PCI device present on the system? What I said is if there is a relation between PCI devices the best level to expose this relation at is at the device model level. IOW have some form on "link" in sysfs, not in a random place on a netdev. Having said that, it's entirely unclear to me what the user scenario is here. You say "which two devices related", yet you only have one bit, so it can indicate that there is another device, not _which_ device is related. Information you can full well get from running lspci 🤷 Do the devices have the same PCI ID/vendor:model?
Sent from my iPhone > On Jun 6, 2019, at 3:05 PM, Jakub Kicinski <jakub.kicinski@netronome.com> wrote: > >> On Thu, 6 Jun 2019 21:40:19 +0000, Bshara, Nafea wrote: >> On 6/6/19, 10:16 AM, "Jakub Kicinski" <jakub.kicinski@netronome.com> wrote: >> >> Hi Samih! >> >> Please don't top post on Linux kernel mailing lists. >> >>> On Thu, 6 Jun 2019 10:23:40 +0000, Jubran, Samih wrote: >>> As of today there are no flags exposed by ENA NIC device, however, we >>> are planning to use them in the near future. We want to provide >>> customers with extra methods to identify (and differentiate) multiple >>> network interfaces that can be attached to a single VM. Currently, >>> customers can identify a specific network interface (ENI) only by MAC >>> address, and later look up this MAC among other multiple ENIs that >>> they have. In some cases it might not be convenient. Using these >>> flags will let us inform the customer about ENI`s specific >>> properties. >> >> Oh no :( You're using private _feature_ flags as labels or tags. >> >>> It's not finalized, but tentatively it can look like this: >>> primary-eni: on /* Differentiate between primary and secondary ENIs */ >> >> Did you consider using phys_port_name for this use case? >> >> If the intent is to have those interfaces bonded, even better would >> be to extend the net_failover module or work on user space ABI for VM >> failover. That might be a bigger effort, though. > > Someone else will address this point? > >>> has-associated-efa: on /* Will specify ENA device has another associated EFA device */ >> >> IDK how your ENA/EFA thing works, but sounds like something that should >> be solved in the device model. >> >> [NB] ENA is a netDev, EFA is an ib_dev. Both share the same physical network >> All previous attempt to make them share at device level leads to a >> lot of complexity at the OS level, with inter-dependencies that are >> pronged to error Not to mention that OS distribution that backport >> from mainline would backport different subset of each driver, let >> alone they belong to two subtrees with two different maintainers and >> we don’t want to be in a place where we have to coordinate releases >> between two subgroups >> >> As such, we selected a much safer path, that operational, development and deployment of two different drivers (netdev vs ib_dev) much easier but exposing the nic as two different Physical functions/devices >> >> Only cost we have is that we need this extra propriety to help user identify which two devices are related hence Samih's comments > > I think you're arguing with a different point than the one I made :) > Do you think I said they should use the same PCI device? I haven't. > > IIUC you are exposing a "tag" through the feature API on the netdev to > indicate that there is another PCI device present on the system? > > What I said is if there is a relation between PCI devices the best > level to expose this relation at is at the device model level. IOW > have some form on "link" in sysfs, not in a random place on a netdev. > > Having said that, it's entirely unclear to me what the user scenario is > here. You say "which two devices related", yet you only have one bit, > so it can indicate that there is another device, not _which_ device is > related. Information you can full well get from running lspci 🤷 > Do the devices have the same PCI ID/vendor:model? Different model id Will look into sysfs
On Thu, 6 Jun 2019 22:57:21 +0000, Bshara, Nafea wrote: > > Having said that, it's entirely unclear to me what the user scenario is > > here. You say "which two devices related", yet you only have one bit, > > so it can indicate that there is another device, not _which_ device is > > related. Information you can full well get from running lspci 🤷 > > Do the devices have the same PCI ID/vendor:model? > > Different model id Okay, then you know which one is which. Are there multiple ENAs but one EFA? > Will look into sysfs I still don't understand what is the problem you're trying to solve, perhaps phys_port_id is the way to go... The larger point here is that we can't guide you to the right API unless we know what you're trying to achieve. And we don't have the slightest clue of what're trying to achieve if uAPI is forwarded to the device. Honestly this is worse, and way more basic than I thought, I think 315c28d2b714 ("net: ena: ethtool: add extra properties retrieval via get_priv_flags") needs to be reverted.
Sent from my iPhone > On Jun 6, 2019, at 4:08 PM, Jakub Kicinski <jakub.kicinski@netronome.com> wrote: > > On Thu, 6 Jun 2019 22:57:21 +0000, Bshara, Nafea wrote: >>> Having said that, it's entirely unclear to me what the user scenario is >>> here. You say "which two devices related", yet you only have one bit, >>> so it can indicate that there is another device, not _which_ device is >>> related. Information you can full well get from running lspci 🤷 >>> Do the devices have the same PCI ID/vendor:model? >> >> Different model id > > Okay, then you know which one is which. Are there multiple ENAs but > one EFA? > Yes, very possible. Very common Typical use case that instances have one ena for control plane, one for internet facing , and one 100G ena that also have efa capabilities >> Will look into sysfs > > I still don't understand what is the problem you're trying to solve, > perhaps phys_port_id is the way to go... > > > The larger point here is that we can't guide you to the right API > unless we know what you're trying to achieve. And we don't have > the slightest clue of what're trying to achieve if uAPI is forwarded > to the device. > > Honestly this is worse, and way more basic than I thought, I think > 315c28d2b714 ("net: ena: ethtool: add extra properties retrieval via get_priv_flags") > needs to be reverted. Let’s not do that until we finish this discussion and explain the various use cases
On Thu, 6 Jun 2019 23:21:25 +0000, Bshara, Nafea wrote: > > On Jun 6, 2019, at 4:08 PM, Jakub Kicinski <jakub.kicinski@netronome.com> wrote: > > On Thu, 6 Jun 2019 22:57:21 +0000, Bshara, Nafea wrote: > >>> Having said that, it's entirely unclear to me what the user scenario is > >>> here. You say "which two devices related", yet you only have one bit, > >>> so it can indicate that there is another device, not _which_ device is > >>> related. Information you can full well get from running lspci 🤷 > >>> Do the devices have the same PCI ID/vendor:model? > >> > >> Different model id > > > > Okay, then you know which one is which. Are there multiple ENAs but > > one EFA? > > Yes, very possible. Very common > > Typical use case that instances have one ena for control plane, one > for internet facing , and one 100G ena that also have efa capabilities I see, and those are PCI devices.. Some form of platform data would seem like the best fit to me. There is something called: /sys/bus/pci/${dbdf}/label It seems to come from some ACPI table - DSM maybe? I think you can put whatever string you want there 🤔 > >> Will look into sysfs > > > > I still don't understand what is the problem you're trying to solve, > > perhaps phys_port_id is the way to go... > > > > > > The larger point here is that we can't guide you to the right API > > unless we know what you're trying to achieve. And we don't have > > the slightest clue of what're trying to achieve if uAPI is forwarded > > to the device. > > > > Honestly this is worse, and way more basic than I thought, I think > > 315c28d2b714 ("net: ena: ethtool: add extra properties retrieval via get_priv_flags") > > needs to be reverted. > > Let’s not do that until we finish this discussion and explain the various use cases Whatever we decide is the right API for tagging interfaces in a virtual environment, it's definitely not going to be private feature flags.
Sent from my iPhone On Jun 6, 2019, at 4:43 PM, Jakub Kicinski <jakub.kicinski@netronome.com> wrote: >>> Okay, then you know which one is which. Are there multiple ENAs but >>> one EFA? >> >> Yes, very possible. Very common >> >> Typical use case that instances have one ena for control plane, one >> for internet facing , and one 100G ena that also have efa capabilities > > I see, and those are PCI devices.. Some form of platform data would > seem like the best fit to me. There is something called: > > /sys/bus/pci/${dbdf}/label > > It seems to come from some ACPI table - DSM maybe? I think you can put > whatever string you want there 🤔 Acpi path won’t work, much of thee interface are hot attached, using native pcie hot plug and acpi won’t be involved.
On Fri, 7 Jun 2019 01:04:14 +0000, Bshara, Nafea wrote: > On Jun 6, 2019, at 4:43 PM, Jakub Kicinski wrote: > >>> Okay, then you know which one is which. Are there multiple ENAs but > >>> one EFA? > >> > >> Yes, very possible. Very common > >> > >> Typical use case that instances have one ena for control plane, one > >> for internet facing , and one 100G ena that also have efa capabilities > > > > I see, and those are PCI devices.. Some form of platform data would > > seem like the best fit to me. There is something called: > > > > /sys/bus/pci/${dbdf}/label > > > > It seems to come from some ACPI table - DSM maybe? I think you can put > > whatever string you want there 🤔 > > Acpi path won’t work, much of thee interface are hot attached, using > native pcie hot plug and acpi won’t be involved. Perhaps hotplug break DSM, I won't pretend to be an ACPI expert. So you can find a way to stuff the label into that file from another source. There's also VPD, or custom PCI caps, but platform data generally seems like a better idea.
On Thu, 6 Jun 2019 18:14:16 -0700, Jakub Kicinski wrote: > On Fri, 7 Jun 2019 01:04:14 +0000, Bshara, Nafea wrote: > > On Jun 6, 2019, at 4:43 PM, Jakub Kicinski wrote: > > >>> Okay, then you know which one is which. Are there multiple ENAs but > > >>> one EFA? > > >> > > >> Yes, very possible. Very common > > >> > > >> Typical use case that instances have one ena for control plane, one > > >> for internet facing , and one 100G ena that also have efa capabilities > > > > > > I see, and those are PCI devices.. Some form of platform data would > > > seem like the best fit to me. There is something called: > > > > > > /sys/bus/pci/${dbdf}/label > > > > > > It seems to come from some ACPI table - DSM maybe? I think you can put > > > whatever string you want there 🤔 > > > > Acpi path won’t work, much of thee interface are hot attached, using > > native pcie hot plug and acpi won’t be involved. > > Perhaps hotplug break DSM, I won't pretend to be an ACPI expert. So you > can find a way to stuff the label into that file from another source. > There's also VPD, or custom PCI caps, but platform data generally seems > like a better idea. Jiri, do you have any thoughts about using phys_port_name for exposing topology in virtual environments. Perhaps that's the best fit on the netdev side. It's not like we want any other port name in such case, and having it automatically appended to the netdev name may be useful.
On 6/7/19, 2:27 PM, "Jakub Kicinski" <jakub.kicinski@netronome.com> wrote: On Thu, 6 Jun 2019 18:14:16 -0700, Jakub Kicinski wrote: > On Fri, 7 Jun 2019 01:04:14 +0000, Bshara, Nafea wrote: > > On Jun 6, 2019, at 4:43 PM, Jakub Kicinski wrote: > > >>> Okay, then you know which one is which. Are there multiple ENAs but > > >>> one EFA? > > >> > > >> Yes, very possible. Very common > > >> > > >> Typical use case that instances have one ena for control plane, one > > >> for internet facing , and one 100G ena that also have efa capabilities > > > > > > I see, and those are PCI devices.. Some form of platform data would > > > seem like the best fit to me. There is something called: > > > > > > /sys/bus/pci/${dbdf}/label > > > > > > It seems to come from some ACPI table - DSM maybe? I think you can put > > > whatever string you want there 🤔 > > > > Acpi path won’t work, much of thee interface are hot attached, using > > native pcie hot plug and acpi won’t be involved. > > Perhaps hotplug break DSM, I won't pretend to be an ACPI expert. So you > can find a way to stuff the label into that file from another source. > There's also VPD, or custom PCI caps, but platform data generally seems > like a better idea. Jiri, do you have any thoughts about using phys_port_name for exposing topology in virtual environments. Perhaps that's the best fit on the netdev side. It's not like we want any other port name in such case, and having it automatically appended to the netdev name may be useful. any preference for that vs /sysfs ?
On Fri, 7 Jun 2019 21:34:00 +0000, Bshara, Nafea wrote: > On 6/7/19, 2:27 PM, "Jakub Kicinski" <jakub.kicinski@netronome.com> wrote: > > On Thu, 6 Jun 2019 18:14:16 -0700, Jakub Kicinski wrote: > > On Fri, 7 Jun 2019 01:04:14 +0000, Bshara, Nafea wrote: > > > On Jun 6, 2019, at 4:43 PM, Jakub Kicinski wrote: > > > >>> Okay, then you know which one is which. Are there multiple ENAs but > > > >>> one EFA? > > > >> > > > >> Yes, very possible. Very common > > > >> > > > >> Typical use case that instances have one ena for control plane, one > > > >> for internet facing , and one 100G ena that also have efa capabilities > > > > > > > > I see, and those are PCI devices.. Some form of platform data would > > > > seem like the best fit to me. There is something called: > > > > > > > > /sys/bus/pci/${dbdf}/label > > > > > > > > It seems to come from some ACPI table - DSM maybe? I think you can put > > > > whatever string you want there 🤔 > > > > > > Acpi path won’t work, much of thee interface are hot attached, using > > > native pcie hot plug and acpi won’t be involved. > > > > Perhaps hotplug break DSM, I won't pretend to be an ACPI expert. So you > > can find a way to stuff the label into that file from another source. > > There's also VPD, or custom PCI caps, but platform data generally seems > > like a better idea. > > Jiri, do you have any thoughts about using phys_port_name for exposing > topology in virtual environments. Perhaps that's the best fit on the > netdev side. It's not like we want any other port name in such case, > and having it automatically appended to the netdev name may be useful. > > any preference for that vs /sysfs ? I think so. For the topology (control, internal, external) I feel like its a reasonable fit, and well supported by udev/systemd. We use it for appending port names of the switch today (p0, p1, p2, etc). Jiri was working on making the core kernel generate those automatically so lets give him a chance to speak up. For the ENA<>EFA link I think you may still need to resort to sysfs, unless it can be somehow implied by the topology label?
From: Sameeh Jubran <sameehj@amazon.com> This patchset introduces the following: * add support for changing the inline header size (max_header_size) for applications with overlay and nested headers * enable automatic fallback to polling mode for admin queue when interrupt is not available or missed * add good checksum counter for Rx ethtool statistics * update ena.txt * some minor code clean-up * some performance enhancements with doorbell calculations Differences from V1: * net: ena: add handling of llq max tx burst size (1/11): * fixed christmas tree issue * net: ena: ethtool: add extra properties retrieval via get_priv_flags (2/11): * replaced snprintf with strlcpy * dropped confusing error message * added more details to the commit message Arthur Kiyanovski (1): net: ena: ethtool: add extra properties retrieval via get_priv_flags Sameeh Jubran (10): net: ena: add handling of llq max tx burst size net: ena: replace free_tx/rx_ids union with single free_ids field in ena_ring net: ena: arrange ena_probe() function variables in reverse christmas tree net: ena: add newline at the end of pr_err prints net: ena: documentation: update ena.txt net: ena: allow automatic fallback to polling mode net: ena: add support for changing max_header_size in LLQ mode net: ena: optimise calculations for CQ doorbell net: ena: add good checksum counter net: ena: use dev_info_once instead of static variable .../networking/device_drivers/amazon/ena.txt | 5 +- .../net/ethernet/amazon/ena/ena_admin_defs.h | 21 +++ drivers/net/ethernet/amazon/ena/ena_com.c | 123 ++++++++++++++---- drivers/net/ethernet/amazon/ena/ena_com.h | 48 +++++++ drivers/net/ethernet/amazon/ena/ena_eth_com.c | 28 ++-- drivers/net/ethernet/amazon/ena/ena_eth_com.h | 73 +++++++++-- drivers/net/ethernet/amazon/ena/ena_ethtool.c | 78 +++++++++-- drivers/net/ethernet/amazon/ena/ena_netdev.c | 86 +++++++----- drivers/net/ethernet/amazon/ena/ena_netdev.h | 16 +-- 9 files changed, 376 insertions(+), 102 deletions(-)