Message ID | 20200626144724.224372-2-idosch@idosch.org |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | mlxsw: Add support for QSFP-DD transceiver type | expand |
> + case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_DD: > + /* Use SFF_8636 as base type. ethtool should recognize specific > + * type through the identifier value. > + */ > + modinfo->type = ETH_MODULE_SFF_8636; > + /* Verify if module EEPROM is a flat memory. In case of flat > + * memory only page 00h (0-255 bytes) can be read. Otherwise > + * upper pages 01h and 02h can also be read. Upper pages 10h > + * and 11h are currently not supported by the driver. > + */ > + if (module_info[MLXSW_REG_MCIA_EEPROM_MODULE_INFO_TYPE_ID] & > + MLXSW_REG_MCIA_EEPROM_CMIS_FLAT_MEMORY) > + modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN; > + else > + modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN; > + break; Although the upper pages 10h and 11h are not supported now, we probably think about how they would be supported, to make sure we are not going into a dead end. From ethtool qsfp.c /* * Description: * a) The register/memory layout is up to 5 128 byte pages defined by * a "pages valid" register and switched via a "page select" * register. Memory of 256 bytes can be memory mapped at a time * according to SFF 8636. * b) SFF 8636 based 640 bytes memory layout is presented for parser * * SFF 8636 based QSFP Memory Map * * 2-Wire Serial Address: 1010000x * * Lower Page 00h (128 bytes) * ====================== * | | * |Page Select Byte(127)| * ====================== * | * V * ---------------------------------------- * | | | | * V V V V * ---------- ---------- --------- ------------ * | Upper | | Upper | | Upper | | Upper | * | Page 00h | | Page 01h | | Page 02h | | Page 03h | * | | |(Optional)| |(Optional)| | (Optional) | * | | | | | | | | * | | | | | | | | * | ID | | AST | | User | | For | * | Fields | | Table | | EEPROM | | Cable | * | | | | | Data | | Assemblies | * | | | | | | | | * | | | | | | | | * ----------- ----------- ---------- -------------- * * **/ Is page 03h valid for a QSFP DD? Do we add pages 10h and 11h after page 03h, or instead of? How do we indicate to user space what pages of data have been passed to it? Andrew
> > Is page 03h valid for a QSFP DD? Do we add pages 10h and 11h after > page 03h, or instead of? How do we indicate to user space what pages > of data have been passed to it? > > Andrew From QSFP-DD CMIS Rev 4.0: "In particular, support of the Lower Memory and of Page 00h is required for all modules, including passive copper cables. These pages are therefore always implemented. Additional support for Pages 01h, 02h and bank 0 of Pages 10h and 11h is required for all paged memory modules." According to the same document, page 0x03 contains "User EEPROM (NVRs)". Byte 142, bit 2, page 0x01 indicates if the user page 0x03 was implemented. I did not find anything about page 0x02 (where the user EEPROM is stored) in the documentation for QSFP. I suppose it is always implemented? If we really want to have it so it is similar to QSFP, one could send 896 bytes (instead of 768) and just fill that portion with 0 in case it's not implemented. Note that this is just an idea, I'm not aware of best practices in cases like this. Adrian
On Fri, Jun 26, 2020 at 06:33:55PM +0100, Adrian Pop wrote: > > > > Is page 03h valid for a QSFP DD? Do we add pages 10h and 11h after > > page 03h, or instead of? How do we indicate to user space what pages > > of data have been passed to it? > > > > Andrew > > >From QSFP-DD CMIS Rev 4.0: "In particular, support of the Lower Memory > and of Page 00h is required for all modules, including passive copper > cables. These pages are therefore always implemented. Additional > support for Pages 01h, 02h and bank 0 of Pages 10h and 11h is required > for all paged memory modules." > > According to the same document, page 0x03 contains "User EEPROM > (NVRs)". Byte 142, bit 2, page 0x01 indicates if the user page 0x03 > was implemented. I did not find anything about page 0x02 (where the > user EEPROM is stored) in the documentation for QSFP. I suppose it is > always implemented? If we really want to have it so it is similar to > QSFP, one could send 896 bytes (instead of 768) and just fill that > portion with 0 in case it's not implemented. Note that this is just an > idea, I'm not aware of best practices in cases like this. It does seem common to only return a subset of the pages. This patch for example. We need some clear rules to known what the kernel has passed to user space, so we can both correctly interpret when a subset has been passed, and also ensure all drivers are doing the same thing. Currently we have: * ---------- ---------- --------- ------------ * | Upper | | Upper | | Upper | | Upper | * | Page 00h | | Page 01h | | Page 02h | | Page 03h | * | | |(Optional)| |(Optional)| | (Optional) | * | | | | | | | | * | | | | | | | | * | ID | | AST | | User | | For | * | Fields | | Table | | EEPROM | | Cable | * | | | | | Data | | Assemblies | * | | | | | | | | * | | | | | | | | * ----------- ----------- ---------- -------------- You are saying pages 00h, 01h and 02h are mandatory for QSPF-DD. Page 03h is optional, but when present, it seems to contain what is page 02h above. Since the QSPF KAPI has it, QSPF-DD KAPI should also have it. So i would suggest that pages 10h and 11h come after that. If a driver wants to pass a subset, it can, but it must always trim from the right, it cannot remove pages from the middle. Andrew
> You are saying pages 00h, 01h and 02h are mandatory for QSPF-DD. Page > 03h is optional, but when present, it seems to contain what is page > 02h above. Since the QSPF KAPI has it, QSPF-DD KAPI should also have > it. So i would suggest that pages 10h and 11h come after that. > > If a driver wants to pass a subset, it can, but it must always trim > from the right, it cannot remove pages from the middle. > > Andrew I agree with this. Basically there are two big cases: - passive copper transceivers with flat memory => just 00h will be present (both lower and higher => 256 bytes) - optical transceivers with paged memory => 00h, 01h, 02h, 10h, 11h => 768 bytes Having page 03h after 02h (so 896 bytes in total) seems like a good idea and the updates I'll have to do to my old patch are minor (updating the offset value of page 10h and 11h). When I tested my patch, I did it with both passive copper transceivers and optical transceivers and there weren't any issues. In this patch, Ido added a comment in the code stating "Upper pages 10h and 11h are currently not supported by the driver.". This won't actually be a problem! In CMIS Rev. 4, Table 8-12 Byte 85 (55h), we learn that if the value of that byte is 01h or 02h, we have a SMF or MMF interface (both optical). In the qsfp_dd_show_sig_optical_pwr function (in my patch) there is this bit: + __u8 module_type = id[QSFP_DD_MODULE_TYPE_OFFSET]; [...] + /** + * The thresholds and the high/low alarms/warnings are available + * only if an optical interface (MMF/SMF) is present (if this is + * the case, it means that 5 pages are available). + */ + if (module_type != QSFP_DD_MT_MMF && + module_type != QSFP_DD_MT_SMF && + eeprom_len != QSFP_DD_EEPROM_5PAG) + return; But Ido sets the eeprom_len to be ETH_MODULE_SFF_8472_LEN which is 512, while QSFP_DD_EEPROM_5PAG is defined as 80h * 6 = 768. So there won't be any issues of accessing non-existent values, since I return from the function that deals with the pages 10h and 11h early. When the driver will support them too everything will just work so your idea of a driver being able to pass only a subset of pages (being allowed to trim only from the right) holds. Adrian
On Fri, Jun 26, 2020 at 11:13:42PM +0100, Adrian Pop wrote: > > You are saying pages 00h, 01h and 02h are mandatory for QSPF-DD. Page > > 03h is optional, but when present, it seems to contain what is page > > 02h above. Since the QSPF KAPI has it, QSPF-DD KAPI should also have > > it. So i would suggest that pages 10h and 11h come after that. > > > > If a driver wants to pass a subset, it can, but it must always trim > > from the right, it cannot remove pages from the middle. > > > > Andrew > > I agree with this. Basically there are two big cases: > - passive copper transceivers with flat memory => just 00h will be > present (both lower and higher => 256 bytes) > - optical transceivers with paged memory => 00h, 01h, 02h, 10h, 11h => 768 bytes > Having page 03h after 02h (so 896 bytes in total) seems like a good > idea and the updates I'll have to do to my old patch are minor > (updating the offset value of page 10h and 11h). When I tested my > patch, I did it with both passive copper transceivers and optical > transceivers and there weren't any issues. Hi Adrian, Andrew, Not sure I understand... You want the kernel to always pass page 03h to user space (potentially zeroed)? Page 03h is not mandatory according to the standard and page 01h contains information if page 03h is present or not. So user space has the information it needs to determine if after page 02h we have page 03h or page 10h. Why always pass page 03h then? > > In this patch, Ido added a comment in the code stating "Upper pages > 10h and 11h are currently not supported by the driver.". This won't > actually be a problem! In CMIS Rev. 4, Table 8-12 Byte 85 (55h), we > learn that if the value of that byte is 01h or 02h, we have a SMF or > MMF interface (both optical). In the qsfp_dd_show_sig_optical_pwr > function (in my patch) there is this bit: > > + __u8 module_type = id[QSFP_DD_MODULE_TYPE_OFFSET]; > [...] > + /** > + * The thresholds and the high/low alarms/warnings are available > + * only if an optical interface (MMF/SMF) is present (if this is > + * the case, it means that 5 pages are available). > + */ > + if (module_type != QSFP_DD_MT_MMF && > + module_type != QSFP_DD_MT_SMF && > + eeprom_len != QSFP_DD_EEPROM_5PAG) > + return; > > But Ido sets the eeprom_len to be ETH_MODULE_SFF_8472_LEN which is > 512, while QSFP_DD_EEPROM_5PAG is defined as 80h * 6 = 768. So there > won't be any issues of accessing non-existent values, since I return > from the function that deals with the pages 10h and 11h early. When > the driver will support them too everything will just work so your > idea of a driver being able to pass only a subset of pages (being > allowed to trim only from the right) holds. BTW, Adrian, this is the output I get with your patch on top of current ethtool: $ ethtool -m swp1 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) Power class : 1 Max power : 0.25W Connector : 0x23 (No separable connector) Cable assembly length : 0.50km Tx CDR bypass control : Yes Rx CDR bypass control : No Tx CDR : No Rx CDR : No Transmitter technology : 0x0a (Copper cable, unequalized) Attenuation at 5GHz : 4db Attenuation at 7GHz : 5db Attenuation at 12.9GHz : 7db Attenuation at 25.8GHz : 21db Module temperature : 0.00 degrees C / 32.00 degrees F Module voltage : 0.0000 V Length (SMF) : 0.00km Length (OM5) : 0m Length (OM4) : 0m Length (OM3 50/125um) : 0m Length (OM2 50/125um) : 17m Vendor name : Mellanox Vendor OUI : 00:02:c9 Vendor PN : MCP1660-W00AE30 Vendor rev : A2 Vendor SN : MT2019VS04757 Date code : 200507 _ Revision compliance : Rev. 3.0
> > Hi Adrian, Andrew, > > Not sure I understand... You want the kernel to always pass page 03h to > user space (potentially zeroed)? Page 03h is not mandatory according to > the standard and page 01h contains information if page 03h is present or Hi Ido! Andrew was thinking of having 03h after 02h (potentially zeroed) just for the purpose of having a similar layout for QSFP-DD the same way we do for QSFP. But as you said, it is not mandatory according to the standard and I also don't know the entire codebase for ethtool and where it might be actually needed. I think Andrew can argue for its presence better than me. > not. So user space has the information it needs to determine if after > page 02h we have page 03h or page 10h. Why always pass page 03h then? > If we decide to add 03h but only sometimes, I think we will add an extra layer of complexity. Sometimes after 02h we would have 03h and sometimes 10h. In qsfp-dd.h (following the convention from qsfp.h) in my patch there are a lot of different constants defined with respect to the offset of the parent page in the memory layout and "dynamic offsets" don't sound very good, at least for me. So even if there's a way of checking in the user space which page is after 02h, a more stable memory layout works better on the long run. Adrian
On Sat, Jun 27, 2020 at 09:42:10PM +0100, Adrian Pop wrote: > > > > Hi Adrian, Andrew, > > > > Not sure I understand... You want the kernel to always pass page 03h to > > user space (potentially zeroed)? Page 03h is not mandatory according to > > the standard and page 01h contains information if page 03h is present or > > Hi Ido! > > Andrew was thinking of having 03h after 02h (potentially zeroed) just > for the purpose of having a similar layout for QSFP-DD the same way we > do for QSFP. But as you said, it is not mandatory according to the > standard and I also don't know the entire codebase for ethtool and > where it might be actually needed. I think Andrew can argue for its > presence better than me. > > > not. So user space has the information it needs to determine if after > > page 02h we have page 03h or page 10h. Why always pass page 03h then? > > > > If we decide to add 03h but only sometimes, I think we will add an > extra layer of complexity. Sometimes after 02h we would have 03h and > sometimes 10h. In qsfp-dd.h (following the convention from qsfp.h) in > my patch there are a lot of different constants defined with respect > to the offset of the parent page in the memory layout and "dynamic > offsets" don't sound very good, at least for me. So even if there's a > way of checking in the user space which page is after 02h, a more > stable memory layout works better on the long run. Adrian, Thanks for the detailed response. I don't think the kernel should pass fake pages only to make it easier for user space to parse the information. What you are describing is basic dissection and it's done all the time by wireshark / tcpdump. Anyway, even we pass a fake page 03h, page 11h can still be at a variable offset. See table 8-28 [1], bits 1-0 at offset 142 in page 01h determine the size of pages 10h and 11h: 0 - each page is 128 bytes in size 1 - each page is 256 bytes in size 2 - each page is 512 bytes in size So a completely stable layout (unless I missed something) will entail the kernel sending 1664 bytes to user space each time. This looks unnecessarily rigid to me. The people who wrote the standard obviously took into account the fact that the page layout needs to be discoverable from the data and I think we should embrace it and only pass valid information to user space. Regardless, can Andrew and you let us know if you have a problems with current patch set which only exposes pages 00h-02h? I see it's marked as "Changes Requested", so I will need to re-submit. Thanks [1] http://www.qsfp-dd.com/wp-content/uploads/2019/05/QSFP-DD-CMIS-rev4p0.pdf
Hi! Regarding multiple banks, I think that we can have a much more creative way of dealing with them (in the future). At page 76, we have "In particular, support of the Lower Memory and of Page 00h is required for all modules, including passive copper cables. These pages are therefore always implemented. Additional support for Pages 01h, 02h and bank 0 of Pages 10h and 11h is required for all paged memory modules." My old patch clearly supports only the 1st (and mandatory) bank. So the memory layout is 00h, 01h, 02h, 10h, 11h. If we will extend ethtool to work for multiple banks, we might have something like 00h, 01h, 02h, (10h, 11h | bank 0), (10h, 11h | bank 1) etc. So we can still check bits 1-0 of byte 142 from page 01h to determine how many banks we have and we can still follow the "we can trim, but only to the right" rule. Of course, this is only an idea, at the moment I don't think we can even test something like that. I'm sure that I can work something out to deal with sometimes having page 03h and sometimes not, but first I think we need to decide if we always add it or not. As I mentioned in my previous email, I think Andrew can argue for its presence better than me. Based on that, I can re-submit my old patch for ethtool. Adrian On Sun, 28 Jun 2020 at 12:56, Ido Schimmel <idosch@idosch.org> wrote: > > On Sat, Jun 27, 2020 at 09:42:10PM +0100, Adrian Pop wrote: > > > > > > Hi Adrian, Andrew, > > > > > > Not sure I understand... You want the kernel to always pass page 03h to > > > user space (potentially zeroed)? Page 03h is not mandatory according to > > > the standard and page 01h contains information if page 03h is present or > > > > Hi Ido! > > > > Andrew was thinking of having 03h after 02h (potentially zeroed) just > > for the purpose of having a similar layout for QSFP-DD the same way we > > do for QSFP. But as you said, it is not mandatory according to the > > standard and I also don't know the entire codebase for ethtool and > > where it might be actually needed. I think Andrew can argue for its > > presence better than me. > > > > > not. So user space has the information it needs to determine if after > > > page 02h we have page 03h or page 10h. Why always pass page 03h then? > > > > > > > If we decide to add 03h but only sometimes, I think we will add an > > extra layer of complexity. Sometimes after 02h we would have 03h and > > sometimes 10h. In qsfp-dd.h (following the convention from qsfp.h) in > > my patch there are a lot of different constants defined with respect > > to the offset of the parent page in the memory layout and "dynamic > > offsets" don't sound very good, at least for me. So even if there's a > > way of checking in the user space which page is after 02h, a more > > stable memory layout works better on the long run. > > Adrian, > > Thanks for the detailed response. I don't think the kernel should pass > fake pages only to make it easier for user space to parse the > information. What you are describing is basic dissection and it's done > all the time by wireshark / tcpdump. > > Anyway, even we pass a fake page 03h, page 11h can still be at a > variable offset. See table 8-28 [1], bits 1-0 at offset 142 in page 01h > determine the size of pages 10h and 11h: > > 0 - each page is 128 bytes in size > 1 - each page is 256 bytes in size > 2 - each page is 512 bytes in size > > So a completely stable layout (unless I missed something) will entail > the kernel sending 1664 bytes to user space each time. This looks > unnecessarily rigid to me. The people who wrote the standard obviously > took into account the fact that the page layout needs to be discoverable > from the data and I think we should embrace it and only pass valid > information to user space. > > Regardless, can Andrew and you let us know if you have a problems with > current patch set which only exposes pages 00h-02h? I see it's marked as > "Changes Requested", so I will need to re-submit. > > Thanks > > [1] http://www.qsfp-dd.com/wp-content/uploads/2019/05/QSFP-DD-CMIS-rev4p0.pdf
> Adrian, > > Thanks for the detailed response. I don't think the kernel should pass > fake pages only to make it easier for user space to parse the > information. What you are describing is basic dissection and it's done > all the time by wireshark / tcpdump. > > Anyway, even we pass a fake page 03h, page 11h can still be at a > variable offset. See table 8-28 [1], bits 1-0 at offset 142 in page 01h > determine the size of pages 10h and 11h: > > 0 - each page is 128 bytes in size > 1 - each page is 256 bytes in size > 2 - each page is 512 bytes in size > > So a completely stable layout (unless I missed something) will entail > the kernel sending 1664 bytes to user space each time. This looks > unnecessarily rigid to me. The people who wrote the standard obviously > took into account the fact that the page layout needs to be discoverable > from the data and I think we should embrace it and only pass valid > information to user space. O.K. That makes things more complex. And i would say, Ido is correct, we need to make use of the discoverable features. I've no practice experience with modules other than plain old SFPs, 1G. And those have all sorts of errors, even basic things like the CRC are systematically incorrect because they are not recalculated after adding the serial number. We have had people trying to submit patches to ethtool to make it ignore bits so that it dumps more information, because the manufacturer failed to set the correct bits, etc. Ido, Adrian, what is your experience with these QSFP-DD devices. Are they generally of better quality, the EEPROM can be trusted? Is there any form of compliance test. If we go down the path of using the discovery information, it means we have no way for user space to try to correct for when the information is incorrect. It cannot request specific pages. So maybe we should consider an alternative? The netlink ethtool gives us more flexibility. How about we make a new API where user space can request any pages it want, and specify the size of the page. ethtool can start out by reading page 0. That should allow it to identify the basic class of device. It can then request additional pages as needed. The nice thing about that is we don't need two parsers of the discovery information, one in user and second in kernel space. We don't need to guarantee these two parsers agree with each other, in order to correctly decode what the kernel sent to user space. And user space has the flexibility to work around known issues when manufactures get their EEPROM wrong. Andrew
On Tue, Jun 30, 2020 at 02:21:59AM +0200, Andrew Lunn wrote: > I've no practice experience with modules other than plain old SFPs, > 1G. And those have all sorts of errors, even basic things like the CRC > are systematically incorrect because they are not recalculated after > adding the serial number. We have had people trying to submit patches > to ethtool to make it ignore bits so that it dumps more information, > because the manufacturer failed to set the correct bits, etc. > > Ido, Adrian, what is your experience with these QSFP-DD devices. Are > they generally of better quality, the EEPROM can be trusted? Is there > any form of compliance test. Vadim, I know you tested with at least two different QSFP-DD modules, can you please share your experience? > > If we go down the path of using the discovery information, it means we > have no way for user space to try to correct for when the information > is incorrect. It cannot request specific pages. So maybe we should > consider an alternative? > > The netlink ethtool gives us more flexibility. How about we make a new > API where user space can request any pages it want, and specify the > size of the page. ethtool can start out by reading page 0. That should > allow it to identify the basic class of device. It can then request > additional pages as needed. Just to make sure I understand, this also means adding a new API towards drivers, right? So that they only read from HW the requested info. > The nice thing about that is we don't need two parsers of the > discovery information, one in user and second in kernel space. We > don't need to guarantee these two parsers agree with each other, in > order to correctly decode what the kernel sent to user space. And user > space has the flexibility to work around known issues when > manufactures get their EEPROM wrong. Sounds sane to me... I know that in the past Vadim had to deal with various faulty modules. Vadim, is this something we can support? What happens if user space requests a page that does not exist? For example, in the case of QSFP-DD, lets say we do not provide page 03h but user space still wants it because it believes manufacturer did not set correct bits.
> -----Original Message----- > From: Ido Schimmel <idosch@idosch.org> > Sent: Tuesday, June 30, 2020 9:00 AM > To: Andrew Lunn <andrew@lunn.ch>; Vadim Pasternak > <vadimp@mellanox.com> > Cc: Adrian Pop <popadrian1996@gmail.com>; netdev@vger.kernel.org; > davem@davemloft.net; kuba@kernel.org; Jiri Pirko <jiri@mellanox.com>; > mlxsw <mlxsw@mellanox.com>; Ido Schimmel <idosch@mellanox.com> > Subject: Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP- > DD transceivers > > On Tue, Jun 30, 2020 at 02:21:59AM +0200, Andrew Lunn wrote: > > I've no practice experience with modules other than plain old SFPs, > > 1G. And those have all sorts of errors, even basic things like the CRC > > are systematically incorrect because they are not recalculated after > > adding the serial number. We have had people trying to submit patches > > to ethtool to make it ignore bits so that it dumps more information, > > because the manufacturer failed to set the correct bits, etc. > > > > Ido, Adrian, what is your experience with these QSFP-DD devices. Are > > they generally of better quality, the EEPROM can be trusted? Is there > > any form of compliance test. > > Vadim, I know you tested with at least two different QSFP-DD modules, can > you please share your experience? > I tested two types of QSFP-DD, cooper and optical from few vendors: Innolight, SP (Source Photonics) and Mellanox customized transceivers. We don't have enough statistics. I guess in all our systems in LAB we validated about 150 - 200 cables. No one of them had wrong EEPROM. But in all Mellanox systems QSFP reading works through the firmware and firmware performs QSFP validation for stamping (some cable type are considered as untrusted and firmware put them to the black list), page checksum, power consuming criteria. > > > > If we go down the path of using the discovery information, it means we > > have no way for user space to try to correct for when the information > > is incorrect. It cannot request specific pages. So maybe we should > > consider an alternative? > > > > The netlink ethtool gives us more flexibility. How about we make a new > > API where user space can request any pages it want, and specify the > > size of the page. ethtool can start out by reading page 0. That should > > allow it to identify the basic class of device. It can then request > > additional pages as needed. > > Just to make sure I understand, this also means adding a new API towards > drivers, right? So that they only read from HW the requested info. > > > The nice thing about that is we don't need two parsers of the > > discovery information, one in user and second in kernel space. We > > don't need to guarantee these two parsers agree with each other, in > > order to correctly decode what the kernel sent to user space. And user > > space has the flexibility to work around known issues when > > manufactures get their EEPROM wrong. > > Sounds sane to me... I know that in the past Vadim had to deal with various > faulty modules. Vadim, is this something we can support? What happens if > user space requests a page that does not exist? For example, in the case of > QSFP-DD, lets say we do not provide page 03h but user space still wants it > because it believes manufacturer did not set correct bits. Regarding faulty modules, as I wrote - validation is performed by firmware and our software trust firmware. Currently user space just asks for the buffer length. I suppose in case we'll have additional API: ethtool -m <if> <page> <offset> <size> it would be possible to provide buffer only for the defined page and upto valid size. Pay attention that CMIS specification covers also others transceivers types and some of them we are going to support through ethtool, like: 19h (OSFP 8x Pluggable Transceiver) 1Ah (SFP-DD Double Density 2x Pluggable Transceiver) 1Eh (QSFP with QSFP-DD memory map) If I am not wrong 19h and 1Eh should have same layout as QSFP-DD and SFP-DD is supposed to be similar, but shorter (page 02h is reserved, page 01h contains info, which for QSFP-DD sits at page 02h).
> Sounds sane to me... I know that in the past Vadim had to deal with > various faulty modules. Vadim, is this something we can support? What > happens if user space requests a page that does not exist? For example, > in the case of QSFP-DD, lets say we do not provide page 03h but user > space still wants it because it believes manufacturer did not set > correct bits. Hi Ido I can see two scenarios. This API is retrofitted to a firmware which only supports pre-defined linear dumps. A page is requested which is not part of the dump. EOPNOTSUPP seems like a good response. The second is for a page which does not exist in the module. I would just let the i2c bus master perform the transfer. Some might return EIO, ENODEV if the SFP does not respond. Otherwise it might return 0xff from pullups, or random junk. Hopefully each page as a checksum, and hopefully the vendor actually get the checksum correct? So let userspace either deal with the error code, or whatever data is provided. The current code already to some extent handles missing data, it uses the length to determine if the page is present or not. So it is not too big a change to look error codes for individual pages. Andrew
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.c b/drivers/net/ethernet/mellanox/mlxsw/core_env.c index 08215fed193d..68f198afc9b0 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core_env.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.c @@ -67,7 +67,8 @@ mlxsw_env_query_module_eeprom(struct mlxsw_core *mlxsw_core, int module, offset -= MLXSW_REG_MCIA_EEPROM_UP_PAGE_LENGTH * page; /* When reading upper pages 1, 2 and 3 the offset starts at * 128. Please refer to "QSFP+ Memory Map" figure in SFF-8436 - * specification for graphical depiction. + * specification and to "CMIS Module Memory Map" Figure in + * CMIS specification for graphical depiction. * MCIA register accepts buffer size <= 48. Page of size 128 * should be read by chunks of size 48, 48, 32. Align the size * of the last chunk to avoid reading after the end of the @@ -210,6 +211,22 @@ int mlxsw_env_get_module_info(struct mlxsw_core *mlxsw_core, int module, else modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN / 2; break; + case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_DD: + /* Use SFF_8636 as base type. ethtool should recognize specific + * type through the identifier value. + */ + modinfo->type = ETH_MODULE_SFF_8636; + /* Verify if module EEPROM is a flat memory. In case of flat + * memory only page 00h (0-255 bytes) can be read. Otherwise + * upper pages 01h and 02h can also be read. Upper pages 10h + * and 11h are currently not supported by the driver. + */ + if (module_info[MLXSW_REG_MCIA_EEPROM_MODULE_INFO_TYPE_ID] & + MLXSW_REG_MCIA_EEPROM_CMIS_FLAT_MEMORY) + modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN; + else + modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN; + break; default: return -EINVAL; } diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h index fcb88d4271bf..73cc7fd5020c 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/reg.h +++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h @@ -8548,6 +8548,7 @@ MLXSW_ITEM32(reg, mcia, size, 0x08, 0, 16); #define MLXSW_REG_MCIA_TH_PAGE_NUM 3 #define MLXSW_REG_MCIA_PAGE0_LO 0 #define MLXSW_REG_MCIA_TH_PAGE_OFF 0x80 +#define MLXSW_REG_MCIA_EEPROM_CMIS_FLAT_MEMORY BIT(7) enum mlxsw_reg_mcia_eeprom_module_info_rev_id { MLXSW_REG_MCIA_EEPROM_MODULE_INFO_REV_ID_UNSPC = 0x00, @@ -8566,6 +8567,7 @@ enum mlxsw_reg_mcia_eeprom_module_info_id { enum mlxsw_reg_mcia_eeprom_module_info { MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID, MLXSW_REG_MCIA_EEPROM_MODULE_INFO_REV_ID, + MLXSW_REG_MCIA_EEPROM_MODULE_INFO_TYPE_ID, MLXSW_REG_MCIA_EEPROM_MODULE_INFO_SIZE, };