mbox series

[net-next,v4,00/12] Add support for OPEN Alliance 10BASE-T1x MACPHY Serial Interface

Message ID 20240418125648.372526-1-Parthiban.Veerasooran@microchip.com
Headers show
Series Add support for OPEN Alliance 10BASE-T1x MACPHY Serial Interface | expand

Message

Parthiban Veerasooran April 18, 2024, 12:56 p.m. UTC
This patch series contain the below updates,
- Adds support for OPEN Alliance 10BASE-T1x MACPHY Serial Interface in the
  net/ethernet/oa_tc6.c.
  Link to the spec:
  -----------------
  https://opensig.org/download/document/OPEN_Alliance_10BASET1x_MAC-PHY_Serial_Interface_V1.1.pdf

- Adds driver support for Microchip LAN8650/1 Rev.B1 10BASE-T1S MACPHY
  Ethernet driver in the net/ethernet/microchip/lan865x/lan865x.c.
  Link to the product:
  --------------------
  https://www.microchip.com/en-us/product/lan8650

Testing Details:
----------------
The driver performance was tested using iperf3 in the below two setups
separately.

Setup 1:
--------
Node 0 - Raspberry Pi 4 with LAN8650 MAC-PHY 
Node 1 - Raspberry Pi 4 with EVB-LAN8670-USB USB Stick

Setup 2:
--------
Node 0 - SAMA7G54-EK with LAN8650 MAC-PHY 
Node 1 - Raspberry Pi 4 with EVB-LAN8670-USB USB Stick

Achieved maximum of 9.4 Mbps.

Some systems like Raspberry Pi 4 need performance mode enabled to get the
proper clock speed for SPI. Refer below link for more details.

https://github.com/raspberrypi/linux/issues/3381#issuecomment-1144723750

Changes:
v2:
- Removed RFC tag.
- OA TC6 framework configured in the Kconfig and Makefile to compile as a
  module.
- Kerneldoc headers added for all the API methods exposed to MAC driver.
- Odd parity calculation logic updated from the below link,
  https://elixir.bootlin.com/linux/latest/source/lib/bch.c#L348
- Control buffer memory allocation moved to the initial function.
- struct oa_tc6 implemented as an obaque structure.
- Removed kthread for handling mac-phy interrupt instead threaded irq is
  used.
- Removed interrupt implementation for soft reset handling instead of
  that polling has been implemented.
- Registers name in the defines changed according to the specification
  document.
- Registers defines are arranged in the order of offset and followed by
  register fields.
- oa_tc6_write_register() implemented for writing a single register and
  oa_tc6_write_registers() implemented for writing multiple registers.
- oa_tc6_read_register() implemented for reading a single register and
  oa_tc6_read_registers() implemented for reading multiple registers.
- Removed DRV_VERSION macro as git hash provided by ethtool.
- Moved MDIO bus registration and PHY initialization to the OA TC6 lib.
- Replaced lan865x_set/get_link_ksettings() functions with
  phy_ethtool_ksettings_set/get() functions.
- MAC-PHY's standard capability register values checked against the
  user configured values.
- Removed unnecessary parameters validity check in various places.
- Removed MAC address configuration in the lan865x_net_open() function as
  it is done in the lan865x_probe() function already.
- Moved standard registers and proprietary vendor registers to the
  respective files.
- Added proper subject prefixes for the DT bindings.
- Moved OA specific properties to a separate DT bindings and corrected the
  types & mistakes in the DT bindings.
- Inherited OA specific DT bindings to the LAN865x specific DT bindings.
- Removed sparse warnings in all the places.
- Used net_err_ratelimited() for printing the error messages.
- oa_tc6_process_rx_chunks() function and the content of oa_tc6_handler()
  function are split into small functions.
- Used proper macros provided by network layer for calculating the
  MAX_ETH_LEN.
- Return value of netif_rx() function handled properly.
- Removed unnecessary NULL initialization of skb in the
  oa_tc6_rx_eth_ready() function removed.
- Local variables declaration ordered in reverse xmas tree notation.

v3:
- Completely redesigned all the patches.
- Control and data interface patches are divided into multiple small
  patches.
- Device driver APIs added in the oa-tc6-framework.rst file.
- Code readability improved in all the patches.
- Defined macros wherever is possible.
- Changed RESETC to STATUS0_RESETC for improving the readability.
- Removed OA specific DT bindings.
- Used default configurations defined in the OA spec.
- All variables are named properly as per OA spec for more redability.
- Bigger functions are split into multiple smaller functions.
- DT binding check is done.
- Phy mask is removed in phy scanning.
- Used NET_RX_DROP to compare the rx packet submission status.
- Indentation in the Kconfig file corrected.
- Removed CONFIG_OF and CONFIG_ACPI ifdefs.
- Removed MODULE_ALIAS().

v4:
- Fixed indentation in oa-tc6-framework.rst file.
- Replaced ENODEV error code with EPROTO in the
  oa_tc6_check_ctrl_write_reply and oa_tc6_check_ctrl_read_reply()
  functions.
- Renamed oa_tc6_read_sw_reset_status() function as
  oa_tc6_read_status0().
- Changed software reset polling delay as 1ms and polling timeout as 1s.
- Implemented clause 45 registers direct access.
- Replaced ENODEV error code with ENOMEM in the
  oa_tc6_mdiobus_register() function.
- Changed transmit skbs queue size as 2.
- Added skb_linearize() function to convert contiguous packet data.
- Checked kthread_should_stop() in the oa_tc6_spi_thread_handler()
  function before proceeding for the oa_tc6_try_spi_transfer().
- Removed netdev_err() print in the oa_tc6_allocate_rx_skb() function.
- Added spi-peripheral-props reference in the dt-bindings.
- Changed the fallback order in the dt-bindings.
- Replaced netif_start_queue() with netif_wake_queue().
- Empty data transfer performed in the oa_tc6_init() function to clear
  the reset complete interrupt.
- ZARFE bit in the CONFIG0 register is set to 1 to avoid lan865x halt
  based on the recommendation in the lan865x errata.

Parthiban Veerasooran (12):
  Documentation: networking: add OPEN Alliance 10BASE-T1x MAC-PHY serial
    interface
  net: ethernet: oa_tc6: implement register write operation
  net: ethernet: oa_tc6: implement register read operation
  net: ethernet: oa_tc6: implement software reset
  net: ethernet: oa_tc6: implement error interrupts unmasking
  net: ethernet: oa_tc6: implement internal PHY initialization
  net: ethernet: oa_tc6: enable open alliance tc6 data communication
  net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet
    frames
  net: ethernet: oa_tc6: implement receive path to receive rx ethernet
    frames
  net: ethernet: oa_tc6: implement mac-phy interrupt
  microchip: lan865x: add driver support for Microchip's LAN865X MAC-PHY
  dt-bindings: net: add Microchip's LAN865X 10BASE-T1S MACPHY

 .../bindings/net/microchip,lan865x.yaml       |   80 +
 Documentation/networking/oa-tc6-framework.rst |  491 ++++++
 MAINTAINERS                                   |   15 +
 drivers/net/ethernet/Kconfig                  |   11 +
 drivers/net/ethernet/Makefile                 |    1 +
 drivers/net/ethernet/microchip/Kconfig        |    1 +
 drivers/net/ethernet/microchip/Makefile       |    1 +
 .../net/ethernet/microchip/lan865x/Kconfig    |   19 +
 .../net/ethernet/microchip/lan865x/Makefile   |    6 +
 .../net/ethernet/microchip/lan865x/lan865x.c  |  384 +++++
 drivers/net/ethernet/oa_tc6.c                 | 1321 +++++++++++++++++
 drivers/net/phy/microchip_t1s.c               |   30 +
 include/linux/oa_tc6.h                        |   23 +
 include/uapi/linux/mdio.h                     |    1 +
 14 files changed, 2384 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/microchip,lan865x.yaml
 create mode 100644 Documentation/networking/oa-tc6-framework.rst
 create mode 100644 drivers/net/ethernet/microchip/lan865x/Kconfig
 create mode 100644 drivers/net/ethernet/microchip/lan865x/Makefile
 create mode 100644 drivers/net/ethernet/microchip/lan865x/lan865x.c
 create mode 100644 drivers/net/ethernet/oa_tc6.c
 create mode 100644 include/linux/oa_tc6.h

Comments

Andrew Lunn April 22, 2024, 8:07 p.m. UTC | #1
On Thu, Apr 18, 2024 at 06:26:36PM +0530, Parthiban Veerasooran wrote:
> This patch series contain the below updates,
> - Adds support for OPEN Alliance 10BASE-T1x MACPHY Serial Interface in the
>   net/ethernet/oa_tc6.c.
>   Link to the spec:
>   -----------------
>   https://opensig.org/download/document/OPEN_Alliance_10BASET1x_MAC-PHY_Serial_Interface_V1.1.pdf
> 
> - Adds driver support for Microchip LAN8650/1 Rev.B1 10BASE-T1S MACPHY
>   Ethernet driver in the net/ethernet/microchip/lan865x/lan865x.c.
>   Link to the product:
>   --------------------
>   https://www.microchip.com/en-us/product/lan8650

I will get around to reviewing this soon, i promise.

To the OnSemi people: Have you tried your driver/device using this
framework? It would be good to have a second vendors device using it,
just to show no vendor specific code has slipped into the framework.

Thanks
	Andrew
Andrew Lunn April 22, 2024, 8:08 p.m. UTC | #2
> Testing Details:
> ----------------
> The driver performance was tested using iperf3 in the below two setups
> separately.
> 
> Setup 1:
> --------
> Node 0 - Raspberry Pi 4 with LAN8650 MAC-PHY 
> Node 1 - Raspberry Pi 4 with EVB-LAN8670-USB USB Stick
> 
> Setup 2:
> --------
> Node 0 - SAMA7G54-EK with LAN8650 MAC-PHY 
> Node 1 - Raspberry Pi 4 with EVB-LAN8670-USB USB Stick

Would it be possible to chain these two setups together by adding two
USB dongles to one of the Ri 4s? If i remember correctly, there were
reports of issues when two devices were using the framework at once.

Thanks
	Andrew
Andrew Lunn April 22, 2024, 11:23 p.m. UTC | #3
On Mon, Apr 22, 2024 at 10:08:23PM +0200, Andrew Lunn wrote:
> > Testing Details:
> > ----------------
> > The driver performance was tested using iperf3 in the below two setups
> > separately.
> > 
> > Setup 1:
> > --------
> > Node 0 - Raspberry Pi 4 with LAN8650 MAC-PHY 
> > Node 1 - Raspberry Pi 4 with EVB-LAN8670-USB USB Stick
> > 
> > Setup 2:
> > --------
> > Node 0 - SAMA7G54-EK with LAN8650 MAC-PHY 
> > Node 1 - Raspberry Pi 4 with EVB-LAN8670-USB USB Stick
> 
> Would it be possible to chain these two setups together by adding two
> USB dongles to one of the Ri 4s? If i remember correctly, there were
> reports of issues when two devices were using the framework at once.

Sorry, that makes no sense. Your USB device is unlikely to be doing
USB->SPI->MAC-PHY. Do you have two LAN8650 MAC-PHY you can connect to
one host?

    Andrew
Andrew Lunn April 22, 2024, 11:43 p.m. UTC | #4
On Thu, Apr 18, 2024 at 06:26:36PM +0530, Parthiban Veerasooran wrote:
> This patch series contain the below updates,
> - Adds support for OPEN Alliance 10BASE-T1x MACPHY Serial Interface in the
>   net/ethernet/oa_tc6.c.
>   Link to the spec:
>   -----------------
>   https://opensig.org/download/document/OPEN_Alliance_10BASET1x_MAC-PHY_Serial_Interface_V1.1.pdf
> 
> - Adds driver support for Microchip LAN8650/1 Rev.B1 10BASE-T1S MACPHY
>   Ethernet driver in the net/ethernet/microchip/lan865x/lan865x.c.
>   Link to the product:
>   --------------------
>   https://www.microchip.com/en-us/product/lan8650

The patchset did not apply cleanly to net-next:

https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=206056&state=*&q=&archive=&delegate=

Which means it did not get any of the standard automatic testing :-(

      Andrew
Andrew Lunn April 22, 2024, 11:48 p.m. UTC | #5
> +/**
> + * oa_tc6_write_registers - function for writing multiple consecutive registers.
> + * @tc6: oa_tc6 struct.
> + * @address: address of the first register to be written in the MAC-PHY.
> + * @value: values to be written from the starting register address @address.
> + * @length: number of consecutive registers to be written from @address.
> + *
> + * Maximum of 128 consecutive registers can be written starting at @address.
> + *
> + * Returns 0 on success otherwise failed.
> + */

I think the static analyser tools are getting more picky, and what
'Return:' .

https://elixir.bootlin.com/linux/latest/source/Documentation/doc-guide/kernel-doc.rst#L86

All the examples use Return:

That document also says:

The documentation format is verified by the kernel build when it is
requested to perform extra gcc checks::

	make W=n

And if patchwork can apply your patches, it also checks for problems
like this.

    Andrew
Parthiban Veerasooran April 23, 2024, 4:38 a.m. UTC | #6
Hi Andrew,

On 23/04/24 5:18 am, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> +/**
>> + * oa_tc6_write_registers - function for writing multiple consecutive registers.
>> + * @tc6: oa_tc6 struct.
>> + * @address: address of the first register to be written in the MAC-PHY.
>> + * @value: values to be written from the starting register address @address.
>> + * @length: number of consecutive registers to be written from @address.
>> + *
>> + * Maximum of 128 consecutive registers can be written starting at @address.
>> + *
>> + * Returns 0 on success otherwise failed.
>> + */
> 
OK. In this case kernel-doc tool takes it as **Description** that's why 
didn't complaint any error when I run the below command,

$ scripts/kernel-doc -v -none drivers/net/ethernet/oa_tc6.c

drivers/net/ethernet/oa_tc6.c:313: info: Scanning doc for function 
oa_tc6_read_registers
drivers/net/ethernet/oa_tc6.c:343: info: Scanning doc for function 
oa_tc6_read_register
drivers/net/ethernet/oa_tc6.c:357: info: Scanning doc for function 
oa_tc6_write_registers
drivers/net/ethernet/oa_tc6.c:387: info: Scanning doc for function 
oa_tc6_write_register
drivers/net/ethernet/oa_tc6.c:1154: info: Scanning doc for function 
oa_tc6_start_xmit
drivers/net/ethernet/oa_tc6.c:1185: info: Scanning doc for function 
oa_tc6_init
drivers/net/ethernet/oa_tc6.c:1306: info: Scanning doc for function 
oa_tc6_exit

Got this info from below link,

https://elixir.bootlin.com/linux/latest/source/Documentation/doc-guide/kernel-doc.rst#L53

https://elixir.bootlin.com/linux/latest/source/Documentation/doc-guide/kernel-doc.rst#L81

> I think the static analyser tools are getting more picky, and what
> 'Return:' .
> 
> https://elixir.bootlin.com/linux/latest/source/Documentation/doc-guide/kernel-doc.rst#L86
> 
> All the examples use Return:
OK. Thanks for the info. I will change it and also in other APIs as well 
in the next version.

Ex:
Return: 0 on success otherwise failed.
> 
> That document also says:
> 
> The documentation format is verified by the kernel build when it is
> requested to perform extra gcc checks::
> 
>          make W=n
Unfortunately it didn't complaint anything as "Returns" line considered 
as **Description** I guess.
> 
> And if patchwork can apply your patches, it also checks for problems
> like this.
OK. If I understand this correctly the above change will resolve this.

Best regards,
Parthiban V
> 
>      Andrew
Andrew Lunn April 23, 2024, 11:14 p.m. UTC | #7
On Thu, Apr 18, 2024 at 06:26:38PM +0530, Parthiban Veerasooran wrote:
> Implement register write operation according to the control communication
> specified in the OPEN Alliance 10BASE-T1x MACPHY Serial Interface
> document. Control write commands are used by the SPI host to write
> registers within the MAC-PHY. Each control write commands are composed of
> a 32 bits control command header followed by register write data.
> 
> The MAC-PHY ignores the final 32 bits of data from the SPI host at the
> end of the control write command. The write command and data is also
> echoed from the MAC-PHY back to the SPI host to enable the SPI host to
> identify which register write failed in the case of any bus errors.
> Control write commands can write either a single register or multiple
> consecutive registers. When multiple consecutive registers are written,
> the address is automatically post-incremented by the MAC-PHY. Writing to
> any unimplemented or undefined registers shall be ignored and yield no
> effect.
> 
> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>

Apart from the Return: issues:

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Andrew Lunn April 23, 2024, 11:17 p.m. UTC | #8
> +static int oa_tc6_check_ctrl_read_reply(struct oa_tc6 *tc6, u8 size)
> +{
> +	u32 *tx_buf = tc6->spi_ctrl_tx_buf;
> +	u32 *rx_buf = tc6->spi_ctrl_rx_buf + OA_TC6_CTRL_IGNORED_SIZE;

Reverse christmas tree. Those two need swapping around.

	Andrew
Andrew Lunn April 23, 2024, 11:26 p.m. UTC | #9
> +static int oa_tc6_read_status0(struct oa_tc6 *tc6)
> +{
> +	u32 regval;
> +	int ret;
> +
> +	ret = oa_tc6_read_register(tc6, OA_TC6_REG_STATUS0, &regval);
> +	if (ret)
> +		return 0;

If there is an error, your throw the error code away?

It is a bit messy, since you are using this inside
readx_poll_timeout(). I would probably do a netdev_warn() or similar,
since it should not happen, and then return 0? I _think_ this is
probably the first bus transaction we do, so if it fails, knowing the
error code will help figuring out what is wrong with the SPI bus
configuration.

	Andrew
Andrew Lunn April 23, 2024, 11:27 p.m. UTC | #10
On Thu, Apr 18, 2024 at 06:26:41PM +0530, Parthiban Veerasooran wrote:
> This will unmask the following error interrupts from the MAC-PHY.
>   tx protocol error
>   rx buffer overflow error
>   loss of framing error
>   header error
> The MAC-PHY will signal an error by setting the EXST bit in the receive
> data footer which will then allow the host to read the STATUS0 register
> to find the source of the error.
> 
> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Andrew Lunn April 23, 2024, 11:48 p.m. UTC | #11
> +/* PHY Clause 22 and 29 registers base address and mask */
> +#define OA_TC6_PHY_STD_REG_ADDR_BASE		0xFF00
> +#define OA_TC6_PHY_STD_REG_ADDR_MASK		0x3F

Did you every find out why C29 is reference here? From the standard:

29. System considerations for multisegment 100BASE-T networks

NOTE—This clause relates to clauses that are not recommended for new
installations. This clause is not recommended for new
installations. Since March 2012, maintenance changes are no longer
being considered for this clause.

I don't think you should be referencing it in the code.

> +static int oa_tc6_mdiobus_read(struct mii_bus *bus, int addr, int regnum)
> +{
> +	struct oa_tc6 *tc6 = bus->priv;
> +	u32 regval;
> +	bool ret;
> +
> +	ret = oa_tc6_read_register(tc6, OA_TC6_PHY_STD_REG_ADDR_BASE |
> +				   (regnum & OA_TC6_PHY_STD_REG_ADDR_MASK),
> +				   &regval);
> +	if (ret)
> +		return -ENODEV;

In general, you should not replace an error code from a lower level
function with a different error code. If there is a good reason to do
this, please add a comment.

> diff --git a/drivers/net/phy/microchip_t1s.c b/drivers/net/phy/microchip_t1s.c
> index 534ca7d1b061..769a88254285 100644
> --- a/drivers/net/phy/microchip_t1s.c
> +++ b/drivers/net/phy/microchip_t1s.c
> @@ -268,6 +268,34 @@ static int lan86xx_read_status(struct phy_device *phydev)
>  	return 0;
>  }

Please put this into a new patch.

>  
> +/* OPEN Alliance 10BASE-T1x compliance MAC-PHYs will have both C22 and
> + * C45 registers space. If the PHY is discovered via C22 bus protocol it assumes
> + * it uses C22 protocol and always uses C22 registers indirect access to access
> + * C45 registers. This is because, we don't have a clean separation between
> + * C22/C45 register space and C22/C45 MDIO bus protocols. Resulting, PHY C45
> + * registers direct access can't be used which can save multiple SPI bus access.
> + * To support this feature, set .read_mmd/.write_mmd in the PHY driver to call
> + * .read_c45/.write_c45 in the OPEN Alliance framework
> + * drivers/net/ethernet/oa_tc6.c
> + */
> +static int lan865x_phy_read_mmd(struct phy_device *phydev, int devnum,
> +				u16 regnum)
> +{
> +	struct mii_bus *bus = phydev->mdio.bus;
> +	int addr = phydev->mdio.addr;
> +
> +	return bus->read_c45(bus, addr, devnum, regnum);

It is better to use __mdiobus_c45_read(). That will check you have the
lock held, won't jump through a null pointer if the bus does not
implement C45, does tracing, and increments the MDIO statistics.

	  Andrew
Andrew Lunn April 23, 2024, 11:49 p.m. UTC | #12
On Thu, Apr 18, 2024 at 06:26:43PM +0530, Parthiban Veerasooran wrote:
> Enabling Configuration Synchronization bit (SYNC) in the Configuration
> Register #0 enables data communication in the MAC-PHY. The state of this
> bit is reflected in the data footer SYNC bit.
> 
> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Andrew Lunn April 24, 2024, 12:02 a.m. UTC | #13
> +static int oa_tc6_process_extended_status(struct oa_tc6 *tc6)
> +{
> +	u32 value;
> +	int ret;
> +
> +	ret = oa_tc6_read_register(tc6, OA_TC6_REG_STATUS0, &value);
> +	if (ret) {
> +		netdev_err(tc6->netdev, "STATUS0 register read failed: %d\n",
> +			   ret);
> +		return -ENODEV;
> +	}
> +
> +	/* Clear the error interrupts status */
> +	ret = oa_tc6_write_register(tc6, OA_TC6_REG_STATUS0, value);
> +	if (ret) {
> +		netdev_err(tc6->netdev, "STATUS0 register write failed: %d\n",
> +			   ret);
> +		return -ENODEV;

More examples where you replace one error code with a different one.

     Andrew
Andrew Lunn April 24, 2024, 12:08 a.m. UTC | #14
> +static int oa_tc6_allocate_rx_skb(struct oa_tc6 *tc6)
> +{
> +	tc6->rx_skb = netdev_alloc_skb(tc6->netdev, tc6->netdev->mtu + ETH_HLEN +
> +				       ETH_FCS_LEN + NET_IP_ALIGN);
> +	if (!tc6->rx_skb) {
> +		tc6->netdev->stats.rx_dropped++;
> +		return -ENOMEM;
> +	}
> +	skb_reserve(tc6->rx_skb, NET_IP_ALIGN);

I think you can use netdev_alloc_skb_ip_align() here.
Andrew Lunn April 24, 2024, 12:10 a.m. UTC | #15
On Thu, Apr 18, 2024 at 06:26:46PM +0530, Parthiban Veerasooran wrote:
> The MAC-PHY interrupt is asserted when the following conditions are met.
> 
> Receive chunks available - This interrupt is asserted when the previous
> data footer had no receive data chunks available and once the receive
> data chunks become available for reading. On reception of the first data
> header this interrupt will be deasserted.
> 
> Transmit chunk credits available - This interrupt is asserted when the
> previous data footer indicated no transmit credits available and once the
> transmit credits become available for transmitting transmit data chunks.
> On reception of the first data header this interrupt will be deasserted.
> 
> Extended status event - This interrupt is asserted when the previous data
> footer indicated no extended status and once the extended event become
> available. In this case the host should read status #0 register to know
> the corresponding error/event. On reception of the first data header this
> interrupt will be deasserted.
> 
> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Andrew Lunn April 24, 2024, 12:27 a.m. UTC | #16
> +/* OPEN Alliance Configuration Register #0 */
> +#define OA_TC6_REG_CONFIG0		0x0004
> +#define CONFIG0_ZARFE_ENABLE		BIT(12)

If this is a standard register, you should put these defined where
other drivers can use them.

> +static int lan865x_set_mac_address(struct net_device *netdev, void *addr)
> +{
> +	struct lan865x_priv *priv = netdev_priv(netdev);
> +	struct sockaddr *address = addr;
> +	int ret;
> +
> +	ret = eth_prepare_mac_addr_change(netdev, addr);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (ether_addr_equal(address->sa_data, netdev->dev_addr))
> +		return 0;
> +
> +	ret = lan865x_set_hw_macaddr(priv, address->sa_data);
> +	if (ret)
> +		return ret;
> +
> +	eth_hw_addr_set(netdev, address->sa_data);

It seems more normal to call eth_commit_mac_addr_change(), which
better pairs with eth_prepare_mac_addr_change().

> +static int lan865x_set_zarfe(struct lan865x_priv *priv)
> +{
> +	u32 regval;
> +	int ret;
> +
> +	ret = oa_tc6_read_register(priv->tc6, OA_TC6_REG_CONFIG0, &regval);
> +	if (ret)
> +		return ret;
> +
> +	/* Set Zero-Align Receive Frame Enable */
> +	regval |= CONFIG0_ZARFE_ENABLE;
> +
> +	return oa_tc6_write_register(priv->tc6, OA_TC6_REG_CONFIG0, regval);
> +}

There does not appear to be anything specific to your device here. So
please make this a helper in the shared code, so any driver can use
it.

	Andrew
Parthiban Veerasooran April 26, 2024, 5:55 a.m. UTC | #17
Hi Andrew,

On 24/04/24 4:44 am, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Thu, Apr 18, 2024 at 06:26:38PM +0530, Parthiban Veerasooran wrote:
>> Implement register write operation according to the control communication
>> specified in the OPEN Alliance 10BASE-T1x MACPHY Serial Interface
>> document. Control write commands are used by the SPI host to write
>> registers within the MAC-PHY. Each control write commands are composed of
>> a 32 bits control command header followed by register write data.
>>
>> The MAC-PHY ignores the final 32 bits of data from the SPI host at the
>> end of the control write command. The write command and data is also
>> echoed from the MAC-PHY back to the SPI host to enable the SPI host to
>> identify which register write failed in the case of any bus errors.
>> Control write commands can write either a single register or multiple
>> consecutive registers. When multiple consecutive registers are written,
>> the address is automatically post-incremented by the MAC-PHY. Writing to
>> any unimplemented or undefined registers shall be ignored and yield no
>> effect.
>>
>> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
> 
> Apart from the Return: issues:
OK. Thanks.

Best regards,
Parthiban V
> 
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> 
>      Andrew
>
Parthiban Veerasooran April 26, 2024, 5:56 a.m. UTC | #18
Hi Andrew,

On 24/04/24 4:47 am, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> +static int oa_tc6_check_ctrl_read_reply(struct oa_tc6 *tc6, u8 size)
>> +{
>> +     u32 *tx_buf = tc6->spi_ctrl_tx_buf;
>> +     u32 *rx_buf = tc6->spi_ctrl_rx_buf + OA_TC6_CTRL_IGNORED_SIZE;
> 
> Reverse christmas tree. Those two need swapping around.
Ah ok, somehow I missed it. Will correct it in the next version.

Best regards,
Parthiban V
> 
>          Andrew
>
Parthiban Veerasooran April 26, 2024, 6:38 a.m. UTC | #19
Hi Andrew,

On 24/04/24 4:56 am, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> +static int oa_tc6_read_status0(struct oa_tc6 *tc6)
>> +{
>> +     u32 regval;
>> +     int ret;
>> +
>> +     ret = oa_tc6_read_register(tc6, OA_TC6_REG_STATUS0, &regval);
>> +     if (ret)
>> +             return 0;
> 
> If there is an error, your throw the error code away?
> 
> It is a bit messy, since you are using this inside
> readx_poll_timeout(). I would probably do a netdev_warn() or similar,
> since it should not happen, and then return 0? I _think_ this is
> probably the first bus transaction we do, so if it fails, knowing the
> error code will help figuring out what is wrong with the SPI bus
> configuration.
OK, I will add the below print before "return 0;" in the next version.

dev_err(&tc6->spi->dev, "STATUS0 register read failed: %d\n", ret);

Best regards,
Parthiban V
> 
>          Andrew
>
Parthiban Veerasooran April 26, 2024, 1:17 p.m. UTC | #20
Hi Andrew,

On 24/04/24 5:18 am, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> +/* PHY Clause 22 and 29 registers base address and mask */
>> +#define OA_TC6_PHY_STD_REG_ADDR_BASE         0xFF00
>> +#define OA_TC6_PHY_STD_REG_ADDR_MASK         0x3F
> 
> Did you every find out why C29 is reference here? From the standard:
> 
> 29. System considerations for multisegment 100BASE-T networks
> 
> NOTE—This clause relates to clauses that are not recommended for new
> installations. This clause is not recommended for new
> installations. Since March 2012, maintenance changes are no longer
> being considered for this clause.
> 
> I don't think you should be referencing it in the code.
Thanks for this detailed explanation. The reason why I considered this 
is, in section 9.2 and table 7 of the OPEN Alliance document shows the 
mapping of standard registers within MMS 0 where clause 29 also listed 
as clause 22 extended registers from 0xFF20 to 0xFF3F.

https://opensig.org/download/document/OPEN_Alliance_10BASET1x_MAC-PHY_Serial_Interface_V1.1.pdf

But by considering the limitations explained by you above, I will change 
the mask as 0x1F to support only for clause 22 registers in the next 
version.
> 
>> +static int oa_tc6_mdiobus_read(struct mii_bus *bus, int addr, int regnum)
>> +{
>> +     struct oa_tc6 *tc6 = bus->priv;
>> +     u32 regval;
>> +     bool ret;
>> +
>> +     ret = oa_tc6_read_register(tc6, OA_TC6_PHY_STD_REG_ADDR_BASE |
>> +                                (regnum & OA_TC6_PHY_STD_REG_ADDR_MASK),
>> +                                &regval);
>> +     if (ret)
>> +             return -ENODEV;
> 
> In general, you should not replace an error code from a lower level
> function with a different error code. If there is a good reason to do
> this, please add a comment.
Ah ok, I will return "ret" in the next version.
> 
>> diff --git a/drivers/net/phy/microchip_t1s.c b/drivers/net/phy/microchip_t1s.c
>> index 534ca7d1b061..769a88254285 100644
>> --- a/drivers/net/phy/microchip_t1s.c
>> +++ b/drivers/net/phy/microchip_t1s.c
>> @@ -268,6 +268,34 @@ static int lan86xx_read_status(struct phy_device *phydev)
>>        return 0;
>>   }
> 
> Please put this into a new patch.
OK, will create as a new patch in the next version.
> 
>>
>> +/* OPEN Alliance 10BASE-T1x compliance MAC-PHYs will have both C22 and
>> + * C45 registers space. If the PHY is discovered via C22 bus protocol it assumes
>> + * it uses C22 protocol and always uses C22 registers indirect access to access
>> + * C45 registers. This is because, we don't have a clean separation between
>> + * C22/C45 register space and C22/C45 MDIO bus protocols. Resulting, PHY C45
>> + * registers direct access can't be used which can save multiple SPI bus access.
>> + * To support this feature, set .read_mmd/.write_mmd in the PHY driver to call
>> + * .read_c45/.write_c45 in the OPEN Alliance framework
>> + * drivers/net/ethernet/oa_tc6.c
>> + */
>> +static int lan865x_phy_read_mmd(struct phy_device *phydev, int devnum,
>> +                             u16 regnum)
>> +{
>> +     struct mii_bus *bus = phydev->mdio.bus;
>> +     int addr = phydev->mdio.addr;
>> +
>> +     return bus->read_c45(bus, addr, devnum, regnum);
> 
> It is better to use __mdiobus_c45_read(). That will check you have the
> lock held, won't jump through a null pointer if the bus does not
> implement C45, does tracing, and increments the MDIO statistics.
OK, sure will do it in the next version. As this is also applicable for 
__mdiobus_c45_write(), I will change this as well.

Best regards,
Parthiban V
> 
>            Andrew
>
Parthiban Veerasooran April 26, 2024, 1:19 p.m. UTC | #21
Hi Andrew,

On 24/04/24 5:32 am, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> +static int oa_tc6_process_extended_status(struct oa_tc6 *tc6)
>> +{
>> +     u32 value;
>> +     int ret;
>> +
>> +     ret = oa_tc6_read_register(tc6, OA_TC6_REG_STATUS0, &value);
>> +     if (ret) {
>> +             netdev_err(tc6->netdev, "STATUS0 register read failed: %d\n",
>> +                        ret);
>> +             return -ENODEV;
>> +     }
>> +
>> +     /* Clear the error interrupts status */
>> +     ret = oa_tc6_write_register(tc6, OA_TC6_REG_STATUS0, value);
>> +     if (ret) {
>> +             netdev_err(tc6->netdev, "STATUS0 register write failed: %d\n",
>> +                        ret);
>> +             return -ENODEV;
> 
> More examples where you replace one error code with a different one.
Yes I noted. Somehow I messed up with all the -ENODEV. I will take of 
this in all the places in the next version.

Best regards,
Parthiban V
> 
>       Andrew
>
Parthiban Veerasooran April 26, 2024, 1:32 p.m. UTC | #22
Hi Andrew,

On 24/04/24 5:57 am, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> +/* OPEN Alliance Configuration Register #0 */
>> +#define OA_TC6_REG_CONFIG0           0x0004
>> +#define CONFIG0_ZARFE_ENABLE         BIT(12)
> 
> If this is a standard register, you should put these defined where
> other drivers can use them.
OK. Will move it to oa_tc6.c in the next version as you have also 
suggested to use this as a helper function in the below comment.
> 
>> +static int lan865x_set_mac_address(struct net_device *netdev, void *addr)
>> +{
>> +     struct lan865x_priv *priv = netdev_priv(netdev);
>> +     struct sockaddr *address = addr;
>> +     int ret;
>> +
>> +     ret = eth_prepare_mac_addr_change(netdev, addr);
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     if (ether_addr_equal(address->sa_data, netdev->dev_addr))
>> +             return 0;
>> +
>> +     ret = lan865x_set_hw_macaddr(priv, address->sa_data);
>> +     if (ret)
>> +             return ret;
>> +
>> +     eth_hw_addr_set(netdev, address->sa_data);
> 
> It seems more normal to call eth_commit_mac_addr_change(), which
> better pairs with eth_prepare_mac_addr_change().
OK, so I will replace eth_hw_addr_set() with 
eth_prepare_mac_addr_change() in the next version.
> 
>> +static int lan865x_set_zarfe(struct lan865x_priv *priv)
>> +{
>> +     u32 regval;
>> +     int ret;
>> +
>> +     ret = oa_tc6_read_register(priv->tc6, OA_TC6_REG_CONFIG0, &regval);
>> +     if (ret)
>> +             return ret;
>> +
>> +     /* Set Zero-Align Receive Frame Enable */
>> +     regval |= CONFIG0_ZARFE_ENABLE;
>> +
>> +     return oa_tc6_write_register(priv->tc6, OA_TC6_REG_CONFIG0, regval);
>> +}
> 
> There does not appear to be anything specific to your device here. So
> please make this a helper in the shared code, so any driver can use
> it.
OK, I will implement this helper function as oa_tc6_enable_zarfe() in 
the oa_tc6.c. Also do you want me to move this helper function 
implementation to a new patch?

Best regards,
Parthiban V
> 
>          Andrew
>
Parthiban Veerasooran April 26, 2024, 1:45 p.m. UTC | #23
Hi Andrew,

On 24/04/24 5:38 am, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> +static int oa_tc6_allocate_rx_skb(struct oa_tc6 *tc6)
>> +{
>> +     tc6->rx_skb = netdev_alloc_skb(tc6->netdev, tc6->netdev->mtu + ETH_HLEN +
>> +                                    ETH_FCS_LEN + NET_IP_ALIGN);
>> +     if (!tc6->rx_skb) {
>> +             tc6->netdev->stats.rx_dropped++;
>> +             return -ENOMEM;
>> +     }
>> +     skb_reserve(tc6->rx_skb, NET_IP_ALIGN);
> 
> I think you can use netdev_alloc_skb_ip_align() here.
Ah OK, then do you mean we can rewrite the function 
oa_tc6_allocate_rx_skb() as below?

static int oa_tc6_allocate_rx_skb(struct oa_tc6 *tc6)
{
	tc6->rx_skb = netdev_alloc_skb_ip_align(tc6->netdev, tc6->netdev->mtu + 
ETH_HLEN + ETH_FCS_LEN);
	if (tc6->rx_skb)
		return 0;

	tc6->netdev->stats.rx_dropped++;
	return -ENOMEM;
}

Best regards,
Parthiban V
>
Andrew Lunn April 26, 2024, 6:13 p.m. UTC | #24
On Fri, Apr 26, 2024 at 01:45:20PM +0000, Parthiban.Veerasooran@microchip.com wrote:
> Hi Andrew,
> 
> On 24/04/24 5:38 am, Andrew Lunn wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > 
> >> +static int oa_tc6_allocate_rx_skb(struct oa_tc6 *tc6)
> >> +{
> >> +     tc6->rx_skb = netdev_alloc_skb(tc6->netdev, tc6->netdev->mtu + ETH_HLEN +
> >> +                                    ETH_FCS_LEN + NET_IP_ALIGN);
> >> +     if (!tc6->rx_skb) {
> >> +             tc6->netdev->stats.rx_dropped++;
> >> +             return -ENOMEM;
> >> +     }
> >> +     skb_reserve(tc6->rx_skb, NET_IP_ALIGN);
> > 
> > I think you can use netdev_alloc_skb_ip_align() here.
> Ah OK, then do you mean we can rewrite the function 
> oa_tc6_allocate_rx_skb() as below?
> 
> static int oa_tc6_allocate_rx_skb(struct oa_tc6 *tc6)
> {
> 	tc6->rx_skb = netdev_alloc_skb_ip_align(tc6->netdev, tc6->netdev->mtu + 
> ETH_HLEN + ETH_FCS_LEN);
> 	if (tc6->rx_skb)
> 		return 0;
> 
> 	tc6->netdev->stats.rx_dropped++;
> 	return -ENOMEM;
> }

Looks about right. But i did say 'I think', meaning i'm not too sure
about this.

I generally don't review code actually moving packets around. It is
what developers focus on, test heavily, and so is generally O.K. It is
the code around the edges which often needs improvements prompted by
review, ethtool, PHY handling, statistics.

	Andrew
Andrew Lunn April 26, 2024, 6:14 p.m. UTC | #25
> OK, I will implement this helper function as oa_tc6_enable_zarfe() in 
> the oa_tc6.c. Also do you want me to move this helper function 
> implementation to a new patch?

Yes please.

    Andrew
Conor Dooley April 27, 2024, 7:57 p.m. UTC | #26
On Sat, Apr 27, 2024 at 09:19:34PM +0200, Ramón Nordin Rodriguez wrote:
> Hi,
> 
> For me the mac driver fails to probe with the following log
> [    0.123325] SPI driver lan865x has no spi_device_id for microchip,lan8651
> 
> With this change the driver probes
> 
> diff --git a/drivers/net/ethernet/microchip/lan865x/lan865x.c b/drivers/net/ethernet/microchip/lan865x/lan865x.c
> index 9abefa8b9d9f..72a663f14f50 100644
> --- a/drivers/net/ethernet/microchip/lan865x/lan865x.c
> +++ b/drivers/net/ethernet/microchip/lan865x/lan865x.c
> @@ -364,7 +364,7 @@ static void lan865x_remove(struct spi_device *spi)
>  }
> 
>  static const struct of_device_id lan865x_dt_ids[] = {
> -       { .compatible = "microchip,lan8651", "microchip,lan8650" },

Huh, that's very strange. I don't see a single instance in the tree of a
of_device_id struct like this with two compatibles like this (at least
with a search of `rg "\.compatible.*\", \"" drivers/`.

Given the fallbacks in the binding, only "microchip,lan8650" actually
needs to be here.

> +       { .compatible = "microchip,lan865x", "microchip,lan8650" },
>         { /* Sentinel */ }
>  };
>  MODULE_DEVICE_TABLE(of, lan865x_dt_ids);
> 
> Along with compatible = "microchip,lan865x" in the dts

Just to be clear, the compatible w/ an x is unacceptable due to the
wildcard and the binding should stay as-is. Whatever probing bugs
the code has need to be resolved instead :)

Thanks,
Conor.
Ramón Nordin Rodriguez April 27, 2024, 8:02 p.m. UTC | #27
Could xfer.rx_buf for the data path point to the currently allocacted socket buff 
	struct spi_transfer xfer = { 0 };
	struct spi_message msg;

	if (header_type == OA_TC6_DATA_HEADER) {
		xfer.tx_buf = tc6->spi_data_tx_buf;
		xfer.rx_buf = tc6->spi_data_rx_buf;
	} else {
		xfer.tx_buf = tc6->spi_ctrl_tx_buf;
		xfer.rx_buf = tc6->spi_ctrl_rx_buf;
	}
	xfer.len = length;

To avoid an additional copy here?

> +static void oa_tc6_update_rx_skb(struct oa_tc6 *tc6, u8 *payload, u8 length)
> +{
> +	memcpy(skb_put(tc6->rx_skb, length), payload, length);
> +}
> 

R
Ramón Nordin Rodriguez April 27, 2024, 8:13 p.m. UTC | #28
On Sat, Apr 27, 2024 at 08:57:43PM +0100, Conor Dooley wrote:
> >  static const struct of_device_id lan865x_dt_ids[] = {
> > -       { .compatible = "microchip,lan8651", "microchip,lan8650" },
> 
> Huh, that's very strange. I don't see a single instance in the tree of a
> of_device_id struct like this with two compatibles like this (at least
> with a search of `rg "\.compatible.*\", \"" drivers/`.
> 
> Given the fallbacks in the binding, only "microchip,lan8650" actually
> needs to be here.
> 
> > +       { .compatible = "microchip,lan865x", "microchip,lan8650" },
> >         { /* Sentinel */ }
> >  };
> >  MODULE_DEVICE_TABLE(of, lan865x_dt_ids);
> > 
> > Along with compatible = "microchip,lan865x" in the dts
> 
> Just to be clear, the compatible w/ an x is unacceptable due to the
> wildcard and the binding should stay as-is. Whatever probing bugs
> the code has need to be resolved instead :)
> 

All right, so when I change to

@@ -364,7 +364,7 @@ static void lan865x_remove(struct spi_device *spi)
 }

 static const struct of_device_id lan865x_dt_ids[] = {
-       { .compatible = "microchip,lan8651", "microchip,lan8650" },
+       { .compatible = "microchip,lan8650" },
        { /* Sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, lan865x_dt_ids);

I still get the output
[    0.124266] SPI driver lan865x has no spi_device_id for microchip,lan8650
But the driver does probe and I get a network interface.

If no one beats me to it I'll single step the probe tomorrow.
R
Conor Dooley April 27, 2024, 8:22 p.m. UTC | #29
On Sat, Apr 27, 2024 at 10:13:33PM +0200, Ramón Nordin Rodriguez wrote:
> On Sat, Apr 27, 2024 at 08:57:43PM +0100, Conor Dooley wrote:
> > >  static const struct of_device_id lan865x_dt_ids[] = {
> > > -       { .compatible = "microchip,lan8651", "microchip,lan8650" },
> > 
> > Huh, that's very strange. I don't see a single instance in the tree of a
> > of_device_id struct like this with two compatibles like this (at least
> > with a search of `rg "\.compatible.*\", \"" drivers/`.
> > 
> > Given the fallbacks in the binding, only "microchip,lan8650" actually
> > needs to be here.
> > 
> > > +       { .compatible = "microchip,lan865x", "microchip,lan8650" },
> > >         { /* Sentinel */ }
> > >  };
> > >  MODULE_DEVICE_TABLE(of, lan865x_dt_ids);
> > > 
> > > Along with compatible = "microchip,lan865x" in the dts
> > 
> > Just to be clear, the compatible w/ an x is unacceptable due to the
> > wildcard and the binding should stay as-is. Whatever probing bugs
> > the code has need to be resolved instead :)
> > 
> 
> All right, so when I change to
> 
> @@ -364,7 +364,7 @@ static void lan865x_remove(struct spi_device *spi)
>  }
> 
>  static const struct of_device_id lan865x_dt_ids[] = {
> -       { .compatible = "microchip,lan8651", "microchip,lan8650" },
> +       { .compatible = "microchip,lan8650" },
>         { /* Sentinel */ }
>  };
>  MODULE_DEVICE_TABLE(of, lan865x_dt_ids);
> 
> I still get the output
> [    0.124266] SPI driver lan865x has no spi_device_id for microchip,lan8650
> But the driver does probe and I get a network interface.
> 
> If no one beats me to it I'll single step the probe tomorrow.

I think the error pretty much is what it says it is, the driver doesn't
appear to have a spi_device_id table containing lan8650. The name of
the driver is lan685x which is used in the fallback clause in
__spi_register_driver(), so it complains as it does not find lan8650 in
either. If my understanding is correct, either a spi_device_id table is
required or the driver needs a rename with s/x/0/.

Cheers,
Conor.
Andrew Lunn April 27, 2024, 8:40 p.m. UTC | #30
On Sat, Apr 27, 2024 at 09:19:34PM +0200, Ramón Nordin Rodriguez wrote:
> Hi,
> 
> For me the mac driver fails to probe with the following log
> [    0.123325] SPI driver lan865x has no spi_device_id for microchip,lan8651
> 
> With this change the driver probes
> 
> diff --git a/drivers/net/ethernet/microchip/lan865x/lan865x.c b/drivers/net/ethernet/microchip/lan865x/lan865x.c
> index 9abefa8b9d9f..72a663f14f50 100644
> --- a/drivers/net/ethernet/microchip/lan865x/lan865x.c
> +++ b/drivers/net/ethernet/microchip/lan865x/lan865x.c
> @@ -364,7 +364,7 @@ static void lan865x_remove(struct spi_device *spi)
>  }
> 
>  static const struct of_device_id lan865x_dt_ids[] = {
> -       { .compatible = "microchip,lan8651", "microchip,lan8650" },
> +       { .compatible = "microchip,lan865x", "microchip,lan8650" },
>         { /* Sentinel */ }
>  };

The device tree binding says:

+  compatible:
+    oneOf:
+      - const: microchip,lan8650
+      - items:
+          - const: microchip,lan8651
+          - const: microchip,lan8650

So your DT node should either be:

compatible = "microchip,lan8651", "microchip,lan8650";

or

compatible = "microchip,lan8650"

There is no mention of lan865x in the binding, so this patch is
clearly wrong.

What do you have in your DT node?

     Andrew
Andrew Lunn April 27, 2024, 8:58 p.m. UTC | #31
On Sat, Apr 27, 2024 at 09:35:06PM +0200, Ramón Nordin Rodriguez wrote:
> I'm running a dual lan8650 setup, neither IC passed the sw reset in the
> oa_tc.c module, I need to pull the reset pin low to reset the pin before
> the rest of the init stuff happens.
> 
> The datasheet recommends not doing a sw reset, excerpt from section
> 4.1.1.3 Software Reset
> "Note: The SW_RESET bit of the Clause 22 Basic Control register will reset only the internal PHY, not
> the entire device. This PHY only reset is not recommended for use. If such a reset is detected, by
> reading the RESETC bit of the STS2 register, reset the entire device."

That is not so good. The PHY driver does not know the PHY is embedded
within another device. It has no idea of RESETC bit in STS2. Looking
at the phy driver, i don't actually seeing it using
genphy_soft_reset(). Do you see a code path where this could actually
be an issue?

Supporting a hardware reset does however make sense. It would be best
if you submitted a proper clean patch. It can be added to the end of
this series, keeping you as author.

     Andrew
Ramón Nordin Rodriguez April 27, 2024, 9:06 p.m. UTC | #32
Ok this tripped me up.

> The device tree binding says:
> 
> +  compatible:
> +    oneOf:
> +      - const: microchip,lan8650
> +      - items:
> +          - const: microchip,lan8651
> +          - const: microchip,lan8650
> 
> So your DT node should either be:
> 
> compatible = "microchip,lan8651", "microchip,lan8650";
> 
> or
> 
> compatible = "microchip,lan8650"
> 
> There is no mention of lan865x in the binding, so this patch is
> clearly wrong.
> 
> What do you have in your DT node?

Initially I set compatible = "microchip,lan8650", and did not get the
driver to probe, so I got carried away with adding things that were not
necessary.

I dropped my patch and tested again.
What does work is setting:

compatible = "microchip,lan8651"

 - or - 

compatible = "microchip,lan8651", "microchip,lan8650"

but just compatible = "lan8650" does not work.

Also I'm getting the output
[    0.125056] SPI driver lan8650 has no spi_device_id for microchip,lan8651

As Conor pointed out setting the define DRV_NAME to "lan8651" fixes
that.
Setting the define to "lan8650" yet gets the spi module to log the 'no spi_device id..'.

I don't really have an opinion here, but I think there is a risk that
more than one dev might stumble on the same thing as me and expect that
either or should work.

BR
R
Ramón Nordin Rodriguez April 27, 2024, 9:09 p.m. UTC | #33
> I think the error pretty much is what it says it is, the driver doesn't
> appear to have a spi_device_id table containing lan8650. The name of
> the driver is lan685x which is used in the fallback clause in
> __spi_register_driver(), so it complains as it does not find lan8650 in
> either. If my understanding is correct, either a spi_device_id table is
> required or the driver needs a rename with s/x/0/.
> 

Right you are, no gdb necessary. With the caveat that I only get it
working when setting DRV_NAME to "lan8651", setting it to "lan8650"
still produces the log

R
Andrew Lunn April 27, 2024, 9:17 p.m. UTC | #34
On Sat, Apr 27, 2024 at 09:52:15PM +0200, Ramón Nordin Rodriguez wrote:
> > +static int oa_tc6_unmask_macphy_error_interrupts(struct oa_tc6 *tc6)
> > +{
> > +	u32 regval;
> > +	int ret;
> > +
> > +	ret = oa_tc6_read_register(tc6, OA_TC6_REG_INT_MASK0, &regval);
> > +	if (ret)
> > +		return ret;
> > +
> > +	regval &= ~(INT_MASK0_TX_PROTOCOL_ERR_MASK |
> > +		    INT_MASK0_RX_BUFFER_OVERFLOW_ERR_MASK |
> > +		    INT_MASK0_LOSS_OF_FRAME_ERR_MASK |
> > +		    INT_MASK0_HEADER_ERR_MASK);
> > +
> > +	return oa_tc6_write_register(tc6, OA_TC6_REG_INT_MASK0, regval);
> > +}
> > +
> 
> This togheter with patch 11 works poorly for me. I get alot of kernel
> output, dropped packets and lower performance.
> Below is an example for a run when I curl a 10MB blob
> 
> time curl 20.0.0.55:8000/rdump -o dump -w '{%speed_download}'
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  U[  387.944737] net_ratelimit: 38 callbacks suppressed
> pload   Total   Spent    Left  Sp[  387.944755] eth0: Receive buffer overflow error
> eed
>   0     0    0     0    0     0      0      0 --:--:-- --:-[  387.961424] eth0: Receive buffer overflow error
>   0 10.0M    0  2896    0     0  13031      0  0:13:24 --:--:--  0:13:24 12986[  388.204257] eth0: Receive buffer overflow error
> [  388.209848] eth0: Receive buffer overflow error

How fast is your SPI bus? Faster than the link speed? Or slower?

It could be different behaviour is needed depending on the SPI bus
speed. If the SPI bus is faster than the link speed, by some margin,
the receiver buffer should not overflow, since the CPU can empty the
buffer faster than it fills.

If however, the SPI bus is slower than the link speed, there will be
buffer overflows, and a reliance on TCP backing off and slowing down.
The driver should not be spamming the log, since it is going to happen
and there is nothing that can be done about it.

> I tried this patch
> 
> diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
> index 9f17f3712137..bd7bd3ef6897 100644
> --- a/drivers/net/ethernet/oa_tc6.c
> +++ b/drivers/net/ethernet/oa_tc6.c
> @@ -615,21 +615,9 @@ static int oa_tc6_sw_reset_macphy(struct oa_tc6 *tc6)
>         return oa_tc6_write_register(tc6, OA_TC6_REG_STATUS0, regval);
>  }
> 
> -static int oa_tc6_unmask_macphy_error_interrupts(struct oa_tc6 *tc6)
> +static int oa_tc6_disable_imask0_interrupts(struct oa_tc6 *tc6)
>  {
> -       u32 regval;
> -       int ret;
> -
> -       ret = oa_tc6_read_register(tc6, OA_TC6_REG_INT_MASK0, &regval);
> -       if (ret)
> -               return ret;
> -
> -       regval &= ~(INT_MASK0_TX_PROTOCOL_ERR_MASK |
> -                   INT_MASK0_RX_BUFFER_OVERFLOW_ERR_MASK |
> -                   INT_MASK0_LOSS_OF_FRAME_ERR_MASK |
> -                   INT_MASK0_HEADER_ERR_MASK);
> -
> -       return oa_tc6_write_register(tc6, OA_TC6_REG_INT_MASK0, regval);
> +       return oa_tc6_write_register(tc6, OA_TC6_REG_INT_MASK0, (u32)-1);

So this appears to be disabling all error interrupts?

This is maybe going too far. Overflow errors are going to happen if
you have a slow SPI bus. So i probably would not enable that. However,
are the other errors actually expected in normal usage? If not, leave
them enabled, because they might indicate real problems.

> Which results in no log spam, ~5-10% higher throughput and no dropped
> packets when I look at /sys/class/net/eth0/statistics/rx_dropped

You cannot trust rx_dropped because you just disabled the code which
increments it! The device is probably still dropping packets, and they
are no longer counted.

It could be the performance increase comes from two places:

1) Spending time and bus bandwidth dealing with the buffer overflow
interrupt

2) Printing out the serial port.

Please could you benchmark a few things:

1) Remove the printk("Receive buffer overflow error"), but otherwise
keep the code the same. That will give us an idea how much the serial
port matters.

2) Disable only the RX buffer overflow interrupt

3) Disable all error interrupts.

   Andrew
Ramón Nordin Rodriguez April 27, 2024, 9:29 p.m. UTC | #35
On Sat, Apr 27, 2024 at 10:58:07PM +0200, Andrew Lunn wrote:
> On Sat, Apr 27, 2024 at 09:35:06PM +0200, Ramón Nordin Rodriguez wrote:
> > I'm running a dual lan8650 setup, neither IC passed the sw reset in the
> > oa_tc.c module, I need to pull the reset pin low to reset the pin before
> > the rest of the init stuff happens.
> > 
> > The datasheet recommends not doing a sw reset, excerpt from section
> > 4.1.1.3 Software Reset
> > "Note: The SW_RESET bit of the Clause 22 Basic Control register will reset only the internal PHY, not
> > the entire device. This PHY only reset is not recommended for use. If such a reset is detected, by
> > reading the RESETC bit of the STS2 register, reset the entire device."
> 
> That is not so good. The PHY driver does not know the PHY is embedded
> within another device. It has no idea of RESETC bit in STS2. Looking
> at the phy driver, i don't actually seeing it using
> genphy_soft_reset(). Do you see a code path where this could actually
> be an issue?
> 

I agree with your assesment, the phy won't reset itself, but maybe we
could add some comment doc about not adding it for the lan8670,
so no one trips over that in the future.
Though the phy does not have to be baked with the lan8650 mac, so that
might be tricky to cover all future bases there. But for now I don't
think it matters.

Then regarding doing the soft reset in the oa_tc6 module, I'm not sure
that it matters, since it seems to work fine for me for as long as I do
the hw reset first.
But I can submit a suggestion for how to deal with reset-quirks if we
want to have the soft reset optional.

I'll run a test with and without the soft reset and see if I can spot
any change in behaviour.

Let me know if I missed any nuance in your question.

> Supporting a hardware reset does however make sense. It would be best
> if you submitted a proper clean patch. It can be added to the end of
> this series, keeping you as author.
> 

I'd be happy to. Getting late in scandinavia so I'll clean it up and
submit tomorrow.

R
Ramón Nordin Rodriguez April 27, 2024, 9:55 p.m. UTC | #36
> How fast is your SPI bus? Faster than the link speed? Or slower?
> 
> It could be different behaviour is needed depending on the SPI bus
> speed. If the SPI bus is faster than the link speed, by some margin,
> the receiver buffer should not overflow, since the CPU can empty the
> buffer faster than it fills.

I'm running at 25MHz, I'm guessing that should translate to fast enough
for the 10MBit half duplex link.
But I'm not sure how the spi clock translates to bps here.

> 
> If however, the SPI bus is slower than the link speed, there will be
> buffer overflows, and a reliance on TCP backing off and slowing down.
> The driver should not be spamming the log, since it is going to happen
> and there is nothing that can be done about it.
> 

I agree, I think the print could be a DBG if deemed necessary, but there
is also the dropped counter to look at.

> > I tried this patch
> > 
> > diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
> > index 9f17f3712137..bd7bd3ef6897 100644
> > --- a/drivers/net/ethernet/oa_tc6.c
> > +++ b/drivers/net/ethernet/oa_tc6.c
> > @@ -615,21 +615,9 @@ static int oa_tc6_sw_reset_macphy(struct oa_tc6 *tc6)
> >         return oa_tc6_write_register(tc6, OA_TC6_REG_STATUS0, regval);
> >  }
> > 
> > -static int oa_tc6_unmask_macphy_error_interrupts(struct oa_tc6 *tc6)
> > +static int oa_tc6_disable_imask0_interrupts(struct oa_tc6 *tc6)
> >  {
> > -       u32 regval;
> > -       int ret;
> > -
> > -       ret = oa_tc6_read_register(tc6, OA_TC6_REG_INT_MASK0, &regval);
> > -       if (ret)
> > -               return ret;
> > -
> > -       regval &= ~(INT_MASK0_TX_PROTOCOL_ERR_MASK |
> > -                   INT_MASK0_RX_BUFFER_OVERFLOW_ERR_MASK |
> > -                   INT_MASK0_LOSS_OF_FRAME_ERR_MASK |
> > -                   INT_MASK0_HEADER_ERR_MASK);
> > -
> > -       return oa_tc6_write_register(tc6, OA_TC6_REG_INT_MASK0, regval);
> > +       return oa_tc6_write_register(tc6, OA_TC6_REG_INT_MASK0, (u32)-1);
> 
> So this appears to be disabling all error interrupts?

Yes, and I think you are right in that it's an overcorrection. There is
a secondary interrupt mask register as well that is not touched by the
driver, so that's left at whatever the chip defaults to on reset.

> 
> This is maybe going too far. Overflow errors are going to happen if
> you have a slow SPI bus. So i probably would not enable that. However,
> are the other errors actually expected in normal usage? If not, leave
> them enabled, because they might indicate real problems.

I'm guessing you are right and that the others actually would be
meningful to log.
There is a nested question here as well, and that is wheter to keep or
drop the code that drops the rx buffer on overflow interrupt.
I think not dropping the full buffer could be one of the reasons for the
perf change.

> 
> > Which results in no log spam, ~5-10% higher throughput and no dropped
> > packets when I look at /sys/class/net/eth0/statistics/rx_dropped
> 
> You cannot trust rx_dropped because you just disabled the code which
> increments it! The device is probably still dropping packets, and they
> are no longer counted.

I'll curb my enthusiasm a bit :)

> 
> It could be the performance increase comes from two places:
> 
> 1) Spending time and bus bandwidth dealing with the buffer overflow
> interrupt
> 
> 2) Printing out the serial port.
> 

I think it's possible that the buffer cleanup triggered after the
overflow interrupt hits could be cause 3) here, but just a guess.

> Please could you benchmark a few things:
> 
> 1) Remove the printk("Receive buffer overflow error"), but otherwise
> keep the code the same. That will give us an idea how much the serial
> port matters.
> 
> 2) Disable only the RX buffer overflow interrupt
> 
> 3) Disable all error interrupts.
> 

Thanks for layout it out so clearly. I'll run through the scenarios!

R
Bagas Sanjaya April 28, 2024, 9:33 a.m. UTC | #37
On Thu, Apr 18, 2024 at 06:26:37PM +0530, Parthiban Veerasooran wrote:
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 04e5f7c20e30..79fa7abb4ec9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16400,6 +16400,12 @@ L:	linux-rdma@vger.kernel.org
>  S:	Supported
>  F:	drivers/infiniband/ulp/opa_vnic
>  
> +OPEN ALLIANCE 10BASE-T1S MACPHY SERIAL INTERFACE FRAMEWORK
> +M:	Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
> +L:	netdev@vger.kernel.org
> +S:	Maintained
> +F:	Documentation/networking/oa-tc6-framework.rst
> +
>  OPEN FIRMWARE AND FLATTENED DEVICE TREE
>  M:	Rob Herring <robh+dt@kernel.org>
>  M:	Frank Rowand <frowand.list@gmail.com>

I can't apply this series on top of current net-next due to MAINTAINERS diff
above (on context line 16400 I have Omnivision entries). Care to reroll?
(Hint: also specify --base= to git-send-email(1) so that I and other
reviewers know where to apply this series.)

Confused...
Andrew Lunn April 28, 2024, 2:18 p.m. UTC | #38
> but just compatible = "lan8650" does not work.

The device tree binding says that is valid, so this needs fixing.

Maybe copy max1111.c in hwmon:

static const struct spi_device_id max1111_ids[] = {
        { "max1110", max1110 },
        { "max1111", max1111 },
        { "max1112", max1112 },
        { "max1113", max1113 },
        { },
};
MODULE_DEVICE_TABLE(spi, max1111_ids);

static struct spi_driver max1111_driver = {
        .driver         = {
                .name   = "max1111",
        },
        .id_table       = max1111_ids,
        .probe          = max1111_probe,
        .remove         = max1111_remove,
};

Interestingly, there is no compatible table. So the device tree
binding probably should change, not require two compatibles for
lan8651.

	Andrew
Andrew Lunn April 28, 2024, 2:25 p.m. UTC | #39
> I agree with your assesment, the phy won't reset itself, but maybe we
> could add some comment doc about not adding it for the lan8670,
> so no one trips over that in the future.

In the PHY driver, you can provide your own .soft_reset handler. It
could return -EOPNOTSUPP, or -EINVAL. Maybe check the data sheets for
the standalone devices supported by the driver. Can you limit this to
just the TC6 PHY?

     Andrew
Andrew Lunn April 28, 2024, 2:48 p.m. UTC | #40
On Sat, Apr 27, 2024 at 11:55:13PM +0200, Ramón Nordin Rodriguez wrote:
> > How fast is your SPI bus? Faster than the link speed? Or slower?
> > 
> > It could be different behaviour is needed depending on the SPI bus
> > speed. If the SPI bus is faster than the link speed, by some margin,
> > the receiver buffer should not overflow, since the CPU can empty the
> > buffer faster than it fills.
> 
> I'm running at 25MHz, I'm guessing that should translate to fast enough
> for the 10MBit half duplex link.
> But I'm not sure how the spi clock translates to bps here.

That seems plenty fast. Maybe you can get a bus pirate or similar
sniffing the bus. Maybe there are big gaps between the transfers for
some reason? Or the interrupt controller is very slow?

> I'm guessing you are right and that the others actually would be
> meningful to log.
> There is a nested question here as well, and that is wheter to keep or
> drop the code that drops the rx buffer on overflow interrupt.
> I think not dropping the full buffer could be one of the reasons for the
> perf change.

You need to look careful at what a buffer overflow means, as written
in the standard. Does it mean a chunk has been dropped from the frame
currently being transferred over the SPI bus? If so, you need to drop
the frame, because it is missing 64 bytes somewhere. That could happen
if the device has very minimal buffering and does cut through. So the
frame goes straight to the SPI bus while it is still being received
from the line. Or the device could have sufficient buffers to hold a
few full frames. It has run out of such buffers while receiving, and
so dropped the frame. You never see that frame over SPI because it has
already been discarded. If so, linux should not be dropping anything,
the device already has.

Given your 25Mhz bus speed, i think there at least two things wrong
here. Dropping frames is too noise, and potentially doing it wrong. I
also think there is something not optimal in your SPI master, because
25MHz should not have trouble with 10Mbps line speed.

	Andrew
Andrew Lunn April 28, 2024, 2:59 p.m. UTC | #41
> n  |  name     |  min  |  avg  |  max  |  rx dropped  |  samples
> 1  |  no mod   |  827K |  846K |  891K |      945     |     5
> 2  |  no log   |  711K |  726K |  744K |      562     |     5
> 3  |  less irq |  815K |  833K |  846K |      N/A     |     5
> 4  |  no irq   |  914K |  924K |  931K |      N/A     |     5
> 5  |  simple   |  857K |  868K |  879K |      615     |     5

That is odd.

Side question: What CONFIG_HZ= do you have? 100, 250, 1000?  Try
1000. I've seen problems where the driver wants to sleep for a short
time, but the CONFIG_HZ value limits how short a time it can actually
sleep. It ends up sleeping much longer than it wants.

	Andrew
Ramón Nordin Rodriguez April 28, 2024, 10 p.m. UTC | #42
On Sun, Apr 28, 2024 at 04:25:28PM +0200, Andrew Lunn wrote:
> > I agree with your assesment, the phy won't reset itself, but maybe we
> > could add some comment doc about not adding it for the lan8670,
> > so no one trips over that in the future.
> 
> In the PHY driver, you can provide your own .soft_reset handler. It
> could return -EOPNOTSUPP, or -EINVAL. Maybe check the data sheets for
> the standalone devices supported by the driver. Can you limit this to
> just the TC6 PHY?
> 

Gotcha, I think that should be pretty easy to handle then. The
microchip_t1s.c module handles two phy families 
* lan865x - baked in
* lan867x - standalone 

I need to do some thinking and get a bit more oriented. But pretty sure
there is a simple path for this.

R
Ramón Nordin Rodriguez April 28, 2024, 10:04 p.m. UTC | #43
On Sun, Apr 28, 2024 at 04:59:40PM +0200, Andrew Lunn wrote:
> > n  |  name     |  min  |  avg  |  max  |  rx dropped  |  samples
> > 1  |  no mod   |  827K |  846K |  891K |      945     |     5
> > 2  |  no log   |  711K |  726K |  744K |      562     |     5
> > 3  |  less irq |  815K |  833K |  846K |      N/A     |     5
> > 4  |  no irq   |  914K |  924K |  931K |      N/A     |     5
> > 5  |  simple   |  857K |  868K |  879K |      615     |     5
> 
> That is odd.
> 
> Side question: What CONFIG_HZ= do you have? 100, 250, 1000?  Try
> 1000. I've seen problems where the driver wants to sleep for a short
> time, but the CONFIG_HZ value limits how short a time it can actually
> sleep. It ends up sleeping much longer than it wants.
> 

Good catch, had it set to 250. I'll rebuild with CONFIG_HZ=1000 and
rerun the tests.
Parthiban Veerasooran April 29, 2024, 6:13 a.m. UTC | #44
Hi Andrew,

On 26/04/24 11:43 pm, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Fri, Apr 26, 2024 at 01:45:20PM +0000, Parthiban.Veerasooran@microchip.com wrote:
>> Hi Andrew,
>>
>> On 24/04/24 5:38 am, Andrew Lunn wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>>
>>>> +static int oa_tc6_allocate_rx_skb(struct oa_tc6 *tc6)
>>>> +{
>>>> +     tc6->rx_skb = netdev_alloc_skb(tc6->netdev, tc6->netdev->mtu + ETH_HLEN +
>>>> +                                    ETH_FCS_LEN + NET_IP_ALIGN);
>>>> +     if (!tc6->rx_skb) {
>>>> +             tc6->netdev->stats.rx_dropped++;
>>>> +             return -ENOMEM;
>>>> +     }
>>>> +     skb_reserve(tc6->rx_skb, NET_IP_ALIGN);
>>>
>>> I think you can use netdev_alloc_skb_ip_align() here.
>> Ah OK, then do you mean we can rewrite the function
>> oa_tc6_allocate_rx_skb() as below?
>>
>> static int oa_tc6_allocate_rx_skb(struct oa_tc6 *tc6)
>> {
>>        tc6->rx_skb = netdev_alloc_skb_ip_align(tc6->netdev, tc6->netdev->mtu +
>> ETH_HLEN + ETH_FCS_LEN);
>>        if (tc6->rx_skb)
>>                return 0;
>>
>>        tc6->netdev->stats.rx_dropped++;
>>        return -ENOMEM;
>> }
> 
> Looks about right. But i did say 'I think', meaning i'm not too sure
> about this.
> 
> I generally don't review code actually moving packets around. It is
> what developers focus on, test heavily, and so is generally O.K. It is
> the code around the edges which often needs improvements prompted by
> review, ethtool, PHY handling, statistics.
You are right. But I tested implementing your proposal with iperf3 and 
it works as expected. So will keep this implementation in the next version.

Best regards,
Parthiban V
> 
>          Andrew
Parthiban Veerasooran April 29, 2024, 6:13 a.m. UTC | #45
Hi Andrew,

On 26/04/24 11:44 pm, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> OK, I will implement this helper function as oa_tc6_enable_zarfe() in
>> the oa_tc6.c. Also do you want me to move this helper function
>> implementation to a new patch?
> 
> Yes please.
OK. Thanks.

Best regards,
Parthiban V
> 
>      Andrew
Parthiban Veerasooran April 29, 2024, 8:32 a.m. UTC | #46
Hi Ramon,

On 28/04/24 1:32 am, Ramón Nordin Rodriguez wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Could xfer.rx_buf for the data path point to the currently allocacted socket buff
>          struct spi_transfer xfer = { 0 };
>          struct spi_message msg;
> 
>          if (header_type == OA_TC6_DATA_HEADER) {
>                  xfer.tx_buf = tc6->spi_data_tx_buf;
>                  xfer.rx_buf = tc6->spi_data_rx_buf;
>          } else {
>                  xfer.tx_buf = tc6->spi_ctrl_tx_buf;
>                  xfer.rx_buf = tc6->spi_ctrl_rx_buf;
>          }
>          xfer.len = length;
> 
> To avoid an additional copy here?
I think, this can be done later as part of optimization/improvements. 
Let's keep it simple and optimize it later.

Best regards,
Parthiban V
> 
>> +static void oa_tc6_update_rx_skb(struct oa_tc6 *tc6, u8 *payload, u8 length)
>> +{
>> +     memcpy(skb_put(tc6->rx_skb, length), payload, length);
>> +}
>>
> 
> R
>
Parthiban Veerasooran April 29, 2024, 9:47 a.m. UTC | #47
Hi All,

On 28/04/24 1:27 am, Conor Dooley wrote:
> On Sat, Apr 27, 2024 at 09:19:34PM +0200, Ramón Nordin Rodriguez wrote:
>> Hi,
>>
>> For me the mac driver fails to probe with the following log
>> [    0.123325] SPI driver lan865x has no spi_device_id for microchip,lan8651
>>
>> With this change the driver probes
>>
>> diff --git a/drivers/net/ethernet/microchip/lan865x/lan865x.c b/drivers/net/ethernet/microchip/lan865x/lan865x.c
>> index 9abefa8b9d9f..72a663f14f50 100644
>> --- a/drivers/net/ethernet/microchip/lan865x/lan865x.c
>> +++ b/drivers/net/ethernet/microchip/lan865x/lan865x.c
>> @@ -364,7 +364,7 @@ static void lan865x_remove(struct spi_device *spi)
>>   }
>>
>>   static const struct of_device_id lan865x_dt_ids[] = {
>> -       { .compatible = "microchip,lan8651", "microchip,lan8650" },
> Huh, that's very strange. I don't see a single instance in the tree of a
> of_device_id struct like this with two compatibles like this (at least
> with a search of `rg "\.compatible.*\", \"" drivers/`.
> 
> Given the fallbacks in the binding, only "microchip,lan8650" actually
> needs to be here.
> 
>> +       { .compatible = "microchip,lan865x", "microchip,lan8650" },
>>          { /* Sentinel */ }
>>   };
>>   MODULE_DEVICE_TABLE(of, lan865x_dt_ids);
>>
>> Along with compatible = "microchip,lan865x" in the dts
> Just to be clear, the compatible w/ an x is unacceptable due to the
> wildcard and the binding should stay as-is. Whatever probing bugs
> the code has need to be resolved instead 🙂
> 
Looks like, the below changes needed to work correctly,

lan865x.c:
- compatible string to be changed like below as it is a fallback for all 
variants,
	.compatible = "microchip,lan8650"
- DRV_NAME to be changed like below,
	#define DRV_NAME                        "lan8650"

microchip,lan865x.example.dts for lan8650:
- compatible string to be changed like below,
	.compatible = "microchip,lan8650";
	OR
microchip,lan865x.example.dts for lan8651:
- compatible string to be changed like below,
	compatible = "microchip,lan8651", "microchip,lan8650";

I tested with the above changes and there was no issues observed. Any 
comments on this? Otherwise we can stick with these changes for the next 
version.

Best regards,
Parthiban V

> Thanks,
> Conor.
Ramón Nordin Rodriguez April 29, 2024, 10:45 a.m. UTC | #48
> > To avoid an additional copy here?
> I think, this can be done later as part of optimization/improvements. 
> Let's keep it simple and optimize it later.

Sound good to me

R
Parthiban Veerasooran April 29, 2024, 11:31 a.m. UTC | #49
Hi,

On 28/04/24 3:03 pm, Bagas Sanjaya wrote:
> On Thu, Apr 18, 2024 at 06:26:37PM +0530, Parthiban Veerasooran wrote:
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 04e5f7c20e30..79fa7abb4ec9 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -16400,6 +16400,12 @@ L:	linux-rdma@vger.kernel.org
>>   S:	Supported
>>   F:	drivers/infiniband/ulp/opa_vnic
>>   
>> +OPEN ALLIANCE 10BASE-T1S MACPHY SERIAL INTERFACE FRAMEWORK
>> +M:	Parthiban Veerasooran<parthiban.veerasooran@microchip.com>
>> +L:	netdev@vger.kernel.org
>> +S:	Maintained
>> +F:	Documentation/networking/oa-tc6-framework.rst
>> +
>>   OPEN FIRMWARE AND FLATTENED DEVICE TREE
>>   M:	Rob Herring<robh+dt@kernel.org>
>>   M:	Frank Rowand<frowand.list@gmail.com>
> I can't apply this series on top of current net-next due to MAINTAINERS diff
> above (on context line 16400 I have Omnivision entries). Care to reroll?
> (Hint: also specify --base= to git-send-email(1) so that I and other
> reviewers know where to apply this series.)
OK. Will add the base commit in the next version.

It is,

base-commit: b307e25d4e6d341ce5af8682d5913c70655d1d36

Best regards,
Parthiban V
> 
> Confused...
> 
> -- An old man doll... just what I always wanted! - Clara
Andrew Lunn April 29, 2024, 12:09 p.m. UTC | #50
> Looks like, the below changes needed to work correctly,
> 
> lan865x.c:
> - compatible string to be changed like below as it is a fallback for all 
> variants,
> 	.compatible = "microchip,lan8650"
> - DRV_NAME to be changed like below,
> 	#define DRV_NAME                        "lan8650"
> 
> microchip,lan865x.example.dts for lan8650:
> - compatible string to be changed like below,
> 	.compatible = "microchip,lan8650";
> 	OR
> microchip,lan865x.example.dts for lan8651:
> - compatible string to be changed like below,
> 	compatible = "microchip,lan8651", "microchip,lan8650";
> 
> I tested with the above changes and there was no issues observed. Any 
> comments on this? Otherwise we can stick with these changes for the next 
> version.

As Conor said, this is probably relying on the fallback
mechanism. Please look at other SPI devices, e.g. hwmon, and see how
they probe for multiple different devices.

	Andrew
Simon Horman April 29, 2024, 4:17 p.m. UTC | #51
On Thu, Apr 18, 2024 at 06:26:37PM +0530, Parthiban Veerasooran wrote:
> The IEEE 802.3cg project defines two 10 Mbit/s PHYs operating over a
> single pair of conductors. The 10BASE-T1L (Clause 146) is a long reach
> PHY supporting full duplex point-to-point operation over 1 km of single
> balanced pair of conductors. The 10BASE-T1S (Clause 147) is a short reach
> PHY supporting full / half duplex point-to-point operation over 15 m of
> single balanced pair of conductors, or half duplex multidrop bus
> operation over 25 m of single balanced pair of conductors.
> 
> Furthermore, the IEEE 802.3cg project defines the new Physical Layer
> Collision Avoidance (PLCA) Reconciliation Sublayer (Clause 148) meant to
> provide improved determinism to the CSMA/CD media access method. PLCA
> works in conjunction with the 10BASE-T1S PHY operating in multidrop mode.
> 
> The aforementioned PHYs are intended to cover the low-speed / low-cost
> applications in industrial and automotive environment. The large number
> of pins (16) required by the MII interface, which is specified by the
> IEEE 802.3 in Clause 22, is one of the major cost factors that need to be
> addressed to fulfil this objective.
> 
> The MAC-PHY solution integrates an IEEE Clause 4 MAC and a 10BASE-T1x PHY
> exposing a low pin count Serial Peripheral Interface (SPI) to the host
> microcontroller. This also enables the addition of Ethernet functionality
> to existing low-end microcontrollers which do not integrate a MAC
> controller.
> 
> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>

Hi Parthiban,

Some minor feedback from my side.

> ---
>  Documentation/networking/oa-tc6-framework.rst | 491 ++++++++++++++++++

Please add oa-tc6-framework to 

Flagged by: make SPHINXDIRS="networking" htmldocs

>  MAINTAINERS                                   |   6 +
>  2 files changed, 497 insertions(+)
>  create mode 100644 Documentation/networking/oa-tc6-framework.rst
> 
> diff --git a/Documentation/networking/oa-tc6-framework.rst b/Documentation/networking/oa-tc6-framework.rst

...

> +Device drivers API
> +==================
> +
> +The include/linux/oa_tc6.h defines the following functions:
> +
> +.. c:function:: struct oa_tc6 *oa_tc6_init(struct spi_device *spi,
> +                                           struct net_device *netdev)

I think that you need to use a '\' to escape newlines
for a for a multi-line c:function.

e.g.
.. c:function:: struct oa_tc6 *oa_tc6_init(struct spi_device *spi, \
                                           struct net_device *netdev)

Link: https://www.sphinx-doc.org/en/master/usage/domains/index.html

Flagged by make SPHINXDIRS="networking" htmldocs
when using Sphinx 7.2.6

> +
> +Initialize OA TC6 lib.
> +
> +.. c:function:: void oa_tc6_exit(struct oa_tc6 *tc6)
> +
> +Free allocated OA TC6 lib.
> +
> +.. c:function:: int oa_tc6_write_register(struct oa_tc6 *tc6, u32 address,
> +                                          u32 value)
> +
> +Write a single register in the MAC-PHY.
> +
> +.. c:function:: int oa_tc6_write_registers(struct oa_tc6 *tc6, u32 address,
> +                                           u32 value[], u8 length)
> +
> +Writing multiple consecutive registers starting from @address in the MAC-PHY.
> +Maximum of 128 consecutive registers can be written starting at @address.
> +
> +.. c:function:: int oa_tc6_read_register(struct oa_tc6 *tc6, u32 address,
> +                                         u32 *value)
> +
> +Read a single register in the MAC-PHY.
> +
> +.. c:function:: int oa_tc6_read_registers(struct oa_tc6 *tc6, u32 address,
> +                                          u32 value[], u8 length)
> +
> +Reading multiple consecutive registers starting from @address in the MAC-PHY.
> +Maximum of 128 consecutive registers can be read starting at @address.
> +
> +.. c:function:: netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6,
> +                                              struct sk_buff *skb);
> +
> +The transmit Ethernet frame in the skb is or going to be transmitted through
> +the MAC-PHY.

...
Simon Horman April 29, 2024, 4:36 p.m. UTC | #52
On Thu, Apr 18, 2024 at 06:26:38PM +0530, Parthiban Veerasooran wrote:
> Implement register write operation according to the control communication
> specified in the OPEN Alliance 10BASE-T1x MACPHY Serial Interface
> document. Control write commands are used by the SPI host to write
> registers within the MAC-PHY. Each control write commands are composed of
> a 32 bits control command header followed by register write data.
> 
> The MAC-PHY ignores the final 32 bits of data from the SPI host at the
> end of the control write command. The write command and data is also
> echoed from the MAC-PHY back to the SPI host to enable the SPI host to
> identify which register write failed in the case of any bus errors.
> Control write commands can write either a single register or multiple
> consecutive registers. When multiple consecutive registers are written,
> the address is automatically post-incremented by the MAC-PHY. Writing to
> any unimplemented or undefined registers shall be ignored and yield no
> effect.
> 
> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>

...

> diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c

...

> +/**
> + * oa_tc6_write_registers - function for writing multiple consecutive registers.
> + * @tc6: oa_tc6 struct.
> + * @address: address of the first register to be written in the MAC-PHY.
> + * @value: values to be written from the starting register address @address.
> + * @length: number of consecutive registers to be written from @address.
> + *
> + * Maximum of 128 consecutive registers can be written starting at @address.
> + *
> + * Returns 0 on success otherwise failed.

Nit: I think you need a ':' after "Returns" (or "Return")
     in order for kernel-doc -Wall to recognise this as a return section.

Likewise elsewhere in this patch(set).

> + */
> +int oa_tc6_write_registers(struct oa_tc6 *tc6, u32 address, u32 value[],
> +			   u8 length)
> +{

...
Parthiban Veerasooran April 30, 2024, 8:14 a.m. UTC | #53
Hi Simon,

On 29/04/24 10:06 pm, Simon Horman wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Thu, Apr 18, 2024 at 06:26:38PM +0530, Parthiban Veerasooran wrote:
>> Implement register write operation according to the control communication
>> specified in the OPEN Alliance 10BASE-T1x MACPHY Serial Interface
>> document. Control write commands are used by the SPI host to write
>> registers within the MAC-PHY. Each control write commands are composed of
>> a 32 bits control command header followed by register write data.
>>
>> The MAC-PHY ignores the final 32 bits of data from the SPI host at the
>> end of the control write command. The write command and data is also
>> echoed from the MAC-PHY back to the SPI host to enable the SPI host to
>> identify which register write failed in the case of any bus errors.
>> Control write commands can write either a single register or multiple
>> consecutive registers. When multiple consecutive registers are written,
>> the address is automatically post-incremented by the MAC-PHY. Writing to
>> any unimplemented or undefined registers shall be ignored and yield no
>> effect.
>>
>> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
> 
> ...
> 
>> diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
> 
> ...
> 
>> +/**
>> + * oa_tc6_write_registers - function for writing multiple consecutive registers.
>> + * @tc6: oa_tc6 struct.
>> + * @address: address of the first register to be written in the MAC-PHY.
>> + * @value: values to be written from the starting register address @address.
>> + * @length: number of consecutive registers to be written from @address.
>> + *
>> + * Maximum of 128 consecutive registers can be written starting at @address.
>> + *
>> + * Returns 0 on success otherwise failed.
> 
> Nit: I think you need a ':' after "Returns" (or "Return")
>       in order for kernel-doc -Wall to recognise this as a return section.
> 
> Likewise elsewhere in this patch(set).
OK. Thanks for the hint. Yes I noticed some warnings on this "Returns" 
area while running the below command,

./scripts/kernel-doc -Wall -v -none drivers/net/ethernet/oa_tc6.c

Will fix it in the next version.

Best regards,
Parthiban V
> 
>> + */
>> +int oa_tc6_write_registers(struct oa_tc6 *tc6, u32 address, u32 value[],
>> +                        u8 length)
>> +{
> 
> ...
Parthiban Veerasooran April 30, 2024, 11:30 a.m. UTC | #54
Hi Simon,

On 29/04/24 9:47 pm, Simon Horman wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Thu, Apr 18, 2024 at 06:26:37PM +0530, Parthiban Veerasooran wrote:
>> The IEEE 802.3cg project defines two 10 Mbit/s PHYs operating over a
>> single pair of conductors. The 10BASE-T1L (Clause 146) is a long reach
>> PHY supporting full duplex point-to-point operation over 1 km of single
>> balanced pair of conductors. The 10BASE-T1S (Clause 147) is a short reach
>> PHY supporting full / half duplex point-to-point operation over 15 m of
>> single balanced pair of conductors, or half duplex multidrop bus
>> operation over 25 m of single balanced pair of conductors.
>>
>> Furthermore, the IEEE 802.3cg project defines the new Physical Layer
>> Collision Avoidance (PLCA) Reconciliation Sublayer (Clause 148) meant to
>> provide improved determinism to the CSMA/CD media access method. PLCA
>> works in conjunction with the 10BASE-T1S PHY operating in multidrop mode.
>>
>> The aforementioned PHYs are intended to cover the low-speed / low-cost
>> applications in industrial and automotive environment. The large number
>> of pins (16) required by the MII interface, which is specified by the
>> IEEE 802.3 in Clause 22, is one of the major cost factors that need to be
>> addressed to fulfil this objective.
>>
>> The MAC-PHY solution integrates an IEEE Clause 4 MAC and a 10BASE-T1x PHY
>> exposing a low pin count Serial Peripheral Interface (SPI) to the host
>> microcontroller. This also enables the addition of Ethernet functionality
>> to existing low-end microcontrollers which do not integrate a MAC
>> controller.
>>
>> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
> 
> Hi Parthiban,
> 
> Some minor feedback from my side.
> 
>> ---
>>   Documentation/networking/oa-tc6-framework.rst | 491 ++++++++++++++++++
> 
> Please add oa-tc6-framework to
> 
> Flagged by: make SPHINXDIRS="networking" htmldocs
OK. Thanks for pointing this. Will correct it in the next version.
> 
>>   MAINTAINERS                                   |   6 +
>>   2 files changed, 497 insertions(+)
>>   create mode 100644 Documentation/networking/oa-tc6-framework.rst
>>
>> diff --git a/Documentation/networking/oa-tc6-framework.rst b/Documentation/networking/oa-tc6-framework.rst
> 
> ...
> 
>> +Device drivers API
>> +==================
>> +
>> +The include/linux/oa_tc6.h defines the following functions:
>> +
>> +.. c:function:: struct oa_tc6 *oa_tc6_init(struct spi_device *spi,
>> +                                           struct net_device *netdev)
> 
> I think that you need to use a '\' to escape newlines
> for a for a multi-line c:function.
> 
> e.g.
> .. c:function:: struct oa_tc6 *oa_tc6_init(struct spi_device *spi, \
>                                             struct net_device *netdev)
> 
> Link: https://www.sphinx-doc.org/en/master/usage/domains/index.html
> 
> Flagged by make SPHINXDIRS="networking" htmldocs
> when using Sphinx 7.2.6
Yes noted. Will correct it in the next version.

Best regards,
Parthiban V
> 
>> +
>> +Initialize OA TC6 lib.
>> +
>> +.. c:function:: void oa_tc6_exit(struct oa_tc6 *tc6)
>> +
>> +Free allocated OA TC6 lib.
>> +
>> +.. c:function:: int oa_tc6_write_register(struct oa_tc6 *tc6, u32 address,
>> +                                          u32 value)
>> +
>> +Write a single register in the MAC-PHY.
>> +
>> +.. c:function:: int oa_tc6_write_registers(struct oa_tc6 *tc6, u32 address,
>> +                                           u32 value[], u8 length)
>> +
>> +Writing multiple consecutive registers starting from @address in the MAC-PHY.
>> +Maximum of 128 consecutive registers can be written starting at @address.
>> +
>> +.. c:function:: int oa_tc6_read_register(struct oa_tc6 *tc6, u32 address,
>> +                                         u32 *value)
>> +
>> +Read a single register in the MAC-PHY.
>> +
>> +.. c:function:: int oa_tc6_read_registers(struct oa_tc6 *tc6, u32 address,
>> +                                          u32 value[], u8 length)
>> +
>> +Reading multiple consecutive registers starting from @address in the MAC-PHY.
>> +Maximum of 128 consecutive registers can be read starting at @address.
>> +
>> +.. c:function:: netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6,
>> +                                              struct sk_buff *skb);
>> +
>> +The transmit Ethernet frame in the skb is or going to be transmitted through
>> +the MAC-PHY.
> 
> ...
Parthiban Veerasooran April 30, 2024, 1:30 p.m. UTC | #55
Hi Andrew,

On 29/04/24 5:39 pm, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> Looks like, the below changes needed to work correctly,
>>
>> lan865x.c:
>> - compatible string to be changed like below as it is a fallback for all
>> variants,
>>        .compatible = "microchip,lan8650"
>> - DRV_NAME to be changed like below,
>>        #define DRV_NAME                        "lan8650"
>>
>> microchip,lan865x.example.dts for lan8650:
>> - compatible string to be changed like below,
>>        .compatible = "microchip,lan8650";
>>        OR
>> microchip,lan865x.example.dts for lan8651:
>> - compatible string to be changed like below,
>>        compatible = "microchip,lan8651", "microchip,lan8650";
>>
>> I tested with the above changes and there was no issues observed. Any
>> comments on this? Otherwise we can stick with these changes for the next
>> version.
> 
> As Conor said, this is probably relying on the fallback
> mechanism. Please look at other SPI devices, e.g. hwmon, and see how
> they probe for multiple different devices.
I just referred the below drivers for the spi devices handling along 
with the compatible,

https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/davicom/dm9051.c#L1239

https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/adi/adin1110.c#L1644

lan8650 - MAC-PHY chip
lan8651 - ETH Click-Mikroe with MAC-PHY chip

So, they are different in interface but not in functionality. There is 
no difference in the configuration. So let's consider lan8650 is the 
fallback option for lan8651.

By referring the above links, I have restructured the code like below to 
support with lan8651 fallback. Still there is no change in the existing 
device tree binding. This is the only change in lan865x.c.

static const struct spi_device_id spidev_spi_ids[] = {
         { .name = "lan8650" },
         {},
};

static const struct of_device_id lan865x_dt_ids[] = {
         { .compatible = "microchip,lan8650" },
         { /* Sentinel */ }
};
MODULE_DEVICE_TABLE(of, lan865x_dt_ids);

static struct spi_driver lan865x_driver = {
         .driver = {
                 .name = DRV_NAME,
                 .of_match_table = lan865x_dt_ids,
          },
         .probe = lan865x_probe,
         .remove = lan865x_remove,
         .id_table = spidev_spi_ids,
};

I also referred the below two links for the device tree binding and 
driver in case of fallback.

https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/net/smsc,lan9115.yaml#L18

https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/net/smsc,lan9115.yaml#L102

https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/smsc/smsc911x.c#L2652

Best regards,
Parthiban V
> 
>          Andrew
Conor Dooley April 30, 2024, 4:55 p.m. UTC | #56
On Tue, Apr 30, 2024 at 01:30:22PM +0000, Parthiban.Veerasooran@microchip.com wrote:
> Hi Andrew,
> 
> On 29/04/24 5:39 pm, Andrew Lunn wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > 
> >> Looks like, the below changes needed to work correctly,
> >>
> >> lan865x.c:
> >> - compatible string to be changed like below as it is a fallback for all
> >> variants,
> >>        .compatible = "microchip,lan8650"
> >> - DRV_NAME to be changed like below,
> >>        #define DRV_NAME                        "lan8650"
> >>
> >> microchip,lan865x.example.dts for lan8650:
> >> - compatible string to be changed like below,
> >>        .compatible = "microchip,lan8650";
> >>        OR
> >> microchip,lan865x.example.dts for lan8651:
> >> - compatible string to be changed like below,
> >>        compatible = "microchip,lan8651", "microchip,lan8650";
> >>
> >> I tested with the above changes and there was no issues observed. Any
> >> comments on this? Otherwise we can stick with these changes for the next
> >> version.
> > 
> > As Conor said, this is probably relying on the fallback
> > mechanism. Please look at other SPI devices, e.g. hwmon, and see how
> > they probe for multiple different devices.
> I just referred the below drivers for the spi devices handling along 
> with the compatible,
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/davicom/dm9051.c#L1239
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/adi/adin1110.c#L1644
> 
> lan8650 - MAC-PHY chip
> lan8651 - ETH Click-Mikroe with MAC-PHY chip
> 
> So, they are different in interface but not in functionality. There is 
> no difference in the configuration. So let's consider lan8650 is the 
> fallback option for lan8651.
> 
> By referring the above links, I have restructured the code like below to 
> support with lan8651 fallback. Still there is no change in the existing 
> device tree binding. This is the only change in lan865x.c.
> 
> static const struct spi_device_id spidev_spi_ids[] = {
>          { .name = "lan8650" },
>          {},
> };
> 
> static const struct of_device_id lan865x_dt_ids[] = {
>          { .compatible = "microchip,lan8650" },
>          { /* Sentinel */ }
> };
> MODULE_DEVICE_TABLE(of, lan865x_dt_ids);
> 
> static struct spi_driver lan865x_driver = {
>          .driver = {
>                  .name = DRV_NAME,
>                  .of_match_table = lan865x_dt_ids,
>           },
>          .probe = lan865x_probe,
>          .remove = lan865x_remove,
>          .id_table = spidev_spi_ids,
> };
> 
> I also referred the below two links for the device tree binding and 
> driver in case of fallback.

Did you also verify that the warning from the spi core is no longer
generated when using compatible = "microchip,lan8651", "microchip,lan8650"?
Ramón Nordin Rodriguez May 1, 2024, 6:29 p.m. UTC | #57
> > n  |  name     |  min  |  avg  |  max  |  rx dropped  |  samples
> > 1  |  no mod   |  827K |  846K |  891K |      945     |     5
> > 2  |  no log   |  711K |  726K |  744K |      562     |     5
> > 3  |  less irq |  815K |  833K |  846K |      N/A     |     5
> > 4  |  no irq   |  914K |  924K |  931K |      N/A     |     5
> > 5  |  simple   |  857K |  868K |  879K |      615     |     5
> 
> That is odd.
> 
> Side question: What CONFIG_HZ= do you have? 100, 250, 1000?  Try
> 1000. I've seen problems where the driver wants to sleep for a short
> time, but the CONFIG_HZ value limits how short a time it can actually
> sleep. It ends up sleeping much longer than it wants.
> 

I have been doing my best to abuse the link some more. In brief tweaking
CONFIG_HZ has some but limited effect. 
Saturating the link with the rx buffer interrupt enabled breaks the driver.
Saturating the link with the rx buffer interrupt disabled has poor
performance.

The following scenario has been tested. Both ends of the link run:
* server.py
* client.py

One end is an arm64 quad core running at 1.2GHz with the lan8650 macphy.
The other end is an amd 3950x running the lan8670 usb eval board.
Both systems should be fast enough that running python should not be a
limiting factor.

-- The test code --
server.py
#!/bin/env python3
import socket

def serve(sock: socket.socket):
    while True:
        client, addr = sock.accept()
        print(f'connection from: {addr}')
        while len(client.recv(2048)) > 0:
            pass
        print('client disconnected')
        client.close()

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('0.0.0.0', 4040))
    sock.listen(1)
    serve(sock)
    print("something went wrong")

client.py
#!/bin/env python3
import socket
import sys

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((sys.argv[1], 4040))

    while True:
        sock.sendall(b'0'*2048)

-- test runs --
run 1 - all interrupts enabled
Time to failure:
1 min or less

Kernel output:
[   94.361312] sched: RT throttling activated

top output:
 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 145 root     -51   0       0      0      0 R  95.5   0.0   1:11.22 oa-tc6-spi-thread

link stats:
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 32:c2:7e:22:93:99 brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
       3371902    7186      0      48       0       0
    RX errors:  length    crc   frame    fifo overrun
                     0      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
      10341438    8071      0       0       0       0
    TX errors: aborted   fifo  window heartbt transns
                     0      0       0       0       1
state:
Completly borked, can't ping in or out, bringing the interface down then up
has no effect.
There is no SPI clock and no interrupts generated by the mac-phy.
The worker thread seems to have live locked.

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
run 2 - RX_BUFFER_OVERLOW interrupt disabled

state:
Runs just fine but the oa-tc6-spi-thread is consuming 10-20% cpu
Ping times have increased from 1-2ms to 8-35ms


-- additional notes --
When tweaking CONFIG_HZ I do get some changes in behaviour, the cpu
consumption stays stable at 20%+-2 with CONFIG_HZ=250, when increased to
CONFIG_HZ=1000 it jumps up and down between 10-20%.

I don't have access to a logic analyzer but my old oscilloscope is
almost reliable. I could confirm that the spi clock is indeed running at
the expected 25MHz, but I could observe some gaps of up to 320µs so
that's 8k spi cycles spent doing something else.
These gaps were observed on the SPI clock and the macphy interrupt was
active for the same ammount of time(though this was measured independently
and not on the same trigger).
I've been drinking way to much coffe, so soldering is not gonna happen
today (shaky hands), but if it helps I can solder wires to attach both
probes to confirm that the gap in the SPI clock happens at the same time
or not as the interrupt is active.

I'd be keen on hearing what Microchips plans to address. If tracking
down performance issues is a priority I'll probably not spend any time
on it, if not then I'll definetly dig into it more.

Let me know if anything is unclear or if I can help out with anything
specific.

R
Andrew Lunn May 1, 2024, 7:58 p.m. UTC | #58
> Kernel output:
> [   94.361312] sched: RT throttling activated
> 
> top output:
>  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  145 root     -51   0       0      0      0 R  95.5   0.0   1:11.22 oa-tc6-spi-thread
> 
> link stats:
> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
>     link/ether 32:c2:7e:22:93:99 brd ff:ff:ff:ff:ff:ff
>     RX:  bytes packets errors dropped  missed   mcast
>        3371902    7186      0      48       0       0
>     RX errors:  length    crc   frame    fifo overrun
>                      0      0       0       0       0
>     TX:  bytes packets errors dropped carrier collsns
>       10341438    8071      0       0       0       0
>     TX errors: aborted   fifo  window heartbt transns
>                      0      0       0       0       1
> state:
> Completly borked, can't ping in or out, bringing the interface down then up
> has no effect.

Completely borked is not good. 

> I'd be keen on hearing what Microchips plans to address. If tracking
> down performance issues is a priority I'll probably not spend any time
> on it, if not then I'll definetly dig into it more.

Borked ! = performance issue. This should be addressed. Lets see if
the python scripts are enough to be able to reproduce it.

We should not rule out differences in SPI drivers. So it might be you
need to debug this, if Microchip cannot reproduce it.

I would also suggest you enable the "kernel hacking" debug features
for finding deadlocks, bad spinlocks, stuck RT tasks etc. That might
give some clues.

	Andrew
Andrew Lunn May 1, 2024, 8:07 p.m. UTC | #59
> I don't have access to a logic analyzer but my old oscilloscope is
> almost reliable. I could confirm that the spi clock is indeed running at
> the expected 25MHz, but I could observe some gaps of up to 320µs so
> that's 8k spi cycles spent doing something else.
> These gaps were observed on the SPI clock and the macphy interrupt was
> active for the same ammount of time(though this was measured independently
> and not on the same trigger).
> I've been drinking way to much coffe, so soldering is not gonna happen
> today (shaky hands), but if it helps I can solder wires to attach both
> probes to confirm that the gap in the SPI clock happens at the same time
> or not as the interrupt is active.

What i expect you will see is that the interrupt line goes active. A
while later there is an SPI bus transfer to fetch the interrupt
status, and when that transfer completes the interrupt should go
inactive. But you will need your second channel to confirm this.

One question would be, is 320µs reasonable for an interrupt, and one
SPI transfer?

I would then expect a number of back to back transfers, each around 64
bytes in size, as it gets the received frame. The gaps between those
transfers will be interesting.

	Andrew
Parthiban Veerasooran May 2, 2024, 5:56 a.m. UTC | #60
Hi Conor,

On 30/04/24 10:25 pm, Conor Dooley wrote:
> On Tue, Apr 30, 2024 at 01:30:22PM +0000,Parthiban.Veerasooran@microchip.com  wrote:
>> Hi Andrew,
>>
>> On 29/04/24 5:39 pm, Andrew Lunn wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>>
>>>> Looks like, the below changes needed to work correctly,
>>>>
>>>> lan865x.c:
>>>> - compatible string to be changed like below as it is a fallback for all
>>>> variants,
>>>>         .compatible = "microchip,lan8650"
>>>> - DRV_NAME to be changed like below,
>>>>         #define DRV_NAME                        "lan8650"
>>>>
>>>> microchip,lan865x.example.dts for lan8650:
>>>> - compatible string to be changed like below,
>>>>         .compatible = "microchip,lan8650";
>>>>         OR
>>>> microchip,lan865x.example.dts for lan8651:
>>>> - compatible string to be changed like below,
>>>>         compatible = "microchip,lan8651", "microchip,lan8650";
>>>>
>>>> I tested with the above changes and there was no issues observed. Any
>>>> comments on this? Otherwise we can stick with these changes for the next
>>>> version.
>>> As Conor said, this is probably relying on the fallback
>>> mechanism. Please look at other SPI devices, e.g. hwmon, and see how
>>> they probe for multiple different devices.
>> I just referred the below drivers for the spi devices handling along
>> with the compatible,
>>
>> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/davicom/dm9051.c#L1239
>>
>> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/adi/adin1110.c#L1644
>>
>> lan8650 - MAC-PHY chip
>> lan8651 - ETH Click-Mikroe with MAC-PHY chip
>>
>> So, they are different in interface but not in functionality. There is
>> no difference in the configuration. So let's consider lan8650 is the
>> fallback option for lan8651.
>>
>> By referring the above links, I have restructured the code like below to
>> support with lan8651 fallback. Still there is no change in the existing
>> device tree binding. This is the only change in lan865x.c.
>>
>> static const struct spi_device_id spidev_spi_ids[] = {
>>           { .name = "lan8650" },
>>           {},
>> };
>>
>> static const struct of_device_id lan865x_dt_ids[] = {
>>           { .compatible = "microchip,lan8650" },
>>           { /* Sentinel */ }
>> };
>> MODULE_DEVICE_TABLE(of, lan865x_dt_ids);
>>
>> static struct spi_driver lan865x_driver = {
>>           .driver = {
>>                   .name = DRV_NAME,
>>                   .of_match_table = lan865x_dt_ids,
>>            },
>>           .probe = lan865x_probe,
>>           .remove = lan865x_remove,
>>           .id_table = spidev_spi_ids,
>> };
>>
>> I also referred the below two links for the device tree binding and
>> driver in case of fallback.
> Did you also verify that the warning from the spi core is no longer
> generated when using compatible = "microchip,lan8651", "microchip,lan8650"?
Do you mean changing in the lan865x.c file? if yes then I got the 
warning "SPI driver lan865x has no spi_device_id for microchip,lan8651"

But after updating the lan865x.c like below, the warning disappeared.

static const struct spi_device_id spidev_spi_ids[] = {
         { .name = "lan8650" },
         { .name = "lan8651" },
         {},
};

Note: In both the above two cases compatible in the dts is
compatible = "microchip,lan8651", "microchip,lan8650";

Best regards,
Parthiban V
Parthiban Veerasooran May 2, 2024, 10:10 a.m. UTC | #61
Hi Ramon,

On 01/05/24 11:59 pm, Ramón Nordin Rodriguez wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>>> n  |  name     |  min  |  avg  |  max  |  rx dropped  |  samples
>>> 1  |  no mod   |  827K |  846K |  891K |      945     |     5
>>> 2  |  no log   |  711K |  726K |  744K |      562     |     5
>>> 3  |  less irq |  815K |  833K |  846K |      N/A     |     5
>>> 4  |  no irq   |  914K |  924K |  931K |      N/A     |     5
>>> 5  |  simple   |  857K |  868K |  879K |      615     |     5
>>
>> That is odd.
>>
>> Side question: What CONFIG_HZ= do you have? 100, 250, 1000?  Try
>> 1000. I've seen problems where the driver wants to sleep for a short
>> time, but the CONFIG_HZ value limits how short a time it can actually
>> sleep. It ends up sleeping much longer than it wants.
>>
> 
> I have been doing my best to abuse the link some more. In brief tweaking
> CONFIG_HZ has some but limited effect.
> Saturating the link with the rx buffer interrupt enabled breaks the driver.
> Saturating the link with the rx buffer interrupt disabled has poor
> performance.
> 
> The following scenario has been tested. Both ends of the link run:
> * server.py
> * client.py
> 
> One end is an arm64 quad core running at 1.2GHz with the lan8650 macphy.
> The other end is an amd 3950x running the lan8670 usb eval board.
> Both systems should be fast enough that running python should not be a
> limiting factor.
> 
> -- The test code --
> server.py
> #!/bin/env python3
> import socket
> 
> def serve(sock: socket.socket):
>      while True:
>          client, addr = sock.accept()
>          print(f'connection from: {addr}')
>          while len(client.recv(2048)) > 0:
>              pass
>          print('client disconnected')
>          client.close()
> 
> if __name__ == '__main__':
>      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>      sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
>      sock.bind(('0.0.0.0', 4040))
>      sock.listen(1)
>      serve(sock)
>      print("something went wrong")
> 
> client.py
> #!/bin/env python3
> import socket
> import sys
> 
> if __name__ == '__main__':
>      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>      sock.connect((sys.argv[1], 4040))
> 
>      while True:
>          sock.sendall(b'0'*2048)
> 
> -- test runs --
> run 1 - all interrupts enabled
> Time to failure:
> 1 min or less
> 
> Kernel output:
> [   94.361312] sched: RT throttling activated
> 
> top output:
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   145 root     -51   0       0      0      0 R  95.5   0.0   1:11.22 oa-tc6-spi-thread
> 
> link stats:
> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
>      link/ether 32:c2:7e:22:93:99 brd ff:ff:ff:ff:ff:ff
>      RX:  bytes packets errors dropped  missed   mcast
>         3371902    7186      0      48       0       0
>      RX errors:  length    crc   frame    fifo overrun
>                       0      0       0       0       0
>      TX:  bytes packets errors dropped carrier collsns
>        10341438    8071      0       0       0       0
>      TX errors: aborted   fifo  window heartbt transns
>                       0      0       0       0       1
> state:
> Completly borked, can't ping in or out, bringing the interface down then up
> has no effect.
> There is no SPI clock and no interrupts generated by the mac-phy.
> The worker thread seems to have live locked.
> 
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> run 2 - RX_BUFFER_OVERLOW interrupt disabled
> 
> state:
> Runs just fine but the oa-tc6-spi-thread is consuming 10-20% cpu
> Ping times have increased from 1-2ms to 8-35ms
> 
> 
> -- additional notes --
> When tweaking CONFIG_HZ I do get some changes in behaviour, the cpu
> consumption stays stable at 20%+-2 with CONFIG_HZ=250, when increased to
> CONFIG_HZ=1000 it jumps up and down between 10-20%.
> 
> I don't have access to a logic analyzer but my old oscilloscope is
> almost reliable. I could confirm that the spi clock is indeed running at
> the expected 25MHz, but I could observe some gaps of up to 320µs so
> that's 8k spi cycles spent doing something else.
> These gaps were observed on the SPI clock and the macphy interrupt was
> active for the same ammount of time(though this was measured independently
> and not on the same trigger).
> I've been drinking way to much coffe, so soldering is not gonna happen
> today (shaky hands), but if it helps I can solder wires to attach both
> probes to confirm that the gap in the SPI clock happens at the same time
> or not as the interrupt is active.
> 
> I'd be keen on hearing what Microchips plans to address. If tracking
> down performance issues is a priority I'll probably not spend any time
> on it, if not then I'll definetly dig into it more.
I tried to reproduce the issue in my setup with your above applications.
But surprisingly I couldn't reproduce the issue you are facing.

One end is Raspberry Pi 4 with lan8651 MAC-PHY and the other end is 
Raspberry Pi 4 with EVB-LAN8670-USB Stick.

lan8651 MAC-PHY side:
---------------------

pi@raspberrypi:~/lan865x/v4_mainline $ python3 server.py
connection from: ('192.168.5.101', 46608)

Top output:
-----------
top - 14:28:14 up  1:14,  4 users,  load average: 0.68, 0.67, 0.63
Tasks: 201 total,   1 running, 200 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  1.1 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.1 si, 
0.0 st
MiB Mem :   7810.0 total,   7110.5 free,    392.3 used,    432.4 
buff/cache
MiB Swap:    100.0 total,    100.0 free,      0.0 used.   7417.7 avail Mem

     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ 
COMMAND 

    7219 root     -51   0       0      0      0 S   2.7   0.0   1:13.11 
oa-tc6-spi-thread 

   35307 pi        20   0   16280   9216   5248 S   0.7   0.1   0:16.53 
python3 

   88831 pi        20   0   11728   4864   2816 R   0.3   0.1   0:00.22 
top 

   89819 vnc       20   0    2320    384    384 S   0.3   0.0   0:00.04 
sh 

       1 root      20   0  168652  11580   8436 S   0.0   0.1   0:03.85 
systemd 

       2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 
kthreadd 

       3 root      20   0       0      0      0 S   0.0   0.0   0:00.00 
pool_workqueue_release 

       4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 
kworker/R-rcu_g 

       5 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 
kworker/R-rcu_p 

       6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 
kworker/R-slub_ 

       7 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 
kworker/R-netns 

      11 root      20   0       0      0      0 I   0.0   0.0   0:00.03 
kworker/u8:0-ext4-rsv-conversion 

      12 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 
kworker/R-mm_pe

ifconfig output:
----------------
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 192.168.5.100  netmask 255.255.255.0  broadcast 0.0.0.0
         ether 04:05:06:01:02:03  txqueuelen 1000  (Ethernet)
         RX packets 1589879  bytes 2391045582 (2.2 GiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 1022419  bytes 71954905 (68.6 MiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 55

dmesg output:
-------------
pi@raspberrypi:~/lan865x/v4_mainline $ dmesg
[  234.019968] LAN865X Rev.B0 Internal Phy spi0.0:00: attached PHY 
driver (mii_bus:phy_addr=spi0.0:00, irq=POLL)
[  234.069387] lan865x spi0.0 eth1: Link is Up - 10Mbps/Half - flow 
control off

EVB-LAN8670-USB stick side:
---------------------------
pi@pv-rpi-tp2:~/microchip/t1s-usb $ python3 client.py 192.168.5.100

ifconfig output:
----------------
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 192.168.5.101  netmask 255.255.255.0  broadcast 192.168.5.255
         ether 00:1e:c0:d1:ca:39  txqueuelen 1000  (Ethernet)
         RX packets 1034606  bytes 58330335 (55.6 MiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 1722400  bytes 2628199197 (2.4 GiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Let me know if you want to try anything more.

Best regards,
Parthiban V
> 
> Let me know if anything is unclear or if I can help out with anything
> specific.
> 
> R
Ramón Nordin Rodriguez May 2, 2024, 10:19 a.m. UTC | #62
> I tried to reproduce the issue in my setup with your above applications.
> But surprisingly I couldn't reproduce the issue you are facing.
> 
> One end is Raspberry Pi 4 with lan8651 MAC-PHY and the other end is 
> Raspberry Pi 4 with EVB-LAN8670-USB Stick.
> 
> lan8651 MAC-PHY side:
> ---------------------
> 

Did you run both client and server on both ends? The previous tests I
have done have only sent traffic in one direction, which did not lead to
a live lock.
But both sending and receiving as fast as possible in both directions
trigger the problems on my end.
R
Parthiban Veerasooran May 3, 2024, 7:10 a.m. UTC | #63
On 02/05/24 3:49 pm, Ramón Nordin Rodriguez wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> I tried to reproduce the issue in my setup with your above applications.
>> But surprisingly I couldn't reproduce the issue you are facing.
>>
>> One end is Raspberry Pi 4 with lan8651 MAC-PHY and the other end is
>> Raspberry Pi 4 with EVB-LAN8670-USB Stick.
>>
>> lan8651 MAC-PHY side:
>> ---------------------
>>
> 
> Did you run both client and server on both ends? The previous tests I
> have done have only sent traffic in one direction, which did not lead to
> a live lock.
> But both sending and receiving as fast as possible in both directions
> trigger the problems on my end.
OK, today I executed both the client and server on both the sides. It is 
running more than an hour, it didn't crash. But there are some "Receive 
buffer overflow error" time to time in the dmesg. I don't know whether 
this is an expected behavior with this type of stress test.

top output:
-----------

pi@raspberrypi:~/lan865x/v4_mainline $ top

top - 19:47:07 up 54 min,  5 users,  load average: 1.23, 1.20, 1.14
Tasks: 205 total,   1 running, 204 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  7.5 sy,  0.0 ni, 92.1 id,  0.0 wa,  0.0 hi,  0.2 si, 
0.0 st
MiB Mem :   7810.0 total,   7013.5 free,    402.6 used,    518.0 
buff/cache
MiB Swap:    100.0 total,    100.0 free,      0.0 used.   7407.5 avail Mem

     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ 
COMMAND 

    7218 root     -51   0       0      0      0 D  28.6   0.0  13:51.12 
oa-tc6-spi-thread 

   63853 vnc       20   0    2320    384    384 S   0.7   0.0   0:00.23 
sh 

   65090 pi        20   0   11728   4992   2816 R   0.7   0.1   0:00.04 
top 

      17 root      20   0       0      0      0 I   0.3   0.0   0:01.74 
rcu_preempt 

      27 root      20   0       0      0      0 S   0.3   0.0   0:16.67 
ksoftirqd/2 

     268 root      20   0   50024  16132  15108 S   0.3   0.2   0:02.02 
systemd-journal 

     956 pi        20   0  699196  37896  29160 S   0.3   0.5   0:03.60 
wireplumber 

    7328 pi        20   0   16280   9216   5248 S   0.3   0.1   0:07.61 
python3 

       1 root      20   0  168692  11340   8320 S   0.0   0.1   0:03.67 
systemd 

       2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 
kthreadd

ifconfig output:
----------------

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 192.168.5.100  netmask 255.255.255.0  broadcast 0.0.0.0
         ether 04:05:06:01:02:03  txqueuelen 1000  (Ethernet)
         RX packets 1944541  bytes 909191168 (867.0 MiB)
         RX errors 0  dropped 2  overruns 0  frame 0
         TX packets 2402855  bytes 3129162624 (2.9 GiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 55

dmesg output:
-------------

[  285.482275] LAN865X Rev.B0 Internal Phy spi0.0:00: attached PHY 
driver (mii_bus:phy_addr=spi0.0:00, irq=POLL)
[  285.534760] lan865x spi0.0 eth1: Link is Up - 10Mbps/Half - flow 
control off
[  341.466221] eth1: Receive buffer overflow error
[  345.730222] eth1: Receive buffer overflow error
[  345.891126] eth1: Receive buffer overflow error
[  346.074220] eth1: Receive buffer overflow error
[  354.374223] eth1: Receive buffer overflow error
[  354.375422] eth1: Receive buffer overflow error
[  355.786219] eth1: Receive buffer overflow error
[  356.898221] eth1: Receive buffer overflow error
[  357.782219] eth1: Receive buffer overflow error
[  357.783194] eth1: Receive buffer overflow error
[  365.446224] eth1: Receive buffer overflow error
[  387.722221] eth1: Receive buffer overflow error
[  387.790218] eth1: Receive buffer overflow error
[  387.950224] eth1: Receive buffer overflow error
[  387.951199] eth1: Receive buffer overflow error
[  388.790219] eth1: Receive buffer overflow error
[  389.010220] eth1: Receive buffer overflow error
[  391.066221] eth1: Receive buffer overflow error
[  409.746225] eth1: Receive buffer overflow error
[  442.835278] eth1: Receive buffer overflow error
[  447.410222] eth1: Receive buffer overflow error
[  447.510220] eth1: Receive buffer overflow error
[  447.758219] eth1: Receive buffer overflow error
[  448.066221] eth1: Receive buffer overflow error
[  449.590220] eth1: Receive buffer overflow error
[  450.686221] eth1: Receive buffer overflow error
[  451.238222] eth1: Receive buffer overflow error
[  451.490220] eth1: Receive buffer overflow error
[  472.254221] eth1: Receive buffer overflow error
[  473.302222] eth1: Receive buffer overflow error
[  474.070220] eth1: Receive buffer overflow error
[  475.262220] eth1: Receive buffer overflow error
[  475.362223] eth1: Receive buffer overflow error
[  477.362224] eth1: Receive buffer overflow error
[  501.918221] eth1: Receive buffer overflow error
[  505.510219] eth1: Receive buffer overflow error
[  516.466220] eth1: Receive buffer overflow error
[  600.830221] eth1: Receive buffer overflow error
[  675.614219] eth1: Receive buffer overflow error
[  676.318223] eth1: Receive buffer overflow error
[  676.594223] eth1: Receive buffer overflow error
[  678.702219] eth1: Receive buffer overflow error
[  679.102219] eth1: Receive buffer overflow error
[  679.642218] eth1: Receive buffer overflow error
[  679.894218] eth1: Receive buffer overflow error
[  679.906224] eth1: Receive buffer overflow error
[  682.529386] eth1: Receive buffer overflow error
[  682.786221] eth1: Receive buffer overflow error
[  703.374220] eth1: Receive buffer overflow error
[  703.462219] eth1: Receive buffer overflow error
[  716.606222] eth1: Receive buffer overflow error
[  717.134219] eth1: Receive buffer overflow error
[  722.838219] eth1: Receive buffer overflow error
[  801.522220] eth1: Receive buffer overflow error
[  871.946226] eth1: Receive buffer overflow error
[ 1304.190220] eth1: Receive buffer overflow error
[ 1307.734221] eth1: Receive buffer overflow error
[ 1309.738224] eth1: Receive buffer overflow error
[ 1605.222219] eth1: Receive buffer overflow error
[ 1739.927121] eth1: Receive buffer overflow error
[ 2240.002213] eth1: Receive buffer overflow error
[ 2241.355378] eth1: Receive buffer overflow error
[ 2300.374218] eth1: Receive buffer overflow error
[ 2332.850214] eth1: Receive buffer overflow error
[ 2347.322215] eth1: Receive buffer overflow error
[ 2377.702216] eth1: Receive buffer overflow error
[ 2393.942214] eth1: Receive buffer overflow error
[ 2404.986214] eth1: Receive buffer overflow error
[ 2407.538216] eth1: Receive buffer overflow error
[ 2407.539190] eth1: Receive buffer overflow error
[ 2411.266213] eth1: Receive buffer overflow error
[ 2421.542220] eth1: Receive buffer overflow error
[ 2713.142214] eth1: Receive buffer overflow error
[ 3401.226215] eth1: Receive buffer overflow error
[ 3494.394214] eth1: Receive buffer overflow error
[ 3511.850214] eth1: Receive buffer overflow error


Best regards,
Parthiban V
> R
>
Andrew Lunn May 6, 2024, 1:21 a.m. UTC | #64
> [  285.482275] LAN865X Rev.B0 Internal Phy spi0.0:00: attached PHY 
> driver (mii_bus:phy_addr=spi0.0:00, irq=POLL)
> [  285.534760] lan865x spi0.0 eth1: Link is Up - 10Mbps/Half - flow 
> control off
> [  341.466221] eth1: Receive buffer overflow error
> [  345.730222] eth1: Receive buffer overflow error
> [  345.891126] eth1: Receive buffer overflow error
> [  346.074220] eth1: Receive buffer overflow error

Generally we only log real errors. Is a receive buffer overflow a real
error? I would say not. But it would be good to count it.

There was also the open question, what exactly does a receive buffer
overflow mean?

The spec says:

  9.2.8.11 RXBOE

  Receive Buffer Overflow Error. When set, this bit indicates that the
  receive buffer (from the network) has overflowed and receive frame
  data was lost.

Which is a bit ambiguous. I would hope it means the receiver has
dropped the packet. It will not be passed to the host. But other than
maybe count it, i don't think there is anything to do. But Ramón was
suggesting you actually drop the frame currently be transferred over
the SPI bus?

	Andrew
Piergiorgio Beruto May 6, 2024, 6:47 a.m. UTC | #65
HI all,

The RXBOE is basically a warning, indicating that for some reason the SPI host is not fast enough in retrieving frames from the device.
In my experience, this is typically caused by excessive latency of the SPI driver, or if you have an unoptimized network driver for the MACPHY.

And no, you would certainly NOT drop a frame that is transferred over the SPI bus. It is the frame at the ingress of the device that will be dropped (ie the one at the end of the queue, not the beginning)..
Just my 0.2cent

Thanks,
Piergiorgio

-----Original Message-----
From: Andrew Lunn <andrew@lunn.ch> 
Sent: 6 May, 2024 03:21
To: Parthiban.Veerasooran@microchip.com
Cc: ramon.nordin.rodriguez@ferroamp.se; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; horms@kernel.org; saeedm@nvidia.com; anthony.l.nguyen@intel.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; corbet@lwn.net; linux-doc@vger.kernel.org; robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org; devicetree@vger.kernel.org; Horatiu.Vultur@microchip.com; ruanjinjie@huawei.com; Steen.Hegelund@microchip.com; vladimir.oltean@nxp.com; UNGLinuxDriver@microchip.com; Thorsten.Kummermehr@microchip.com; Piergiorgio Beruto <Pier.Beruto@onsemi.com>; Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>; Nicolas.Ferre@microchip.com; benjamin.bigler@bernformulastudent.ch
Subject: Re: [PATCH net-next v4 05/12] net: ethernet: oa_tc6: implement error interrupts unmasking

[External Email]: This email arrived from an external source - Please exercise caution when opening any attachments or clicking on links.


> [  285.482275] LAN865X Rev.B0 Internal Phy spi0.0:00: attached PHY 
> driver (mii_bus:phy_addr=spi0.0:00, irq=POLL) [  285.534760] lan865x 
> spi0.0 eth1: Link is Up - 10Mbps/Half - flow control off [  
> 341.466221] eth1: Receive buffer overflow error [  345.730222] eth1: 
> Receive buffer overflow error [  345.891126] eth1: Receive buffer 
> overflow error [  346.074220] eth1: Receive buffer overflow error

Generally we only log real errors. Is a receive buffer overflow a real error? I would say not. But it would be good to count it.

There was also the open question, what exactly does a receive buffer overflow mean?

The spec says:

  9.2.8.11 RXBOE

  Receive Buffer Overflow Error. When set, this bit indicates that the
  receive buffer (from the network) has overflowed and receive frame
  data was lost.

Which is a bit ambiguous. I would hope it means the receiver has dropped the packet. It will not be passed to the host. But other than maybe count it, i don't think there is anything to do. But Ramón was suggesting you actually drop the frame currently be transferred over the SPI bus?

	Andrew
Ramón Nordin Rodriguez May 7, 2024, 12:44 p.m. UTC | #66
> The RXBOE is basically a warning, indicating that for some reason the SPI host is not fast enough in retrieving frames from the device.
> In my experience, this is typically caused by excessive latency of the SPI driver, or if you have an unoptimized network driver for the MACPHY.

Definetly tracks with what I'm seeing, got two macphys, one is sharing
the bus with a can transiever, getting more RX overflows on that
interface.

> Thanks,
> Piergiorgio
> 
> Which is a bit ambiguous. I would hope it means the receiver has dropped the packet. It will not be passed to the host. But other than maybe count it, i don't think there is anything to do. But Ramón was suggesting you actually drop the frame currently be transferred over the SPI bus?
> 
> 	Andrew

This was misscommunicated on my part. In my happy-hacking adventures I
got better throughput when dropping more chunks in the driver code.

--

Suddenly I can't reproduce the hang, I'll give it one more attempt today
though. Did some measurements with a better oscilloscope, I'm seeing a
periodic latency of ~500us for clearing the irq, typically it's
considerably shorter, and a periodic ~1ms silence on the wire.

But since I can't reproduce the hang this is just performance related.

R
Parthiban Veerasooran May 8, 2024, 1:05 p.m. UTC | #67
Hi Andrew,

On 23/04/24 4:53 am, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Mon, Apr 22, 2024 at 10:08:23PM +0200, Andrew Lunn wrote:
>>> Testing Details:
>>> ----------------
>>> The driver performance was tested using iperf3 in the below two setups
>>> separately.
>>>
>>> Setup 1:
>>> --------
>>> Node 0 - Raspberry Pi 4 with LAN8650 MAC-PHY
>>> Node 1 - Raspberry Pi 4 with EVB-LAN8670-USB USB Stick
>>>
>>> Setup 2:
>>> --------
>>> Node 0 - SAMA7G54-EK with LAN8650 MAC-PHY
>>> Node 1 - Raspberry Pi 4 with EVB-LAN8670-USB USB Stick
>>
>> Would it be possible to chain these two setups together by adding two
>> USB dongles to one of the Ri 4s? If i remember correctly, there were
>> reports of issues when two devices were using the framework at once.
> 
> Sorry, that makes no sense. Your USB device is unlikely to be doing
> USB->SPI->MAC-PHY. Do you have two LAN8650 MAC-PHY you can connect to
> one host?
Yes. I tried this test. It works as expected.

Setup:
------

-------------------      ---------------------------
|     RPI4 1      |      |           RPI4 2        |
| --------------- | N/W 1| ----------------------- |
| | 1st LAN8651 | |<---->| | 1st EVB-LAN8670-USB | |
| --------------- |      | ----------------------- |
|                 |      |                         |
| --------------- | N/W 2| ----------------------- |
| | 2nd LAN8651 | |<---->| | 2nd EVB-LAN8670-USB | |
| --------------- |      | ----------------------- |
-------------------      ---------------------------

        |---> 1st LAN8651 ---> IP: 192.168.5.100
RPI4 1 |
        |---> 2nd LAN8651 ---> IP: 192.168.5.100

        |---> 1st EVB-LAN8670-USB ---> IP: 192.168.5.101
RPI4 2 |
        |---> 2nd EVB-LAN8670-USB ---> IP: 192.168.6.101

Test case 1:
------------
  RPI4 1 (Sender):
  ----------------
   $ iperf3 -c 192.168.5.101 -u -b 5M -i 1 -t 0 -p 5001
   $ iperf3 -c 192.168.6.101 -u -b 5M -i 1 -t 0 -p 5002
  RPI4 2 (Receiver):
  ------------------
   $ iperf3 -s -p 5001
  RPI4 3 (Receiver):
  ------------------
   $ iperf3 -s -p 5002
  Result on RPI4 side:
  --------------------
   Each LAN8651 transmits 5Mbps without any error.

Test case 2:
------------
  RPI4 1 (Sender):
  ----------------
   $ iperf3 -c 192.168.5.101 -u -b 10M -i 1 -t 0 -p 5001
   $ iperf3 -c 192.168.6.101 -u -b 10M -i 1 -t 0 -p 5002
  RPI4 2 (Receiver):
  ------------------
   $ iperf3 -s -p 5001
  RPI4 3 (Receiver):
  ------------------
   $ iperf3 -s -p 5002
  Result on RPI4 side:
  --------------------
   Each LAN8651 transmitting approximately 5 or 6Mbps without any error.

Test case 3:
-----------------------
  RPI4 1 (Receiver):
  ----------------
   $ iperf3 -s -p 5001
   $ iperf3 -s -p 5002
  RPI4 2 (Sender):
  ----------------
   $ iperf3 -c 192.168.5.100 -u -b 5M -i 1 -t 0 -p 5001
  RPI4 3 (Sender):
  ----------------
   $ iperf3 -c 192.168.6.100 -u -b 5M -i 1 -t 0 -p 5002
  Result on RPI4 side:
  --------------------
   Each LAN8651 received 5Mbps without any error.

Test case 4:
-----------------------
  RPI4 1 (Receiver):
  ------------------
   $ iperf3 -s -p 5001
   $ iperf3 -s -p 5002
  RPI4 2 (Sender):
  ----------------
   $ iperf3 -c 192.168.5.100 -u -b 10M -i 1 -t 0 -p 5001
  RPI4 3 (Sender):
  ----------------
   $ iperf3 -c 192.168.6.100 -u -b 10M -i 1 -t 0 -p 5002
  Result on RPI4 side:
  --------------------
   Each LAN8651 received approximately 3Mbps with lot of "Receive buffer 
overflow error". I think it is expected as the single SPI master has to 
serve both LAN8651 at the same time and both LAN8651 will be receiving 
10Mbps on each.

Please let me know if I have any misunderstanding here or you need more 
details on the test cases.

Best regards,
Parthiban V
> 
>      Andrew
>
Andrew Lunn May 8, 2024, 5:04 p.m. UTC | #68
> Yes. I tried this test. It works as expected.

>    Each LAN8651 received approximately 3Mbps with lot of "Receive buffer 
> overflow error". I think it is expected as the single SPI master has to 
> serve both LAN8651 at the same time and both LAN8651 will be receiving 
> 10Mbps on each.

Thanks for testing this.

This also shows the "Receive buffer overflow error" needs to go away.
Either we don't care at all, and should not enable the interrupt, or
we do care and should increment a counter.

	Andrew
Parthiban Veerasooran May 9, 2024, 1:04 p.m. UTC | #69
Hi Andrew,

On 08/05/24 10:34 pm, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> Yes. I tried this test. It works as expected.
> 
>>     Each LAN8651 received approximately 3Mbps with lot of "Receive buffer
>> overflow error". I think it is expected as the single SPI master has to
>> serve both LAN8651 at the same time and both LAN8651 will be receiving
>> 10Mbps on each.
> 
> Thanks for testing this.
> 
> This also shows the "Receive buffer overflow error" needs to go away.
> Either we don't care at all, and should not enable the interrupt, or
> we do care and should increment a counter.
Thanks for your comments. I think, I would go for your 2nd proposal 
because having "Receive buffer overflow error" enabled will indicate the 
cause of the poor performance.

Already we have,
tc6->netdev->stats.rx_dropped++;
to increment the rx dropped counter in case of receive buffer overflow.

May be we can remove the print,
net_err_ratelimited("%s: Receive buffer overflow error\n", 
tc6->netdev->name);
as it might lead to additional poor performance by adding some delay.

Could you please provide your opinion on this?

Best regards,
Parthiban V
> 
>          Andrew
>
Andrew Lunn May 9, 2024, 8:39 p.m. UTC | #70
On Thu, May 09, 2024 at 01:04:52PM +0000, Parthiban.Veerasooran@microchip.com wrote:
> Hi Andrew,
> 
> On 08/05/24 10:34 pm, Andrew Lunn wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > 
> >> Yes. I tried this test. It works as expected.
> > 
> >>     Each LAN8651 received approximately 3Mbps with lot of "Receive buffer
> >> overflow error". I think it is expected as the single SPI master has to
> >> serve both LAN8651 at the same time and both LAN8651 will be receiving
> >> 10Mbps on each.
> > 
> > Thanks for testing this.
> > 
> > This also shows the "Receive buffer overflow error" needs to go away.
> > Either we don't care at all, and should not enable the interrupt, or
> > we do care and should increment a counter.
> Thanks for your comments. I think, I would go for your 2nd proposal 
> because having "Receive buffer overflow error" enabled will indicate the 
> cause of the poor performance.
> 
> Already we have,
> tc6->netdev->stats.rx_dropped++;
> to increment the rx dropped counter in case of receive buffer overflow.
> 
> May be we can remove the print,
> net_err_ratelimited("%s: Receive buffer overflow error\n", 
> tc6->netdev->name);
> as it might lead to additional poor performance by adding some delay.
> 
> Could you please provide your opinion on this?

This is your code. Ideally you should decide. I will only add review
comments if i think it is wrong. Any can decide between any correct
option.

	Andrew
Parthiban Veerasooran May 10, 2024, 11:22 a.m. UTC | #71
Hi Andrew,

On 10/05/24 2:09 am, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Thu, May 09, 2024 at 01:04:52PM +0000, Parthiban.Veerasooran@microchip.com wrote:
>> Hi Andrew,
>>
>> On 08/05/24 10:34 pm, Andrew Lunn wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>>
>>>> Yes. I tried this test. It works as expected.
>>>
>>>>      Each LAN8651 received approximately 3Mbps with lot of "Receive buffer
>>>> overflow error". I think it is expected as the single SPI master has to
>>>> serve both LAN8651 at the same time and both LAN8651 will be receiving
>>>> 10Mbps on each.
>>>
>>> Thanks for testing this.
>>>
>>> This also shows the "Receive buffer overflow error" needs to go away.
>>> Either we don't care at all, and should not enable the interrupt, or
>>> we do care and should increment a counter.
>> Thanks for your comments. I think, I would go for your 2nd proposal
>> because having "Receive buffer overflow error" enabled will indicate the
>> cause of the poor performance.
>>
>> Already we have,
>> tc6->netdev->stats.rx_dropped++;
>> to increment the rx dropped counter in case of receive buffer overflow.
>>
>> May be we can remove the print,
>> net_err_ratelimited("%s: Receive buffer overflow error\n",
>> tc6->netdev->name);
>> as it might lead to additional poor performance by adding some delay.
>>
>> Could you please provide your opinion on this?
> 
> This is your code. Ideally you should decide. I will only add review
> comments if i think it is wrong. Any can decide between any correct
> option.
Sure, thanks for your advice. Let me stick with the above proposal until 
I get any others opinion.

Best regards,
Parthiban V
> 
>          Andrew
>
Ramón Nordin Rodriguez May 13, 2024, 6:41 a.m. UTC | #72
> The RXBOE is basically a warning, indicating that for some reason the SPI host is not fast enough in retrieving frames from the device.
> In my experience, this is typically caused by excessive latency of the SPI driver, or if you have an unoptimized network driver for the MACPHY.
> 
> Thanks,
> Piergiorgio
> 
> 
> > [  285.482275] LAN865X Rev.B0 Internal Phy spi0.0:00: attached PHY 
> > driver (mii_bus:phy_addr=spi0.0:00, irq=POLL) [  285.534760] lan865x 
> > spi0.0 eth1: Link is Up - 10Mbps/Half - flow control off [  
> > 341.466221] eth1: Receive buffer overflow error [  345.730222] eth1: 
> > Receive buffer overflow error [  345.891126] eth1: Receive buffer 
> > overflow error [  346.074220] eth1: Receive buffer overflow error
> 
> Generally we only log real errors. Is a receive buffer overflow a real error? I would say not. But it would be good to count it.
> 	Andrew

Hi,

I've been busy throwing stuff at the wall until something sticks. I've
managed to narrow a few things down.
First and foremost, when running a periodic udp6 multicast in the
background I don't get a hang in the driver, or it becomes considerably
harder to provoke.

When I make sure that the bespoke Ferroamp upd server is not started
(which just joins a mcast group and sends a less than MTU packet every
~500ms and listens for incoming multicast messages in the same group),
it becomes very simple to get to a live-lock.

My steasp of reproducing is setting a ipv4 address on both ends of the
link, then running the follwing script on both ends using the other ends
ip as argument.

#!/bin/env python3
import socket
import sys

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    while True:
        sock.sendto(b'0'*2048, (sys.argv[1], 4040))

Neither ends opens a listening socket. I get to the live lock in 10s or
less.
I've enabled some debugging options but so far nothing seems to hit.
What I've been able to conclude is that there still is SPI
communication, the macphy interrupt is still pulled low, and the cpu
does the ack so that it's reset to inactive.
But from there it seems no data is passed up the network stack. Some
symptoms are

* net_device stats are no longer incremented
* can't ping
* can't connect to sockets on the board etc.
* cpu usage jumps to and stays at 100% for the worker thread

The worker thread is released by the irq handler and it does some of the
expected work, but not all.
I'm adding some instrumentation to the code in an effort to figure out
where things break apart.
It might be possible to catch it in gdb as well, but I think you only
get one try as the timing will be pretty borked after the first break.

R
Andrew Lunn May 13, 2024, 1 p.m. UTC | #73
> I've enabled some debugging options but so far nothing seems to hit.
> What I've been able to conclude is that there still is SPI
> communication, the macphy interrupt is still pulled low, and the cpu
> does the ack so that it's reset to inactive.

Is it doing this in an endless cycle?

Probably the debug tools are not showing anything because it is not
looping in just one location. It is a complex loop, interrupts
triggering a thread which runs to completion etc. So it looks like
normal behaviour.

If it is an endless cycle, it sounds like an interrupt storm. Some
interrupt bit is not getting cleared, so it immediately fires again as
soon as interrupts are enabled.

Is this your dual device board? Do you have both devices on the same
SPI bus? Do they share interrupt lines?

	Andrew
Ramón Nordin Rodriguez May 13, 2024, 1:50 p.m. UTC | #74
On Mon, May 13, 2024 at 03:00:48PM +0200, Andrew Lunn wrote:
> > I've enabled some debugging options but so far nothing seems to hit.
> > What I've been able to conclude is that there still is SPI
> > communication, the macphy interrupt is still pulled low, and the cpu
> > does the ack so that it's reset to inactive.
> 
> Is it doing this in an endless cycle?

Exactly, so what I'm seeing is when the driver livelocks the macphy is
periodically pulling the irq pin low, the driver clears the interrupt
and repeat.

> 
> Probably the debug tools are not showing anything because it is not
> looping in just one location. It is a complex loop, interrupts
> triggering a thread which runs to completion etc. So it looks like
> normal behaviour.

Gotcha. The 'do work' func called in the worker threads loop does run
and return, so I guess there is not much to trigger on.

> 
> If it is an endless cycle, it sounds like an interrupt storm. Some
> interrupt bit is not getting cleared, so it immediately fires again as
> soon as interrupts are enabled.

Good input. I'll add some instrumentation/stats for how many jiffies
have elapsed between releases of the worker thread and for the irq
handler. I can probably find a gpio to toggle as well if it's really
tight timings.

The irq pin is inactive/high for 100s of us to ms in the measurments
I've done. But I've been using multiple channels and not the fanciest
equipment so samplerates might be playing tricks, I'll rerun some tests
while only measuring the irq pin.

> 
> Is this your dual device board? Do you have both devices on the same
> SPI bus? Do they share interrupt lines?
> 

It's on the dual device board, the macphys are using separate spi buses,
one chip shares the bus with another spi device, but the other is the
only tenant on the bus.

No device shares an irq line.

Pretty sure I can replicate the result for both devices, but need to 
double check, been to much testing of random things for me to keep track.

I'll do some more digging, I think we're getting pretty close to
understading the behaviour now.

R
Andrew Lunn May 13, 2024, 2:04 p.m. UTC | #75
> Good input. I'll add some instrumentation/stats for how many jiffies
> have elapsed between releases of the worker thread and for the irq
> handler. I can probably find a gpio to toggle as well if it's really
> tight timings.

What might be more interesting is the interrupt status registers. Is
there a bit always set which the driver is not clearly correctly?

You can try printing the values. But that might upset the timing so
you cannot reproduce the issue.

If the printk() does upset the timing, what i have done before is
allocate an array of u32 values. Write the interrupt status into it,
looping around when you get to the end of the array. And then use
debugfs_create_u32_array() to export the array in /sys/kernel/debugfs.
Trigger the problem and then look at the values.

> > Is this your dual device board? Do you have both devices on the same
> > SPI bus? Do they share interrupt lines?
> > 
> 
> It's on the dual device board, the macphys are using separate spi buses,
> one chip shares the bus with another spi device, but the other is the
> only tenant on the bus.
> 
> No device shares an irq line.

I was just wondering how your setup differs so you can trigger the
issue, but others have not been able to reproduce it. It might be
another clue as to what is going on. I don't think you need to do
anything with respect to this, its just information to keep in mind.

	Andrew
Parthiban Veerasooran May 14, 2024, 4:46 a.m. UTC | #76
Hi Ramon,

On 13/05/24 7:20 pm, Ramón Nordin Rodriguez wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Mon, May 13, 2024 at 03:00:48PM +0200, Andrew Lunn wrote:
>>> I've enabled some debugging options but so far nothing seems to hit.
>>> What I've been able to conclude is that there still is SPI
>>> communication, the macphy interrupt is still pulled low, and the cpu
>>> does the ack so that it's reset to inactive.
>>
>> Is it doing this in an endless cycle?
> 
> Exactly, so what I'm seeing is when the driver livelocks the macphy is
> periodically pulling the irq pin low, the driver clears the interrupt
> and repeat.
If I understand correctly, you are keep on getting interrupt without 
indicating anything in the footer?. Are you using LAN8650 Rev.B0 or B1?. 
If it is B0 then can you try with Rev.B1 once?

Best regards,
Parthiban V
> 
>>
>> Probably the debug tools are not showing anything because it is not
>> looping in just one location. It is a complex loop, interrupts
>> triggering a thread which runs to completion etc. So it looks like
>> normal behaviour.
> 
> Gotcha. The 'do work' func called in the worker threads loop does run
> and return, so I guess there is not much to trigger on.
> 
>>
>> If it is an endless cycle, it sounds like an interrupt storm. Some
>> interrupt bit is not getting cleared, so it immediately fires again as
>> soon as interrupts are enabled.
> 
> Good input. I'll add some instrumentation/stats for how many jiffies
> have elapsed between releases of the worker thread and for the irq
> handler. I can probably find a gpio to toggle as well if it's really
> tight timings.
> 
> The irq pin is inactive/high for 100s of us to ms in the measurments
> I've done. But I've been using multiple channels and not the fanciest
> equipment so samplerates might be playing tricks, I'll rerun some tests
> while only measuring the irq pin.
> 
>>
>> Is this your dual device board? Do you have both devices on the same
>> SPI bus? Do they share interrupt lines?
>>
> 
> It's on the dual device board, the macphys are using separate spi buses,
> one chip shares the bus with another spi device, but the other is the
> only tenant on the bus.
> 
> No device shares an irq line.
> 
> Pretty sure I can replicate the result for both devices, but need to
> double check, been to much testing of random things for me to keep track.
> 
> I'll do some more digging, I think we're getting pretty close to
> understading the behaviour now.
> 
> R
Ramón Nordin Rodriguez May 15, 2024, 9:45 p.m. UTC | #77
> > Good input. I'll add some instrumentation/stats for how many jiffies
> > have elapsed between releases of the worker thread and for the irq
> > handler. I can probably find a gpio to toggle as well if it's really
> > tight timings.
> 
> What might be more interesting is the interrupt status registers. Is
> there a bit always set which the driver is not clearly correctly?
> 

Ah great point, I'll dump the irq status registers when things go awry.

> You can try printing the values. But that might upset the timing so
> you cannot reproduce the issue.
> 
> If the printk() does upset the timing, what i have done before is
> allocate an array of u32 values. Write the interrupt status into it,
> looping around when you get to the end of the array. And then use
> debugfs_create_u32_array() to export the array in /sys/kernel/debugfs.
> Trigger the problem and then look at the values.

Good tip, sounds like the exact strategy I need. Appreciate the hands on
suggestions!

> 
> > > Is this your dual device board? Do you have both devices on the same
> > > SPI bus? Do they share interrupt lines?
> > > 
> > 
> > It's on the dual device board, the macphys are using separate spi buses,
> > one chip shares the bus with another spi device, but the other is the
> > only tenant on the bus.
> > 
> > No device shares an irq line.
> 
> I was just wondering how your setup differs so you can trigger the
> issue, but others have not been able to reproduce it. It might be
> another clue as to what is going on. I don't think you need to do
> anything with respect to this, its just information to keep in mind.
> 

My typical setup is weird for sure, I skip the initramfs and load a
kernel and dtb into ram (which means I don't have DMA, due to missing fw),
but I'm in the habit of occasionally flashing and running a 'normal system'
every once in a while for verifcation.

I can't think of anything on top of my head that would set my system in
a unique position. But since I don't get the failure when I have a UDP
multicast running I'm guessing different networking daemons might affect
things, in this case we're running systemd-networkd and I manually set
an ipv4 address on the interfaces.

R
Ramón Nordin Rodriguez May 15, 2024, 9:48 p.m. UTC | #78
On Tue, May 14, 2024 at 04:46:58AM +0000, Parthiban.Veerasooran@microchip.com wrote:
> >> Is it doing this in an endless cycle?
> > 
> > Exactly, so what I'm seeing is when the driver livelocks the macphy is
> > periodically pulling the irq pin low, the driver clears the interrupt
> > and repeat.
> If I understand correctly, you are keep on getting interrupt without 
> indicating anything in the footer?. Are you using LAN8650 Rev.B0 or B1?. 
> If it is B0 then can you try with Rev.B1 once?
> 

I'll check the footer content, thanks for the tip!

All testing has bee done with Rev.B0, we've located a set of B1 chips.
So we'll get on resoldering and rerunning the test scenario.

R
Parthiban Veerasooran May 17, 2024, 9:38 a.m. UTC | #79
Hi Ramon,

On 16/05/24 3:18 am, Ramón Nordin Rodriguez wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Tue, May 14, 2024 at 04:46:58AM +0000, Parthiban.Veerasooran@microchip.com wrote:
>>>> Is it doing this in an endless cycle?
>>>
>>> Exactly, so what I'm seeing is when the driver livelocks the macphy is
>>> periodically pulling the irq pin low, the driver clears the interrupt
>>> and repeat.
>> If I understand correctly, you are keep on getting interrupt without
>> indicating anything in the footer?. Are you using LAN8650 Rev.B0 or B1?.
>> If it is B0 then can you try with Rev.B1 once?
>>
> 
> I'll check the footer content, thanks for the tip!
> 
> All testing has bee done with Rev.B0, we've located a set of B1 chips.
> So we'll get on resoldering and rerunning the test scenario.
Thanks for the consideration. But be informed that the internal PHY 
initial settings are updated for the Rev.B1. But the one from the 
mainline still supports for Rev.B0. So that microchip_t1s.c to be 
updated to support Rev.B1.

Also I am in talk with our design team that whether the updated initial 
settings for B1 are also applicable for B0. If so, then we will have 
only one updated initial setting which supports both B0 and B1.

Do you have any plan to update the microchip_t1s.c for Rev.B1 support OR 
do you want me to do it? If you want me to do it then I will prepare a 
separate patch series for the support?

Best regards,
Parthiban V
> 
> R
>
Ramón Nordin Rodriguez May 17, 2024, 12:43 p.m. UTC | #80
> Thanks for the consideration. But be informed that the internal PHY 
> initial settings are updated for the Rev.B1. But the one from the 
> mainline still supports for Rev.B0. So that microchip_t1s.c to be 
> updated to support Rev.B1.

So I figured, thanks for making it clear.

> 
> Also I am in talk with our design team that whether the updated initial 
> settings for B1 are also applicable for B0. If so, then we will have 
> only one updated initial setting which supports both B0 and B1.

Sounds ideal if one init procedure can cover both chip revs.

> 
> Do you have any plan to update the microchip_t1s.c for Rev.B1 support OR 
> do you want me to do it? If you want me to do it then I will prepare a 
> separate patch series for the support?

I'm keen on taking a Rev.B1 chip for a spin, so I'll jump on doing doing
the init/fixup. I can probably post it to the mailing list during the
weekend.
If you're planning on being faster than that I'll wait for your
submission.

Out of curiosity do you have any insight into if the rev b1 is expected
to behave differently?

R
Ramón Nordin Rodriguez May 24, 2024, 6:12 p.m. UTC | #81
> >>>> Is it doing this in an endless cycle?
> >>>
> >>> Exactly, so what I'm seeing is when the driver livelocks the macphy is
> >>> periodically pulling the irq pin low, the driver clears the interrupt
> >>> and repeat.
> >> If I understand correctly, you are keep on getting interrupt without
> >> indicating anything in the footer?. Are you using LAN8650 Rev.B0 or B1?.
> >> If it is B0 then can you try with Rev.B1 once?
> >>

After a considerable ammount of headscratching it seems that disabling collision
detection on the macphy is the only way of getting it stable.
When PLCA is enabled it's expected that CD causes problems, when running
in CSMA/CD mode it was unexpected (for me at least).

Disabling collision detection was discussed here 
https://lore.kernel.org/netdev/20231127104045.96722-1-ramon.nordin.rodriguez@ferroamp.se/
in a patchset that I haven't gotten around to testing through properly
and fixing up, but now it's definetly a priority.

Rev.b0 and b1 gives similar results in this domain, though I'm getting
lower throughput and it's easier/faster to get the internal error state
on rev.b1.

When CD is disabled both chip revs seems stable in all of my testing.

> > 
> > I'll check the footer content, thanks for the tip!
> > 
> > All testing has bee done with Rev.B0, we've located a set of B1 chips.
> > So we'll get on resoldering and rerunning the test scenario.
> Thanks for the consideration. But be informed that the internal PHY 
> initial settings are updated for the Rev.B1. But the one from the 
> mainline still supports for Rev.B0. So that microchip_t1s.c to be 
> updated to support Rev.B1.

I posted a suggestion for how to bringup rev.b1
https://lore.kernel.org/netdev/20240524140706.359537-1-ramon.nordin.rodriguez@ferroamp.se/

I should have prefaced the cover letter with 'ugly hacks ahead'.

> 
> Also I am in talk with our design team that whether the updated initial 
> settings for B1 are also applicable for B0. If so, then we will have 
> only one updated initial setting which supports both B0 and B1.

Any update on this?

I will submit a new revision of the lan8670 revc + disable collision
detection pathset where CD is disabled regardless of operating mode.

R
Andrew Lunn May 24, 2024, 6:31 p.m. UTC | #82
> After a considerable ammount of headscratching it seems that disabling collision
> detection on the macphy is the only way of getting it stable.
> When PLCA is enabled it's expected that CD causes problems, when running
> in CSMA/CD mode it was unexpected (for me at least).

Now we are back to, why is your system different? What is triggering a
collision for you, but not Parthiban?

There is nothing in the standard about reporting a collision. So this
is a Microchip extension? So the framework is not doing anything when
it happens, which will explain why it becomes a storm.... Until we do
have a mechanism to handle vendor specific interrupts, the frame work
should disable them all, to avoid this storm.

Does the datasheet document what to do on a collision? How are you
supposed to clear the condition?

       Andrew
Piergiorgio Beruto May 24, 2024, 6:49 p.m. UTC | #83
Hi all,
Just my two cents here...

Collision detection is a fundamental building block of the CSMA/CD mechanism.
The PHY detects physical collisions and reports them to the MAC via the COL pin.
The MAC is supposed to perform the normal back-off operation: stop the current transmission and re-transmit at a later time following the exponentially increasing random algorithm (See IEEE 802.3 Clause 4).
** Therefore, collision detect shall NOT be brought up to the user. In a MACPHY system it is supposed to be handled entirely by the PHY. **

With the introduction of PLCA, the PHY may report also "logical collisions" to the MAC.  These are not real collisions as they don't happen on the line. They are part of the normal PLCA back-pressuring mechanism that allows the PHY to send a frame only during a specific transmit opportunity. So once more, this kind of collision shall not be reported to the user, it is just normal behavior.

However, physical collisions (i.e., collisions that really happens on the line) are NOT supposed to happen when PLCA is enabled and configured correctly.
For this reason we have a standard IEEE register (PCS Diagnostic 2) which still captures physical collisions (not logical ones). This register is supposed to be used as a diagnostic information to let the user know that something is misconfigured.
Now, some PHYs can be configured to provide IRQs when this kind of collision happens, indicating a configuration problem.

Additionally, many PHYs allows the user to completely switch off the PHY physical collision detection when PLCA is enabled. This is also recommended by the OPEN Alliance specifications, although no standard register was defined to achieve that (maybe we should add it to the standard...).

I'm not sure I followed all this discussion, I just hope this clarification might help in finding a good solution.
In short:

- Collisions are NOT to be reported to Linux. The MAC/PHY shall handle those internally.
- Physical collisions are still counted into a diagnostic register (could be good to add it to ethtool MAC statistics)
- Disabling collision detection is allowed only when PLCA is enabled, but it is not a standard feature, although automotive specs recommends it.

Thanks,
Piergiorgio

-----Original Message-----
From: Andrew Lunn <andrew@lunn.ch> 
Sent: 24 May, 2024 20:32
To: Ramón Nordin Rodriguez <ramon.nordin.rodriguez@ferroamp.se>
Cc: Parthiban.Veerasooran@microchip.com; Piergiorgio Beruto <Pier.Beruto@onsemi.com>; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; horms@kernel.org; saeedm@nvidia.com; anthony.l.nguyen@intel.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; corbet@lwn.net; linux-doc@vger.kernel.org; robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org; devicetree@vger.kernel.org; Horatiu.Vultur@microchip.com; ruanjinjie@huawei.com; Steen.Hegelund@microchip.com; vladimir.oltean@nxp.com; UNGLinuxDriver@microchip.com; Thorsten.Kummermehr@microchip.com; Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>; Nicolas.Ferre@microchip.com; benjamin.bigler@bernformulastudent.ch
Subject: Re: [PATCH net-next v4 05/12] net: ethernet: oa_tc6: implement error interrupts unmasking

[External Email]: This email arrived from an external source - Please exercise caution when opening any attachments or clicking on links.

> After a considerable ammount of headscratching it seems that disabling 
> collision detection on the macphy is the only way of getting it stable.
> When PLCA is enabled it's expected that CD causes problems, when 
> running in CSMA/CD mode it was unexpected (for me at least).

Now we are back to, why is your system different? What is triggering a collision for you, but not Parthiban?

There is nothing in the standard about reporting a collision. So this is a Microchip extension? So the framework is not doing anything when it happens, which will explain why it becomes a storm.... Until we do have a mechanism to handle vendor specific interrupts, the frame work should disable them all, to avoid this storm.

Does the datasheet document what to do on a collision? How are you supposed to clear the condition?

       Andrew
Selvamani Rajagopal May 24, 2024, 8:52 p.m. UTC | #84
Andrew/Parthiban,

Sorry I couldn't get back earlier on this subject as I just had to time review the organization of the code with respect to our device. 
I have a proposal for the way MDIO APIs are implemented in common code (oa_tc6.c).

At present, mii_bus structure is in common code without any visibility to vendor specific code. Though this is a good design to get consistent
behavior between different vendors, we can't customize the functions stored in  read, write, read_c45, and write_c45 function pointers.

In our MDIO functions, we do certain things based on PHY ID, also our driver deal with vendor specific register, MMS 12 (refer Table 6 in section 9.1
Open Alliance MAC-PHY Serial interface specification document.)

I hope it is not uncommon to expect the ability to implement vendor specific mdio read/write. At present, there is no alternative
to functions like oa_tc6_mdiobus_read, oa_tc6_mdiobus_write, oa_tc6_mdiobus_read_c45, and oa_tc6_mdiobus_write_c45.

I am sure we can resolve this issue in many ways.  One way is to provide a public function to populate mii_bus pointer (tc6->mdiobus
in the code) from the vendor driver, or provide a way the pass when we call oa_tc6_init.

Vendors who don't require such customization can use existing functionality. Those who require customization, can have extra code
before calling this standard MDIO functions.

Sincerely
Selva

> -----Original Message-----
> From: Parthiban.Veerasooran@microchip.com
> <Parthiban.Veerasooran@microchip.com>
> Sent: Friday, May 10, 2024 4:22 AM
> To: andrew@lunn.ch
> Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; horms@kernel.org; saeedm@nvidia.com;
> anthony.l.nguyen@intel.com; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; corbet@lwn.net; linux-doc@vger.kernel.org;
> robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org;
> conor+dt@kernel.org; devicetree@vger.kernel.org;
> Horatiu.Vultur@microchip.com; ruanjinjie@huawei.com;
> Steen.Hegelund@microchip.com; vladimir.oltean@nxp.com;
> UNGLinuxDriver@microchip.com;
> Thorsten.Kummermehr@microchip.com; Piergiorgio Beruto
> <Pier.Beruto@onsemi.com>; Selvamani Rajagopal
> <Selvamani.Rajagopal@onsemi.com>; Nicolas.Ferre@microchip.com;
> benjamin.bigler@bernformulastudent.ch
> Subject: Re: [PATCH net-next v4 00/12] Add support for OPEN Alliance
> 10BASE-T1x MACPHY Serial Interface
> 
> [External Email]: This email arrived from an external source - Please
> exercise caution when opening any attachments or clicking on links.
> 
> Hi Andrew,
> 
> On 10/05/24 2:09 am, Andrew Lunn wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
> >
> > On Thu, May 09, 2024 at 01:04:52PM +0000,
> Parthiban.Veerasooran@microchip.com wrote:
> >> Hi Andrew,
> >>
> >> On 08/05/24 10:34 pm, Andrew Lunn wrote:
> >>> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
> >>>
> >>>> Yes. I tried this test. It works as expected.
> >>>
> >>>>      Each LAN8651 received approximately 3Mbps with lot of
> "Receive buffer
> >>>> overflow error". I think it is expected as the single SPI master has to
> >>>> serve both LAN8651 at the same time and both LAN8651 will be
> receiving
> >>>> 10Mbps on each.
> >>>
> >>> Thanks for testing this.
> >>>
> >>> This also shows the "Receive buffer overflow error" needs to go
> away.
> >>> Either we don't care at all, and should not enable the interrupt, or
> >>> we do care and should increment a counter.
> >> Thanks for your comments. I think, I would go for your 2nd proposal
> >> because having "Receive buffer overflow error" enabled will indicate
> the
> >> cause of the poor performance.
> >>
> >> Already we have,
> >> tc6->netdev->stats.rx_dropped++;
> >> to increment the rx dropped counter in case of receive buffer
> overflow.
> >>
> >> May be we can remove the print,
> >> net_err_ratelimited("%s: Receive buffer overflow error\n",
> >> tc6->netdev->name);
> >> as it might lead to additional poor performance by adding some
> delay.
> >>
> >> Could you please provide your opinion on this?
> >
> > This is your code. Ideally you should decide. I will only add review
> > comments if i think it is wrong. Any can decide between any correct
> > option.
> Sure, thanks for your advice. Let me stick with the above proposal until
> I get any others opinion.
> 
> Best regards,
> Parthiban V
> >
> >          Andrew
> >
Andrew Lunn May 24, 2024, 9:27 p.m. UTC | #85
> In our MDIO functions, we do certain things based on PHY ID, also
> our driver deal with vendor specific register, MMS 12 (refer Table 6
> in section 9.1

That is a bad design. Vendor specific PHY registers should be in MMS 4
which is MMD 31, where the PHY driver can access them. Table 6 says:
"PHY – Vendor Specific" for MMS 4, so clearly that is where the
standards committee expected PHY vendor registers to be.

Anyway, does the PHY driver actually need to access MMS 12? Or can the
MAC driver do it? That is the same question i asked Ramón about the
Microchip part. We really should avoid layering violations as much as
we can, and we should not have the framework make it easy to violate
layering. We want all such horrible hacks hidden in the MAC driver
which needs such horrible hacks because of bad design.

	Andrew
Piergiorgio Beruto May 24, 2024, 9:45 p.m. UTC | #86
Hi Andrew,
In reality, it is not the PHY having register in MMS12, and not even the MAC. These are really "chip-specific" registers, unrelated to networking (e.g., GPIOs, HW diagnostics, etc.).
They are in MMS12 exactly because they cannot be considered PHY functions, nor MAC functions.
And we have PHY specific registers in MMS4, just as you said.

Although, I think it is a good idea anyway to allow the MACPHY drivers to hook into / extend the MDIO access functions.
If anything, because of the hacks you mentioned. But also to allow vendor-specific extensions.

Makes sense?

Thanks,
Piergiorgio

-----Original Message-----
From: Andrew Lunn <andrew@lunn.ch> 
Sent: 24 May, 2024 23:27
To: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
Cc: Parthiban.Veerasooran@microchip.com; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; horms@kernel.org; saeedm@nvidia.com; anthony.l.nguyen@intel.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; corbet@lwn.net; linux-doc@vger.kernel.org; robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org; devicetree@vger.kernel.org; Horatiu.Vultur@microchip.com; ruanjinjie@huawei.com; Steen.Hegelund@microchip.com; vladimir.oltean@nxp.com; UNGLinuxDriver@microchip.com; Thorsten.Kummermehr@microchip.com; Piergiorgio Beruto <Pier.Beruto@onsemi.com>; Nicolas.Ferre@microchip.com; benjamin.bigler@bernformulastudent.ch
Subject: Re: [PATCH net-next v4 00/12] Add support for OPEN Alliance 10BASE-T1x MACPHY Serial Interface

[External Email]: This email arrived from an external source - Please exercise caution when opening any attachments or clicking on links.

> In our MDIO functions, we do certain things based on PHY ID, also our 
> driver deal with vendor specific register, MMS 12 (refer Table 6 in 
> section 9.1

That is a bad design. Vendor specific PHY registers should be in MMS 4 which is MMD 31, where the PHY driver can access them. Table 6 says:
"PHY – Vendor Specific" for MMS 4, so clearly that is where the standards committee expected PHY vendor registers to be.

Anyway, does the PHY driver actually need to access MMS 12? Or can the MAC driver do it? That is the same question i asked Ramón about the Microchip part. We really should avoid layering violations as much as we can, and we should not have the framework make it easy to violate layering. We want all such horrible hacks hidden in the MAC driver which needs such horrible hacks because of bad design.

	Andrew
Andrew Lunn May 24, 2024, 9:54 p.m. UTC | #87
> In reality, it is not the PHY having register in MMS12, and not even
> the MAC. These are really "chip-specific" registers, unrelated to
> networking (e.g., GPIOs, HW diagnostics, etc.).

Having a GPIO driver within the MAC driver is O.K. For hardware
diagnostics you should be using devlink, which many MAC drivers
have. So i don't see a need for the PHY driver to access MMS 12.

Anyway, we can do a real review when you post your code.

> Although, I think it is a good idea anyway to allow the MACPHY
> drivers to hook into / extend the MDIO access functions.  If
> anything, because of the hacks you mentioned. But also to allow
> vendor-specific extensions.

But we don't want vendor specific extensions. OS 101, the OS is there
to make all hardware look the same. And in general, it is not often
that vendors actually come up with anything unique. And if they do,
and it is useful, other vendors will copy it. So rather than doing
vendor specific extensions, you should be thinking about how to export
it in a way which is common across multiple vendors.

   Andrew
Piergiorgio Beruto May 24, 2024, 10:08 p.m. UTC | #88
> Having a GPIO driver within the MAC driver is O.K. For hardware diagnostics you should be using devlink, which many MAC drivers have. So i don't see a need for the PHY driver to access MMS 12.
What I was trying to say is that I actually agree the PHY driver does not need to access MMS12.
But the MAC driver might need to access MMS-es for vendor specific stuff. In our case, there is a model specific register we need to access during probe.

> But we don't want vendor specific extensions. OS 101, the OS is there to make all hardware look the same. And in general, it is not often that vendors actually come up with anything unique. And if they do, and it is useful, other vendors will copy it. So rather than doing vendor specific extensions, you should be thinking about how to export it in a way which is common across multiple vendors.

Fair enough, let's keep it for "hacks" then. Still, I think there are features that -initially- are kind of vendor specific, but in the long run they turn into standards or de-facto standards.
I assume we want to help this happening (step-wise), don't we?

For example, one big feature I think at some point we should understand how to deal with, is topology discovery for multi-drop.
Maybe you've heard about it already, but in short it is a feature that allows one PHY to measure the distance (or rather, the propagation delay) to another node on the same multi-drop segment.
Knowing the cable Tpd (~5ns/m), this allows you to get also the physical distance.

It is a cool feature that has been also standardized in OPEN alliance and many vendors are already implemented it.
As for other stuff, we put a lot of effort in convincing the committees to standardize the registers too, and fortunately we did it.

The "problem" with topology discovery is that it involves both physical layer functions (to activate the functions that do the actual measurement) but also "upper layer" protocols to communicate with the nodes and carry on the actual measurement.
For this "upper layer" stuff we did not define a standard.

In my view, we should probably create some PHY extensions in the kernel to activate the physical layer part, leaving the "protocol" to the userland.
May I ask your opinion?

Thanks,
Piergiorgio
Andrew Lunn May 25, 2024, 2:46 p.m. UTC | #89
On Fri, May 24, 2024 at 10:08:54PM +0000, Piergiorgio Beruto wrote:
> > Having a GPIO driver within the MAC driver is O.K. For hardware diagnostics you should be using devlink, which many MAC drivers have. So i don't see a need for the PHY driver to access MMS 12.

> But the MAC driver might need to access MMS-es for vendor specific
> stuff. In our case, there is a model specific register we need to
> access during probe.

Which is fine, and currently supported. Look at the Microchip driver,
it access some registers at startup. The framwork just provides the
core of moving packets around, and MDIO access. The rest is up to MAC
driver.

> Fair enough, let's keep it for "hacks" then. Still, I think there
> are features that -initially- are kind of vendor specific, but in
> the long run they turn into standards or de-facto standards.

> I assume we want to help this happening (step-wise), don't we?

Maybe, but maybe not. We have been developing MAC drivers for over 25
years. There are a number of mechanisms for exporting things to user
space. We have to see if your features fit one of them.

> For example, one big feature I think at some point we should
> understand how to deal with, is topology discovery for multi-drop.
> Maybe you've heard about it already, but in short it is a feature
> that allows one PHY to measure the distance (or rather, the
> propagation delay) to another node on the same multi-drop segment.

> Knowing the cable Tpd (~5ns/m), this allows you to get also the physical distance.

We already have something very similar to this. Cable testing results
which make use of Time Domain Reflectromitery. I would look at
re-using the API as much as possible.

> In my view, we should probably create some PHY extensions in the
> kernel to activate the physical layer part, leaving the "protocol"
> to the userland.  May I ask your opinion?

Please look at how cable testing works.

       Andrew
Parthiban Veerasooran May 27, 2024, 8:38 a.m. UTC | #90
Hi Ramon,

On 24/05/24 11:42 pm, Ramón Nordin Rodriguez wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>>>>>> Is it doing this in an endless cycle?
>>>>>
>>>>> Exactly, so what I'm seeing is when the driver livelocks the macphy is
>>>>> periodically pulling the irq pin low, the driver clears the interrupt
>>>>> and repeat.
>>>> If I understand correctly, you are keep on getting interrupt without
>>>> indicating anything in the footer?. Are you using LAN8650 Rev.B0 or B1?.
>>>> If it is B0 then can you try with Rev.B1 once?
>>>>
> 
> After a considerable ammount of headscratching it seems that disabling collision
> detection on the macphy is the only way of getting it stable.
> When PLCA is enabled it's expected that CD causes problems, when running
> in CSMA/CD mode it was unexpected (for me at least).
Thanks for the feedback.
> 
> Disabling collision detection was discussed here
> https://lore.kernel.org/netdev/20231127104045.96722-1-ramon.nordin.rodriguez@ferroamp.se/
As you started this thread long back, I thought that those patches are 
already in but now I understand that they are not. In all my testings I 
have my CD disable fix in my PHY driver.
> in a patchset that I haven't gotten around to testing through properly
> and fixing up, but now it's definetly a priority.
> 
> Rev.b0 and b1 gives similar results in this domain, though I'm getting
> lower throughput and it's easier/faster to get the internal error state
> on rev.b1.
> 
> When CD is disabled both chip revs seems stable in all of my testing.
If I understand correctly, disabling CD when PLCA enabled works as 
expected in both B0 and B1? correct me if I am wrong. If that is the 
case, then I would recommend to concentrate on the below patch set to 
get in the mainline first instead of focusing on B1 patches which can be 
done after that.
https://lore.kernel.org/netdev/20231127104045.96722-1-ramon.nordin.rodriguez@ferroamp.se/
> 
>>>
>>> I'll check the footer content, thanks for the tip!
>>>
>>> All testing has bee done with Rev.B0, we've located a set of B1 chips.
>>> So we'll get on resoldering and rerunning the test scenario.
>> Thanks for the consideration. But be informed that the internal PHY
>> initial settings are updated for the Rev.B1. But the one from the
>> mainline still supports for Rev.B0. So that microchip_t1s.c to be
>> updated to support Rev.B1.
> 
> I posted a suggestion for how to bringup rev.b1
> https://lore.kernel.org/netdev/20240524140706.359537-1-ramon.nordin.rodriguez@ferroamp.se/
> 
> I should have prefaced the cover letter with 'ugly hacks ahead'.
> 
>>
>> Also I am in talk with our design team that whether the updated initial
>> settings for B1 are also applicable for B0. If so, then we will have
>> only one updated initial setting which supports both B0 and B1.
> 
> Any update on this?
I think, I have answered for this in another mail.
> 
> I will submit a new revision of the lan8670 revc + disable collision
> detection pathset where CD is disabled regardless of operating mode.
Yes, I would recommend to do it.

Best regards,
Parthiban V
> 
> R
>
Parthiban Veerasooran May 27, 2024, 9:30 a.m. UTC | #91
Hi Andrew,

On 25/05/24 12:01 am, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> After a considerable ammount of headscratching it seems that disabling collision
>> detection on the macphy is the only way of getting it stable.
>> When PLCA is enabled it's expected that CD causes problems, when running
>> in CSMA/CD mode it was unexpected (for me at least).
> 
> Now we are back to, why is your system different? What is triggering a
> collision for you, but not Parthiban?
I am using PHY driver which has "Disable CD if PLCA is enabled" fix. 
Probably that could be the reason that why I am not running into these 
issues.
> 
> There is nothing in the standard about reporting a collision. So this
> is a Microchip extension? So the framework is not doing anything when
> it happens, which will explain why it becomes a storm.... Until we do
> have a mechanism to handle vendor specific interrupts, the frame work
> should disable them all, to avoid this storm.
"IEEE 10BASE-T1S Implementation Specification" from OPEN Alliance does 
specify this in the section "5.2 Collision Detection (CD) / Handling" 
for the Automotive environment support.

https://opensig.org/wp-content/uploads/2023/12/20230215_10BASE-T1S_system_implementation_V1_0.pdf

The automotive EMC immunity requirements exceeds the alien crosstalk 
noise levels defined in IEEE 802.3cgTM-2019 [1]. Therefore, in such 
environment the CD mechanism of the PHY may not be able to distinguish 
noise from collisions, limiting the achievable level of immunity.
> 
> Does the datasheet document what to do on a collision? How are you
> supposed to clear the condition?
"8.5 PLCA Collision Detection" section in the LAN8650/1 datasheet 
describes the importance of disabling collision detection in case of 
PLCA mode enabled.

When nodes in a mixing segment are properly configured for PLCA 
operation there will be no physical collisions. However, under certain 
conditions, including mixing segments with significant inherent noise 
due to reflections, and systems under high electromagnetic stress, false 
collisions may be detected. The false detection of late collisions will 
result in the transmitting node dropping the packet. As packets are 
typically received correctly in these conditions, it is recommended to 
disable collision detection at any time that PLCA is enabled and active. 
Collision detection is disabled by writing a zero to the Collision 
Detect Enable (CDEN) bit in the Collision Detector Control 0 (CDCTL0) 
register.

https://ww1.microchip.com/downloads/aemDocuments/documents/AIS/ProductDocuments/DataSheets/LAN8650-1-Data-Sheet-60001734.pdf

Hope this clarifies.

Best regards,
Parthiban V
> 
>         Andrew
Piergiorgio Beruto May 30, 2024, 9:43 a.m. UTC | #92
Hello Andrew,

I was reading back into the MACPHY specifications in OPEN Alliance, and it seems like MMS 10 to MMS 15 are actually allowed as vendor specific registers. See page 50.
The specifications further say that vendor specific registers of the PHY that would normally be in MMD30-31 (ie, excluding the PLCA registers and the other OPEN standard registers) would go into MMS10 to MMS15.

So I'm wondering, why is it bad to have vendor specific registers into MMD10 to MMD15?
I think the framework should allow non-standard stuff to be mapped into these, no?

Thanks,
Piergiorgio

-----Original Message-----
From: Andrew Lunn <andrew@lunn.ch> 
Sent: 24 May, 2024 23:55
To: Piergiorgio Beruto <Pier.Beruto@onsemi.com>
Cc: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>; Parthiban.Veerasooran@microchip.com; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; horms@kernel.org; saeedm@nvidia.com; anthony.l.nguyen@intel.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; corbet@lwn.net; linux-doc@vger.kernel.org; robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org; devicetree@vger.kernel.org; Horatiu.Vultur@microchip.com; ruanjinjie@huawei.com; Steen.Hegelund@microchip.com; vladimir.oltean@nxp.com; UNGLinuxDriver@microchip.com; Thorsten.Kummermehr@microchip.com; Nicolas.Ferre@microchip.com; benjamin.bigler@bernformulastudent.ch
Subject: Re: [PATCH net-next v4 00/12] Add support for OPEN Alliance 10BASE-T1x MACPHY Serial Interface

[External Email]: This email arrived from an external source - Please exercise caution when opening any attachments or clicking on links.

> In reality, it is not the PHY having register in MMS12, and not even 
> the MAC. These are really "chip-specific" registers, unrelated to 
> networking (e.g., GPIOs, HW diagnostics, etc.).

Having a GPIO driver within the MAC driver is O.K. For hardware diagnostics you should be using devlink, which many MAC drivers have. So i don't see a need for the PHY driver to access MMS 12.

Anyway, we can do a real review when you post your code.

> Although, I think it is a good idea anyway to allow the MACPHY drivers 
> to hook into / extend the MDIO access functions.  If anything, because 
> of the hacks you mentioned. But also to allow vendor-specific 
> extensions.

But we don't want vendor specific extensions. OS 101, the OS is there to make all hardware look the same. And in general, it is not often that vendors actually come up with anything unique. And if they do, and it is useful, other vendors will copy it. So rather than doing vendor specific extensions, you should be thinking about how to export it in a way which is common across multiple vendors.

   Andrew
Andrew Lunn May 30, 2024, 1:06 p.m. UTC | #93
On Thu, May 30, 2024 at 09:43:56AM +0000, Piergiorgio Beruto wrote:
> Hello Andrew,
> 
> I was reading back into the MACPHY specifications in OPEN Alliance, and it seems like MMS 10 to MMS 15 are actually allowed as vendor specific registers. See page 50.
> The specifications further say that vendor specific registers of the PHY that would normally be in MMD30-31 (ie, excluding the PLCA registers and the other OPEN standard registers) would go into MMS10 to MMS15.
> 
> So I'm wondering, why is it bad to have vendor specific registers into MMD10 to MMD15?
> I think the framework should allow non-standard stuff to be mapped into these, no?

From an architecture perspicuity, PHY vendor specific registers should
be in the PHY register address space. MAC vendor specific registers
should be in the MAC register address space.

It seems like the Microchip device has some PHY vendor specific
registers in the MAC address space. That is bad.

Both your and Microchip device is a single piece of silicon. But i
doubt there is anything in the standard which actually requires
this. The PHY could be discrete, on the end of an MDIO bus and an MII
bus. That is the typical design for the last 30 years, and what linux
is built around. The MAC should not assume anything about the PHY, the
PHY should not assume anything about the MAC, because they are
interchangeable.

The framework does allow you to poke any register anywhere. But i
would strongly avoid breaking the layering, it is going to cause you
long term maintenance problems, and is ugly.

	Andrew
Piergiorgio Beruto May 30, 2024, 9:37 p.m. UTC | #94
> From an architecture perspicuity, PHY vendor specific registers should be in the PHY register address space. MAC vendor specific registers should be in the MAC register address space.
I agree 100%. The registers I'm talking about are not PHY or MAC specific. They are related to other functions. For example, configuring pins to output a clock, an SFD indication, a LED, or other.
Some devices also can configure events driven by the PTP timer to toggle GPIOs, capture the timer on rising/falling edge of a GPIO or similar.

> It seems like the Microchip device has some PHY vendor specific registers in the MAC address space. That is bad.
I was not aware of that. I can tell you this is not my case.

> Both your and Microchip device is a single piece of silicon. But i doubt there is anything in the standard which actually requires this. The PHY could be discrete, on the end of an MDIO bus and an MII bus. That is the typical design for the last 30 years, and what linux is built around. The MAC should not assume anything about the PHY, the PHY should not assume anything about the MAC, because they are interchangeable.
Yes, to me the MAC and PHY must be separate entities even for the OPEN Alliance TC14/6 protocol. Besides, you can stuff whatever PHY within your MACPHY chip (e.g. 10BASE-T1L).
In "my" driver you need to specify in the DTS the kind of net device and the PHY as a handle.

> The framework does allow you to poke any register anywhere. But i would strongly avoid breaking the layering, it is going to cause you long term maintenance problems, and is ugly.
On that we agree. Maybe I was just misunderstanding the earlier conversation where I thought you would not allow specific drivers to access MMS other than 0,1,4 and the ones that map to MMDs.
If this is not the case, then I think I'm good.

Thanks!
--Pier
Andrew Lunn May 31, 2024, 12:33 a.m. UTC | #95
> On that we agree. Maybe I was just misunderstanding the earlier
> conversation where I thought you would not allow specific drivers to
> access MMS other than 0,1,4 and the ones that map to MMDs.

I would be disappointed to see a driver access registers which is not
in its address space. The LED driver can be in the MAC driver, or the
PHY driver, depending on where its registers are. The PTP driver can
be in the MAC driver, or the PHY driver, depending on where its
registers are. Hopefully you don't have anything which is split over
both addresses spaces.

     Andrew
Parthiban Veerasooran May 31, 2024, 12:13 p.m. UTC | #96
Hi All,

First of all, I thank all of you for the comments and response. In my 
opinion, the framework what we have in this patch series will support 
all the necessary features to enable basic 10Base-T1S Ethernet 
communication and also we tested this with Microchip LAN8650/1. If it is 
not supporting for other vendor's devices, then please let me know we 
can add necessary changes to support them. The basic idea with this 
patch series is to baseline an initial version which basically supports 
10Base-T1S Ethernet communication.

I agree we may have some more further features to be implemented in the 
framework but they can be done later once we have a basic version 
mainlined. We can't have all things together in the 1st version of the 
patch series which will create unnecessary deviations from our focus.

So I would request all of you to give your comments on the existing 
implementation in the patch series to improve better. Once this version 
is mainlined we will discuss further to implement further features 
supported. I feel the current discussion doesn't have any impact on the 
existing implementation which supports basic 10Base-T1S Ethernet 
communication.

Thanks for your understanding. Please let me know if you have any 
opinion on this.

Best regards,
Parthiban V

On 30/05/24 3:13 pm, Piergiorgio Beruto wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Hello Andrew,
> 
> I was reading back into the MACPHY specifications in OPEN Alliance, and it seems like MMS 10 to MMS 15 are actually allowed as vendor specific registers. See page 50.
> The specifications further say that vendor specific registers of the PHY that would normally be in MMD30-31 (ie, excluding the PLCA registers and the other OPEN standard registers) would go into MMS10 to MMS15.
> 
> So I'm wondering, why is it bad to have vendor specific registers into MMD10 to MMD15?
> I think the framework should allow non-standard stuff to be mapped into these, no?
> 
> Thanks,
> Piergiorgio
> 
> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 24 May, 2024 23:55
> To: Piergiorgio Beruto <Pier.Beruto@onsemi.com>
> Cc: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>; Parthiban.Veerasooran@microchip.com; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; horms@kernel.org; saeedm@nvidia.com; anthony.l.nguyen@intel.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; corbet@lwn.net; linux-doc@vger.kernel.org; robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org; devicetree@vger.kernel.org; Horatiu.Vultur@microchip.com; ruanjinjie@huawei.com; Steen.Hegelund@microchip.com; vladimir.oltean@nxp.com; UNGLinuxDriver@microchip.com; Thorsten.Kummermehr@microchip.com; Nicolas.Ferre@microchip.com; benjamin.bigler@bernformulastudent.ch
> Subject: Re: [PATCH net-next v4 00/12] Add support for OPEN Alliance 10BASE-T1x MACPHY Serial Interface
> 
> [External Email]: This email arrived from an external source - Please exercise caution when opening any attachments or clicking on links.
> 
>> In reality, it is not the PHY having register in MMS12, and not even
>> the MAC. These are really "chip-specific" registers, unrelated to
>> networking (e.g., GPIOs, HW diagnostics, etc.).
> 
> Having a GPIO driver within the MAC driver is O.K. For hardware diagnostics you should be using devlink, which many MAC drivers have. So i don't see a need for the PHY driver to access MMS 12.
> 
> Anyway, we can do a real review when you post your code.
> 
>> Although, I think it is a good idea anyway to allow the MACPHY drivers
>> to hook into / extend the MDIO access functions.  If anything, because
>> of the hacks you mentioned. But also to allow vendor-specific
>> extensions.
> 
> But we don't want vendor specific extensions. OS 101, the OS is there to make all hardware look the same. And in general, it is not often that vendors actually come up with anything unique. And if they do, and it is useful, other vendors will copy it. So rather than doing vendor specific extensions, you should be thinking about how to export it in a way which is common across multiple vendors.
> 
>     Andrew
Andrew Lunn May 31, 2024, 12:37 p.m. UTC | #97
> So I would request all of you to give your comments on the existing 
> implementation in the patch series to improve better. Once this version 
> is mainlined we will discuss further to implement further features 
> supported. I feel the current discussion doesn't have any impact on the 
> existing implementation which supports basic 10Base-T1S Ethernet 
> communication.

Agreed. Lets focus on what we have now.

https://patchwork.kernel.org/project/netdevbpf/patch/20240418125648.372526-2-Parthiban.Veerasooran@microchip.com/

Version 4 failed to apply. So we are missing all the CI tests. We need
a v5 which cleanly applies to net-next in order for those tests to
run.

I think we should disable vendor interrupts by default, since we
currently have no way to handle them.

I had a quick look at the comments on the patches. I don't think we
have any other big issues not agreed on. So please post a v5 with them
all addressed and we will see what the CI says.

Piergiorgio, if you have any real problems getting basic support for
your device working with this framework, now would be a good time to
raise the problems.

	Andrew
Piergiorgio Beruto May 31, 2024, 3:13 p.m. UTC | #98
Hi Andrew,
We're currently working on re-factoring our driver onto the framework.
I will make sure we can give you a feedback ASAP.

We're also trying to asses the performance difference between what we have now and what we can achieve after re-factorng.

Thanks,
Piergiorgio

-----Original Message-----
From: Andrew Lunn <andrew@lunn.ch>
Sent: 31 May, 2024 14:37
To: Parthiban.Veerasooran@microchip.com
Cc: Piergiorgio Beruto <Pier.Beruto@onsemi.com>; Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; horms@kernel.org; saeedm@nvidia.com; anthony.l.nguyen@intel.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; corbet@lwn.net; linux-doc@vger.kernel.org; robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org; devicetree@vger.kernel.org; Horatiu.Vultur@microchip.com; ruanjinjie@huawei.com; Steen.Hegelund@microchip.com; vladimir.oltean@nxp.com; UNGLinuxDriver@microchip.com; Thorsten.Kummermehr@microchip.com; Nicolas.Ferre@microchip.com; benjamin.bigler@bernformulastudent.ch; Viliam Vozar <Viliam.Vozar@onsemi.com>; Arndt Schuebel <Arndt.Schuebel@onsemi.com>
Subject: Re: [PATCH net-next v4 00/12] Add support for OPEN Alliance 10BASE-T1x MACPHY Serial Interface

[External Email]: This email arrived from an external source - Please exercise caution when opening any attachments or clicking on links.

> So I would request all of you to give your comments on the existing
> implementation in the patch series to improve better. Once this
> version is mainlined we will discuss further to implement further
> features supported. I feel the current discussion doesn't have any
> impact on the existing implementation which supports basic 10Base-T1S
> Ethernet communication.

Agreed. Lets focus on what we have now.

https://urldefense.com/v3/__https://patchwork.kernel.org/project/netdevbpf/patch/20240418125648.372526-2-Parthiban.Veerasooran@microchip.com/__;!!KkVubWw!n9QOIA72sKA9z72UFogHeBRnA8Hse9gmIqzNv27f7Tc-4dYH1KA__DfMSmln-uBotO-bnw3PC2qXbfRn$

Version 4 failed to apply. So we are missing all the CI tests. We need a v5 which cleanly applies to net-next in order for those tests to run.

I think we should disable vendor interrupts by default, since we currently have no way to handle them.

I had a quick look at the comments on the patches. I don't think we have any other big issues not agreed on. So please post a v5 with them all addressed and we will see what the CI says.

Piergiorgio, if you have any real problems getting basic support for your device working with this framework, now would be a good time to raise the problems.

        Andrew
Parthiban Veerasooran June 3, 2024, 6:55 a.m. UTC | #99
Hi Andrew,

On 31/05/24 6:07 pm, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>> So I would request all of you to give your comments on the existing
>> implementation in the patch series to improve better. Once this version
>> is mainlined we will discuss further to implement further features
>> supported. I feel the current discussion doesn't have any impact on the
>> existing implementation which supports basic 10Base-T1S Ethernet
>> communication.
> 
> Agreed. Lets focus on what we have now.
Great. Thanks for your opinion on this.
> 
> https://patchwork.kernel.org/project/netdevbpf/patch/20240418125648.372526-2-Parthiban.Veerasooran@microchip.com/
> 
> Version 4 failed to apply. So we are missing all the CI tests. We need
> a v5 which cleanly applies to net-next in order for those tests to
> run.
Sure I will start preparing the v5 now.
> 
> I think we should disable vendor interrupts by default, since we
> currently have no way to handle them.
OK, I will remove the patch which enables the interrupts.
> 
> I had a quick look at the comments on the patches. I don't think we
> have any other big issues not agreed on. So please post a v5 with them
> all addressed and we will see what the CI says.
Sure, thanks for the confirmation.

Best regards,
Parthiban V
> 
> Piergiorgio, if you have any real problems getting basic support for
> your device working with this framework, now would be a good time to
> raise the problems.
> 
>          Andrew
Parthiban Veerasooran June 3, 2024, 6:55 a.m. UTC | #100
Hi Piergiorgio,

On 31/05/24 8:43 pm, Piergiorgio Beruto wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Hi Andrew,
> We're currently working on re-factoring our driver onto the framework.
> I will make sure we can give you a feedback ASAP.
> 
> We're also trying to asses the performance difference between what we have now and what we can achieve after re-factorng.
That's cool. As Andrew commented in the other email, let me know the 
feedback if you have any to be addressed in the v5 patch series to 
support the basic communication.

Best regards,
Parthiban V
> 
> Thanks,
> Piergiorgio
> 
> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 31 May, 2024 14:37
> To: Parthiban.Veerasooran@microchip.com
> Cc: Piergiorgio Beruto <Pier.Beruto@onsemi.com>; Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; horms@kernel.org; saeedm@nvidia.com; anthony.l.nguyen@intel.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; corbet@lwn.net; linux-doc@vger.kernel.org; robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org; devicetree@vger.kernel.org; Horatiu.Vultur@microchip.com; ruanjinjie@huawei.com; Steen.Hegelund@microchip.com; vladimir.oltean@nxp.com; UNGLinuxDriver@microchip.com; Thorsten.Kummermehr@microchip.com; Nicolas.Ferre@microchip.com; benjamin.bigler@bernformulastudent.ch; Viliam Vozar <Viliam.Vozar@onsemi.com>; Arndt Schuebel <Arndt.Schuebel@onsemi.com>
> Subject: Re: [PATCH net-next v4 00/12] Add support for OPEN Alliance 10BASE-T1x MACPHY Serial Interface
> 
> [External Email]: This email arrived from an external source - Please exercise caution when opening any attachments or clicking on links.
> 
>> So I would request all of you to give your comments on the existing
>> implementation in the patch series to improve better. Once this
>> version is mainlined we will discuss further to implement further
>> features supported. I feel the current discussion doesn't have any
>> impact on the existing implementation which supports basic 10Base-T1S
>> Ethernet communication.
> 
> Agreed. Lets focus on what we have now.
> 
> https://urldefense.com/v3/__https://patchwork.kernel.org/project/netdevbpf/patch/20240418125648.372526-2-Parthiban.Veerasooran@microchip.com/__;!!KkVubWw!n9QOIA72sKA9z72UFogHeBRnA8Hse9gmIqzNv27f7Tc-4dYH1KA__DfMSmln-uBotO-bnw3PC2qXbfRn$
> 
> Version 4 failed to apply. So we are missing all the CI tests. We need a v5 which cleanly applies to net-next in order for those tests to run.
> 
> I think we should disable vendor interrupts by default, since we currently have no way to handle them.
> 
> I had a quick look at the comments on the patches. I don't think we have any other big issues not agreed on. So please post a v5 with them all addressed and we will see what the CI says.
> 
> Piergiorgio, if you have any real problems getting basic support for your device working with this framework, now would be a good time to raise the problems.
> 
>          Andrew
Selvamani Rajagopal June 5, 2024, 9:40 p.m. UTC | #101
Parthiban/Andrew,

Couple of requests / suggestions after completing the integration of our drivers to the current framework.

1) Can we move memory map selector definitions (OA_TC6_PHY_C45_PCS_MMS2 and other 4 definitions) to the header file
     include/linux/oa_tc6.h?
     Also, if possible, could we add the MMS0, MMS1?. Our driver is using them. Of course, we could add it when we submit our driver.

2) If it not too late to ask, Is it possible to move interrupt handler to vendor's code? ( oa_tc6_macphy_isr ). Instead, we could provide a function
    oa_tc6_wakeup(priv->oa_tc6) in oa_tc6.c, so that vendors' ISR code can use them to notify the  work queue tc6->spi_wq.
    This way, it will provide vendors' code an ability to deal with some of the interrupts. For example, our code deals with PHYINT bit. In the current
    framework, interrupts are handled in the common framework (oa_tc6.c) and status bits are cleared there.

    I believe that change would require vendors to call "request_irq", which should be OK.

Sincerely
Selva

> -----Original Message-----
> From: Parthiban.Veerasooran@microchip.com
> <Parthiban.Veerasooran@microchip.com>
> Sent: Sunday, June 2, 2024 11:56 PM
> To: Piergiorgio Beruto <Pier.Beruto@onsemi.com>; andrew@lunn.ch
> Cc: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; horms@kernel.org; saeedm@nvidia.com;
> anthony.l.nguyen@intel.com; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; corbet@lwn.net; linux-doc@vger.kernel.org;
> robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org;
> conor+dt@kernel.org; devicetree@vger.kernel.org;
> Horatiu.Vultur@microchip.com; ruanjinjie@huawei.com;
> Steen.Hegelund@microchip.com; vladimir.oltean@nxp.com;
> UNGLinuxDriver@microchip.com;
> Thorsten.Kummermehr@microchip.com;
> Nicolas.Ferre@microchip.com;
> benjamin.bigler@bernformulastudent.ch; Viliam Vozar
> <Viliam.Vozar@onsemi.com>; Arndt Schuebel
> <Arndt.Schuebel@onsemi.com>
> Subject: Re: [PATCH net-next v4 00/12] Add support for OPEN Alliance
> 10BASE-T1x MACPHY Serial Interface
> 
> [External Email]: This email arrived from an external source - Please
> exercise caution when opening any attachments or clicking on links.
> 
> Hi Piergiorgio,
> 
> On 31/05/24 8:43 pm, Piergiorgio Beruto wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
> >
> > Hi Andrew,
> > We're currently working on re-factoring our driver onto the
> framework.
> > I will make sure we can give you a feedback ASAP.
> >
> > We're also trying to asses the performance difference between what
> we have now and what we can achieve after re-factorng.
> That's cool. As Andrew commented in the other email, let me know the
> feedback if you have any to be addressed in the v5 patch series to
> support the basic communication.
> 
> Best regards,
> Parthiban V
> >
> > Thanks,
> > Piergiorgio
> >
> > -----Original Message-----
> > From: Andrew Lunn <andrew@lunn.ch>
> > Sent: 31 May, 2024 14:37
> > To: Parthiban.Veerasooran@microchip.com
> > Cc: Piergiorgio Beruto <Pier.Beruto@onsemi.com>; Selvamani
> Rajagopal <Selvamani.Rajagopal@onsemi.com>;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; horms@kernel.org; saeedm@nvidia.com;
> anthony.l.nguyen@intel.com; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; corbet@lwn.net; linux-doc@vger.kernel.org;
> robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org;
> conor+dt@kernel.org; devicetree@vger.kernel.org;
> Horatiu.Vultur@microchip.com; ruanjinjie@huawei.com;
> Steen.Hegelund@microchip.com; vladimir.oltean@nxp.com;
> UNGLinuxDriver@microchip.com;
> Thorsten.Kummermehr@microchip.com;
> Nicolas.Ferre@microchip.com;
> benjamin.bigler@bernformulastudent.ch; Viliam Vozar
> <Viliam.Vozar@onsemi.com>; Arndt Schuebel
> <Arndt.Schuebel@onsemi.com>
> > Subject: Re: [PATCH net-next v4 00/12] Add support for OPEN
> Alliance 10BASE-T1x MACPHY Serial Interface
> >
> > [External Email]: This email arrived from an external source - Please
> exercise caution when opening any attachments or clicking on links.
> >
> >> So I would request all of you to give your comments on the existing
> >> implementation in the patch series to improve better. Once this
> >> version is mainlined we will discuss further to implement further
> >> features supported. I feel the current discussion doesn't have any
> >> impact on the existing implementation which supports basic 10Base-
> T1S
> >> Ethernet communication.
> >
> > Agreed. Lets focus on what we have now.
> >
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%
> 2Furldefense.com%2Fv3%2F__https%3A%2F%2Fpatchwork.kernel.org
> %2Fproject%2Fnetdevbpf%2Fpatch%2F20240418125648.372526-2-
> Parthiban.Veerasooran%40microchip.com%2F__%3B!!KkVubWw!n9Q
> OIA72sKA9z72UFogHeBRnA8Hse9gmIqzNv27f7Tc-
> 4dYH1KA__DfMSmln-uBotO-
> bnw3PC2qXbfRn%24&data=05%7C02%7CSelvamani.Rajagopal%40on
> semi.com%7Cc9784216cb1143af435408dc839a3822%7C04e1674b7
> af54d13a08264fc6e42384c%7C1%7C0%7C638529945611367664%7
> CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2lu
> MzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=eT
> wuSpyx1KIPuOXV3krpjqElcbhU0zBoTmPjj0srUTs%3D&reserved=0
> >
> > Version 4 failed to apply. So we are missing all the CI tests. We need a
> v5 which cleanly applies to net-next in order for those tests to run.
> >
> > I think we should disable vendor interrupts by default, since we
> currently have no way to handle them.
> >
> > I had a quick look at the comments on the patches. I don't think we
> have any other big issues not agreed on. So please post a v5 with them
> all addressed and we will see what the CI says.
> >
> > Piergiorgio, if you have any real problems getting basic support for
> your device working with this framework, now would be a good time to
> raise the problems.
> >
> >          Andrew
Andrew Lunn June 5, 2024, 11:43 p.m. UTC | #102
On Wed, Jun 05, 2024 at 09:40:12PM +0000, Selvamani Rajagopal wrote:
> Parthiban/Andrew,
> 
> Couple of requests / suggestions after completing the integration of our drivers to the current framework.

Please configure your email client to wrap lines at about 78
characters.


> 
> 1) Can we move memory map selector definitions (OA_TC6_PHY_C45_PCS_MMS2 and other 4 definitions) to the header file
>      include/linux/oa_tc6.h?
>      Also, if possible, could we add the MMS0, MMS1?. Our driver is using them. Of course, we could add it when we submit our driver.

Interesting. So you have vendor registers outside of MMS 10-15?

Or do you need to access standard registers? I would prefer to see
your use cases before deciding this. If you want to access standard
registers, you are probably doing stuff other vendors also want to do,
so we should add a helper in the framework.

2) If it not too late to ask, Is it possible to move interrupt
> handler to vendor's code?

I would say no, not at the moment.

What we can do in the future is allow a driver to register a function
to handle the vendor interrupts, leaving the framework to handle the
standard interrupts, and chain into the specific driver vendor
interrupt handler when a vendor interrupt it indicated.

> This way, it will provide vendors' code an ability to deal with some
> of the interrupts. For example, our code deals with PHYINT bit.

Please explain what you are doing here? What are you doing which the
framework does not cover.

	Andrew
Selvamani Rajagopal June 6, 2024, 12:35 a.m. UTC | #103
> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: Wednesday, June 5, 2024 4:43 PM
> To: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
> Cc: Parthiban.Veerasooran@microchip.com; Piergiorgio Beruto
> <Pier.Beruto@onsemi.com>; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com;
> horms@kernel.org; saeedm@nvidia.com; anthony.l.nguyen@intel.com;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; corbet@lwn.net;
> linux-doc@vger.kernel.org; robh+dt@kernel.org;
> krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org;
> devicetree@vger.kernel.org; Horatiu.Vultur@microchip.com;
> ruanjinjie@huawei.com; Steen.Hegelund@microchip.com;
> vladimir.oltean@nxp.com; UNGLinuxDriver@microchip.com;
> Thorsten.Kummermehr@microchip.com;
> Nicolas.Ferre@microchip.com;
> benjamin.bigler@bernformulastudent.ch; Viliam Vozar
> <Viliam.Vozar@onsemi.com>; Arndt Schuebel
> <Arndt.Schuebel@onsemi.com>
> Subject: Re: [PATCH net-next v4 00/12] Add support for OPEN Alliance
> 10BASE-T1x MACPHY Serial Interface
> 
> [External Email]: This email arrived from an external source - Please
> exercise caution when opening any attachments or clicking on links.
> 
> 
> On Wed, Jun 05, 2024 at 09:40:12PM +0000, Selvamani Rajagopal
> wrote:
> > Parthiban/Andrew,
> >
> > Couple of requests / suggestions after completing the integration of
> our drivers to the current framework.
> 
> Please configure your email client to wrap lines at about 78
> characters.
> 

I believe my client is configured to wrap at 70th characters. 
Not sure why it is not doing it.

> 
> >
> > 1) Can we move memory map selector definitions
> (OA_TC6_PHY_C45_PCS_MMS2 and other 4 definitions) to the header
> file
> >      include/linux/oa_tc6.h?
> >      Also, if possible, could we add the MMS0, MMS1?. Our driver is
> using them. Of course, we could add it when we submit our driver.
> 
> Interesting. So you have vendor registers outside of MMS 10-15?

This is not about vendor registers. The current oa_tc6 defines 
MMS selector values for 2, 3, 4, 5, 6. I am asking, if 0, 1 can be added, 
which are meant for "Standard Control and Status" and MAC respectively, 
according to MMS assignment table 6 on OA standard.

> 
> Or do you need to access standard registers? I would prefer to see
> your use cases before deciding this. If you want to access standard
> registers, you are probably doing stuff other vendors also want to do,
> so we should add a helper in the framework.
> 
> 2) If it not too late to ask, Is it possible to move interrupt
> > handler to vendor's code?
> 
> I would say no, not at the moment.
> 
> What we can do in the future is allow a driver to register a function
> to handle the vendor interrupts, leaving the framework to handle the
> standard interrupts, and chain into the specific driver vendor
> interrupt handler when a vendor interrupt it indicated.
> 
> > This way, it will provide vendors' code an ability to deal with some
> > of the interrupts. For example, our code deals with PHYINT bit.
> 
> Please explain what you are doing here? What are you doing which the
> framework does not cover.

One example I can think of is, to handle PHYINT status bit
that may be set in STATUS0 register. Another example could be,
to give a vendor flexibility to not to use interrupt mode. 
FYI: Our driver uses interrupts. So, this is not the main reason.

> 
> 	Andrew
Andrew Lunn June 6, 2024, 1:12 p.m. UTC | #104
> I believe my client is configured to wrap at 70th characters. 
> Not sure why it is not doing it.


It could be you also send a MIME obfuscated copy which is not wrapped
correctly?

> > > 1) Can we move memory map selector definitions
> > (OA_TC6_PHY_C45_PCS_MMS2 and other 4 definitions) to the header
> > file
> > >      include/linux/oa_tc6.h?
> > >      Also, if possible, could we add the MMS0, MMS1?. Our driver is
> > using them. Of course, we could add it when we submit our driver.
> > 
> > Interesting. So you have vendor registers outside of MMS 10-15?
> 
> This is not about vendor registers. The current oa_tc6 defines 
> MMS selector values for 2, 3, 4, 5, 6. I am asking, if 0, 1 can be added, 
> which are meant for "Standard Control and Status" and MAC respectively, 
> according to MMS assignment table 6 on OA standard.

But why would a MAC driver need access to those? Everything using
those registers should be defined in the standard. So the framework
should handle them.

> One example I can think of is, to handle PHYINT status bit
> that may be set in STATUS0 register. Another example could be,
> to give a vendor flexibility to not to use interrupt mode.

But that is part of the standard. Why would a driver need to do
anything, the framework should handle PHYINT, calling
phy_mac_interrupt(phydev).

I really think you need to post patches. We can then discuss each use
case, and i can give you concrete feedback.

But in general, if it is part of the standard it should be in the
framework. Support for features which are not part of the standard,
and workarounds for where a device violates the standard, should be in
the MAC driver, or the PHY driver.

	Andrew
Selvamani Rajagopal June 7, 2024, 4:40 a.m. UTC | #105
> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: Thursday, June 6, 2024 6:12 AM
> To: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
> Cc: Parthiban.Veerasooran@microchip.com; Piergiorgio Beruto
> <Pier.Beruto@onsemi.com>; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com;
> horms@kernel.org; saeedm@nvidia.com; anthony.l.nguyen@intel.com;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; corbet@lwn.net;
> linux-doc@vger.kernel.org; robh+dt@kernel.org;
> krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org;
> devicetree@vger.kernel.org; Horatiu.Vultur@microchip.com;
> ruanjinjie@huawei.com; Steen.Hegelund@microchip.com;
> vladimir.oltean@nxp.com; UNGLinuxDriver@microchip.com;
> Thorsten.Kummermehr@microchip.com; Nicolas.Ferre@microchip.com;
> benjamin.bigler@bernformulastudent.ch; Viliam Vozar
> <Viliam.Vozar@onsemi.com>; Arndt Schuebel
> <Arndt.Schuebel@onsemi.com>
> Subject: Re: [PATCH net-next v4 00/12] Add support for OPEN Alliance
> 10BASE-T1x MACPHY Serial Interface
> 
> [External Email]: This email arrived from an external source - Please
> exercise caution when opening any attachments or clicking on links.
> 
> > I believe my client is configured to wrap at 70th characters.
> > Not sure why it is not doing it.
> 
> 
> It could be you also send a MIME obfuscated copy which is not wrapped
> correctly?
> 
> > > > 1) Can we move memory map selector definitions
> > > (OA_TC6_PHY_C45_PCS_MMS2 and other 4 definitions) to the header
> > > file
> > > >      include/linux/oa_tc6.h?
> > > >      Also, if possible, could we add the MMS0, MMS1?. Our driver is
> > > using them. Of course, we could add it when we submit our driver.
> > >
> > > Interesting. So you have vendor registers outside of MMS 10-15?
> >
> > This is not about vendor registers. The current oa_tc6 defines
> > MMS selector values for 2, 3, 4, 5, 6. I am asking, if 0, 1 can be added,
> > which are meant for "Standard Control and Status" and MAC
> respectively,
> > according to MMS assignment table 6 on OA standard.
> 
> But why would a MAC driver need access to those? Everything using
> those registers should be defined in the standard. So the framework
> should handle them.
> 
> > One example I can think of is, to handle PHYINT status bit
> > that may be set in STATUS0 register. Another example could be,
> > to give a vendor flexibility to not to use interrupt mode.
> 
> But that is part of the standard. Why would a driver need to do
> anything, the framework should handle PHYINT, calling
> phy_mac_interrupt(phydev).
> 
> I really think you need to post patches. We can then discuss each use
> case, and i can give you concrete feedback.


True.  That would be better. Will work on the patches so that it is clear.


> 
> But in general, if it is part of the standard it should be in the
> framework. Support for features which are not part of the standard,
> and workarounds for where a device violates the standard, should be in
> the MAC driver, or the PHY driver.
> 
> 	Andrew
Stefan Bigler July 16, 2024, 5:35 a.m. UTC | #106
Hi Parthiban

I'm using v4 of the driver an switched from ipv4 to ipv6.
I recognized, that the Neighbor Discovery is not working as expected.
The reason for this is, that the Neighbor Solicitation packet is not recived.
This packet is sent with a Multicast MAC Address.
When I checked the code and compared it to the documentaton the calculation of the Address Hash is not ok.
See Chapter 6.4.6 Hash Addressing

Changing the function lan865x_hash fixed the problem

+static inline u32 getAddrBit(u8 addr[ETH_ALEN], u32 bit)
+{
+       return ((addr[bit/8]) >> (bit % 8)) & 1;
+}
+
 static u32 lan865x_hash(u8 addr[ETH_ALEN])
 {
-       return (ether_crc(ETH_ALEN, addr) >> 26) & GENMASK(5, 0);
+       u32 hash_index = 0;
+       for (int i=0; i<6; i++)
+       {
+               u32 hash = 0;
+               for (int j=0; j<8; j++) {
+                       hash ^= getAddrBit(addr, (j*6)+i);
+               }
+               hash_index |= (hash << i);
+       }
+       return hash_index;
 }

also the fuction lan865x_set_specific_multicast_addr() must be fixed due to the overflow of the mask variable

  static void lan865x_set_specific_multicast_addr(struct net_device *netdev)
@@ -301,15 +315,11 @@ static void lan865x_set_specific_multicast_addr(struct net_device *netdev)
 
 	netdev_for_each_mc_addr(ha, netdev) {
 		u32 bit_num = lan865x_hash(ha->addr);
-		u32 mask = BIT(bit_num);
 
-		/* 5th bit of the 6 bits hash value is used to determine which
-		 * bit to set in either a high or low hash register.
-		 */
-		if (bit_num & BIT(5))
-			hash_hi |= mask;
+		if (bit_num >= 32)
+			hash_hi |= (1 << (bit_num-32));
 		else
-			hash_lo |= mask;
+			hash_lo |= (1 << bit_num);
 	}
 
 	/* Enabling specific multicast addresses */

I would be great if the fix can be included in the new version 5 of your driver.

Thanks a lot and best regards
Stefan


Am 2024-04-18T14:56:47.000+02:00 hat Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com> geschrieben:

>  The LAN8650/1 is designed to conform to the OPEN Alliance 10BASE-T1x
> MAC-PHY Serial Interface specification, Version 1.1. The IEEE Clause 4
> MAC integration provides the low pin count standard SPI interface to any
> microcontroller therefore providing Ethernet functionality without
> requiring MAC integration within the microcontroller. The LAN8650/1
> operates as an SPI client supporting SCLK clock rates up to a maximum of
> 25 MHz. This SPI interface supports the transfer of both data (Ethernet
> frames) and control (register access).
> 
> By default, the chunk data payload is 64 bytes in size. The Ethernet
> Media Access Controller (MAC) module implements a 10 Mbps half duplex
> Ethernet MAC, compatible with the IEEE 802.3 standard. 10BASE-T1S
> physical layer transceiver integrated is into the LAN8650/1. The PHY and
> MAC are connected via an internal Media Independent Interface (MII).
> 
> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
> ---
>  MAINTAINERS                                   |   6 +
>  drivers/net/ethernet/microchip/Kconfig        |   1 +
>  drivers/net/ethernet/microchip/Makefile       |   1 +
>  .../net/ethernet/microchip/lan865x/Kconfig    |  19 +
>  .../net/ethernet/microchip/lan865x/Makefile   |   6 +
>  .../net/ethernet/microchip/lan865x/lan865x.c  | 384 ++++++++++++++++++
>  6 files changed, 417 insertions(+)
>  create mode 100644 drivers/net/ethernet/microchip/lan865x/Kconfig
>  create mode 100644 drivers/net/ethernet/microchip/lan865x/Makefile
>  create mode 100644 drivers/net/ethernet/microchip/lan865x/lan865x.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 603528948f61..f41b7f2257d2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14374,6 +14374,12 @@ L:	netdev@vger.kernel.org
>  S:	Maintained
>  F:	drivers/net/ethernet/microchip/lan743x_*
>  
> +MICROCHIP LAN8650/1 10BASE-T1S MACPHY ETHERNET DRIVER
> +M:	Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
> +L:	netdev@vger.kernel.org
> +S:	Maintained
> +F:	drivers/net/ethernet/microchip/lan865x/lan865x.c
> +
>  MICROCHIP LAN87xx/LAN937x T1 PHY DRIVER
>  M:	Arun Ramadoss <arun.ramadoss@microchip.com>
>  R:	UNGLinuxDriver@microchip.com
> diff --git a/drivers/net/ethernet/microchip/Kconfig b/drivers/net/ethernet/microchip/Kconfig
> index 43ba71e82260..06ca79669053 100644
> --- a/drivers/net/ethernet/microchip/Kconfig
> +++ b/drivers/net/ethernet/microchip/Kconfig
> @@ -56,6 +56,7 @@ config LAN743X
>  	  To compile this driver as a module, choose M here. The module will be
>  	  called lan743x.
>  
> +source "drivers/net/ethernet/microchip/lan865x/Kconfig"
>  source "drivers/net/ethernet/microchip/lan966x/Kconfig"
>  source "drivers/net/ethernet/microchip/sparx5/Kconfig"
>  source "drivers/net/ethernet/microchip/vcap/Kconfig"
> diff --git a/drivers/net/ethernet/microchip/Makefile b/drivers/net/ethernet/microchip/Makefile
> index bbd349264e6f..15dfbb321057 100644
> --- a/drivers/net/ethernet/microchip/Makefile
> +++ b/drivers/net/ethernet/microchip/Makefile
> @@ -9,6 +9,7 @@ obj-$(CONFIG_LAN743X) += lan743x.o
>  
>  lan743x-objs := lan743x_main.o lan743x_ethtool.o lan743x_ptp.o
>  
> +obj-$(CONFIG_LAN865X) += lan865x/
>  obj-$(CONFIG_LAN966X_SWITCH) += lan966x/
>  obj-$(CONFIG_SPARX5_SWITCH) += sparx5/
>  obj-$(CONFIG_VCAP) += vcap/
> diff --git a/drivers/net/ethernet/microchip/lan865x/Kconfig b/drivers/net/ethernet/microchip/lan865x/Kconfig
> new file mode 100644
> index 000000000000..f3d60d14e202
> --- /dev/null
> +++ b/drivers/net/ethernet/microchip/lan865x/Kconfig
> @@ -0,0 +1,19 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Microchip LAN865x Driver Support
> +#
> +
> +if NET_VENDOR_MICROCHIP
> +
> +config LAN865X
> +	tristate "LAN865x support"
> +	depends on SPI
> +	depends on OA_TC6
> +	help
> +	  Support for the Microchip LAN8650/1 Rev.B1 MACPHY Ethernet chip. It
> +	  uses OPEN Alliance 10BASE-T1x Serial Interface specification.
> +
> +	  To compile this driver as a module, choose M here. The module will be
> +	  called lan865x.
> +
> +endif # NET_VENDOR_MICROCHIP
> diff --git a/drivers/net/ethernet/microchip/lan865x/Makefile b/drivers/net/ethernet/microchip/lan865x/Makefile
> new file mode 100644
> index 000000000000..9f5dd89c1eb8
> --- /dev/null
> +++ b/drivers/net/ethernet/microchip/lan865x/Makefile
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Makefile for the Microchip LAN865x Driver
> +#
> +
> +obj-$(CONFIG_LAN865X) += lan865x.o
> diff --git a/drivers/net/ethernet/microchip/lan865x/lan865x.c b/drivers/net/ethernet/microchip/lan865x/lan865x.c
> new file mode 100644
> index 000000000000..9abefa8b9d9f
> --- /dev/null
> +++ b/drivers/net/ethernet/microchip/lan865x/lan865x.c
> @@ -0,0 +1,384 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Microchip's LAN865x 10BASE-T1S MAC-PHY driver
> + *
> + * Author: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/phy.h>
> +#include <linux/oa_tc6.h>
> +
> +#define DRV_NAME			"lan865x"
> +
> +/* MAC Network Control Register */
> +#define LAN865X_REG_MAC_NET_CTL		0x00010000
> +#define MAC_NET_CTL_TXEN		BIT(3) /* Transmit Enable */
> +#define MAC_NET_CTL_RXEN		BIT(2) /* Receive Enable */
> +
> +#define LAN865X_REG_MAC_NET_CFG		0x00010001 /* MAC Network Configuration Reg */
> +#define MAC_NET_CFG_PROMISCUOUS_MODE	BIT(4)
> +#define MAC_NET_CFG_MULTICAST_MODE	BIT(6)
> +#define MAC_NET_CFG_UNICAST_MODE	BIT(7)
> +
> +#define LAN865X_REG_MAC_L_HASH		0x00010020 /* MAC Hash Register Bottom */
> +#define LAN865X_REG_MAC_H_HASH		0x00010021 /* MAC Hash Register Top */
> +#define LAN865X_REG_MAC_L_SADDR1	0x00010022 /* MAC Specific Addr 1 Bottom Reg */
> +#define LAN865X_REG_MAC_H_SADDR1	0x00010023 /* MAC Specific Addr 1 Top Reg */
> +
> +/* OPEN Alliance Configuration Register #0 */
> +#define OA_TC6_REG_CONFIG0		0x0004
> +#define CONFIG0_ZARFE_ENABLE		BIT(12)
> +
> +struct lan865x_priv {
> +	struct work_struct multicast_work;
> +	struct net_device *netdev;
> +	struct spi_device *spi;
> +	struct oa_tc6 *tc6;
> +};
> +
> +static int lan865x_set_hw_macaddr_low_bytes(struct oa_tc6 *tc6, const u8 *mac)
> +{
> +	u32 regval;
> +
> +	regval = (mac[3] << 24) | (mac[2] << 16) | (mac[1] << 8) | mac[0];
> +
> +	return oa_tc6_write_register(tc6, LAN865X_REG_MAC_L_SADDR1, regval);
> +}
> +
> +static int lan865x_set_hw_macaddr(struct lan865x_priv *priv, const u8 *mac)
> +{
> +	int restore_ret;
> +	u32 regval;
> +	int ret;
> +
> +	/* Configure MAC address low bytes */
> +	ret = lan865x_set_hw_macaddr_low_bytes(priv->tc6, mac);
> +	if (ret)
> +		return ret;
> +
> +	/* Prepare and configure MAC address high bytes */
> +	regval = (mac[5] << 8) | mac[4];
> +	ret = oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_H_SADDR1, regval);
> +	if (!ret)
> +		return 0;
> +
> +	/* Restore the old MAC address low bytes from netdev if the new MAC
> +	 * address high bytes setting failed.
> +	 */
> +	restore_ret = lan865x_set_hw_macaddr_low_bytes(priv->tc6,
> +						       priv->netdev->dev_addr);
> +	if (restore_ret)
> +		return restore_ret;
> +
> +	return ret;
> +}
> +
> +static void
> +lan865x_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *info)
> +{
> +	strscpy(info->driver, DRV_NAME, sizeof(info->driver));
> +	strscpy(info->bus_info, dev_name(netdev->dev.parent),
> +		sizeof(info->bus_info));
> +}
> +
> +static const struct ethtool_ops lan865x_ethtool_ops = {
> +	.get_drvinfo        = lan865x_get_drvinfo,
> +	.get_link_ksettings = phy_ethtool_get_link_ksettings,
> +	.set_link_ksettings = phy_ethtool_set_link_ksettings,
> +};
> +
> +static int lan865x_set_mac_address(struct net_device *netdev, void *addr)
> +{
> +	struct lan865x_priv *priv = netdev_priv(netdev);
> +	struct sockaddr *address = addr;
> +	int ret;
> +
> +	ret = eth_prepare_mac_addr_change(netdev, addr);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (ether_addr_equal(address->sa_data, netdev->dev_addr))
> +		return 0;
> +
> +	ret = lan865x_set_hw_macaddr(priv, address->sa_data);
> +	if (ret)
> +		return ret;
> +
> +	eth_hw_addr_set(netdev, address->sa_data);
> +
> +	return 0;
> +}
> +
> +static u32 lan865x_hash(u8 addr[ETH_ALEN])
> +{
> +	return (ether_crc(ETH_ALEN, addr) >> 26) & GENMASK(5, 0);
> +}
> +
> +static void lan865x_set_specific_multicast_addr(struct net_device *netdev)
> +{
> +	struct lan865x_priv *priv = netdev_priv(netdev);
> +	struct netdev_hw_addr *ha;
> +	u32 hash_lo = 0;
> +	u32 hash_hi = 0;
> +
> +	netdev_for_each_mc_addr(ha, netdev) {
> +		u32 bit_num = lan865x_hash(ha->addr);
> +		u32 mask = BIT(bit_num);
> +
> +		/* 5th bit of the 6 bits hash value is used to determine which
> +		 * bit to set in either a high or low hash register.
> +		 */
> +		if (bit_num & BIT(5))
> +			hash_hi |= mask;
> +		else
> +			hash_lo |= mask;
> +	}
> +
> +	/* Enabling specific multicast addresses */
> +	if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_H_HASH, hash_hi)) {
> +		netdev_err(netdev, "Failed to write reg_hashh");
> +		return;
> +	}
> +
> +	if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_L_HASH, hash_lo))
> +		netdev_err(netdev, "Failed to write reg_hashl");
> +}
> +
> +static void lan865x_multicast_work_handler(struct work_struct *work)
> +{
> +	struct lan865x_priv *priv = container_of(work, struct lan865x_priv,
> +						 multicast_work);
> +	u32 regval = 0;
> +
> +	if (priv->netdev->flags & IFF_PROMISC) {
> +		/* Enabling promiscuous mode */
> +		regval |= MAC_NET_CFG_PROMISCUOUS_MODE;
> +		regval &= (~MAC_NET_CFG_MULTICAST_MODE);
> +		regval &= (~MAC_NET_CFG_UNICAST_MODE);
> +	} else if (priv->netdev->flags & IFF_ALLMULTI) {
> +		/* Enabling all multicast mode */
> +		regval &= (~MAC_NET_CFG_PROMISCUOUS_MODE);
> +		regval |= MAC_NET_CFG_MULTICAST_MODE;
> +		regval &= (~MAC_NET_CFG_UNICAST_MODE);
> +	} else if (!netdev_mc_empty(priv->netdev)) {
> +		lan865x_set_specific_multicast_addr(priv->netdev);
> +		regval &= (~MAC_NET_CFG_PROMISCUOUS_MODE);
> +		regval &= (~MAC_NET_CFG_MULTICAST_MODE);
> +		regval |= MAC_NET_CFG_UNICAST_MODE;
> +	} else {
> +		/* enabling local mac address only */
> +		if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_H_HASH, 0)) {
> +			netdev_err(priv->netdev, "Failed to write reg_hashh");
> +			return;
> +		}
> +		if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_L_HASH, 0)) {
> +			netdev_err(priv->netdev, "Failed to write reg_hashl");
> +			return;
> +		}
> +	}
> +	if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_NET_CFG, regval))
> +		netdev_err(priv->netdev,
> +			   "Failed to enable promiscuous/multicast/normal mode");
> +}
> +
> +static void lan865x_set_multicast_list(struct net_device *netdev)
> +{
> +	struct lan865x_priv *priv = netdev_priv(netdev);
> +
> +	schedule_work(&priv->multicast_work);
> +}
> +
> +static netdev_tx_t lan865x_send_packet(struct sk_buff *skb,
> +				       struct net_device *netdev)
> +{
> +	struct lan865x_priv *priv = netdev_priv(netdev);
> +
> +	return oa_tc6_start_xmit(priv->tc6, skb);
> +}
> +
> +static int lan865x_hw_disable(struct lan865x_priv *priv)
> +{
> +	u32 regval;
> +
> +	if (oa_tc6_read_register(priv->tc6, LAN865X_REG_MAC_NET_CTL, ®val))
> +		return -ENODEV;
> +
> +	regval &= ~(MAC_NET_CTL_TXEN | MAC_NET_CTL_RXEN);
> +
> +	if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_NET_CTL, regval))
> +		return -ENODEV;
> +
> +	return 0;
> +}
> +
> +static int lan865x_net_close(struct net_device *netdev)
> +{
> +	struct lan865x_priv *priv = netdev_priv(netdev);
> +	int ret;
> +
> +	netif_stop_queue(netdev);
> +	phy_stop(netdev->phydev);
> +	ret = lan865x_hw_disable(priv);
> +	if (ret) {
> +		netdev_err(netdev, "Failed to disable the hardware: %d\n", ret);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int lan865x_hw_enable(struct lan865x_priv *priv)
> +{
> +	u32 regval;
> +
> +	if (oa_tc6_read_register(priv->tc6, LAN865X_REG_MAC_NET_CTL, ®val))
> +		return -ENODEV;
> +
> +	regval |= MAC_NET_CTL_TXEN | MAC_NET_CTL_RXEN;
> +
> +	if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_NET_CTL, regval))
> +		return -ENODEV;
> +
> +	return 0;
> +}
> +
> +static int lan865x_net_open(struct net_device *netdev)
> +{
> +	struct lan865x_priv *priv = netdev_priv(netdev);
> +	int ret;
> +
> +	ret = lan865x_hw_enable(priv);
> +	if (ret) {
> +		netdev_err(netdev, "Failed to enable hardware: %d\n", ret);
> +		return ret;
> +	}
> +
> +	phy_start(netdev->phydev);
> +
> +	return 0;
> +}
> +
> +static const struct net_device_ops lan865x_netdev_ops = {
> +	.ndo_open		= lan865x_net_open,
> +	.ndo_stop		= lan865x_net_close,
> +	.ndo_start_xmit		= lan865x_send_packet,
> +	.ndo_set_rx_mode	= lan865x_set_multicast_list,
> +	.ndo_set_mac_address	= lan865x_set_mac_address,
> +};
> +
> +static int lan865x_set_zarfe(struct lan865x_priv *priv)
> +{
> +	u32 regval;
> +	int ret;
> +
> +	ret = oa_tc6_read_register(priv->tc6, OA_TC6_REG_CONFIG0, ®val);
> +	if (ret)
> +		return ret;
> +
> +	/* Set Zero-Align Receive Frame Enable */
> +	regval |= CONFIG0_ZARFE_ENABLE;
> +
> +	return oa_tc6_write_register(priv->tc6, OA_TC6_REG_CONFIG0, regval);
> +}
> +
> +static int lan865x_probe(struct spi_device *spi)
> +{
> +	struct net_device *netdev;
> +	struct lan865x_priv *priv;
> +	int ret;
> +
> +	netdev = alloc_etherdev(sizeof(struct lan865x_priv));
> +	if (!netdev)
> +		return -ENOMEM;
> +
> +	priv = netdev_priv(netdev);
> +	priv->netdev = netdev;
> +	priv->spi = spi;
> +	spi_set_drvdata(spi, priv);
> +	INIT_WORK(&priv->multicast_work, lan865x_multicast_work_handler);
> +
> +	priv->tc6 = oa_tc6_init(spi, netdev);
> +	if (!priv->tc6) {
> +		ret = -ENODEV;
> +		goto free_netdev;
> +	}
> +
> +	/* As per the point s3 in the below errata, SPI receive Ethernet frame
> +	 * transfer may halt when starting the next frame in the same data block
> +	 * (chunk) as the end of a previous frame. The RFA field should be
> +	 * configured to 01b or 10b for proper operation. In these modes, only
> +	 * one receive Ethernet frame will be placed in a single data block.
> +	 * When the RFA field is written to 01b, received frames will be forced
> +	 * to only start in the first word of the data block payload (SWO=0). As
> +	 * recommended, ZARFE bit in the OPEN Alliance CONFIG0 register is set
> +	 * to 1 for proper operation.
> +	 *
> +	 * https://ww1.microchip.com/downloads/aemDocuments/documents/AIS/ProductDocuments/Errata/LAN8650-1-Errata-80001075.pdf
> +	 */
> +	ret = lan865x_set_zarfe(priv);
> +	if (ret) {
> +		dev_err(&spi->dev, "Failed to set ZARFE: %d\n", ret);
> +		goto oa_tc6_exit;
> +	}
> +
> +	/* Get the MAC address from the SPI device tree node */
> +	if (device_get_ethdev_address(&spi->dev, netdev))
> +		eth_hw_addr_random(netdev);
> +
> +	ret = lan865x_set_hw_macaddr(priv, netdev->dev_addr);
> +	if (ret) {
> +		dev_err(&spi->dev, "Failed to configure MAC: %d\n", ret);
> +		goto oa_tc6_exit;
> +	}
> +
> +	netdev->if_port = IF_PORT_10BASET;
> +	netdev->irq = spi->irq;
> +	netdev->netdev_ops = &lan865x_netdev_ops;
> +	netdev->ethtool_ops = &lan865x_ethtool_ops;
> +
> +	ret = register_netdev(netdev);
> +	if (ret) {
> +		dev_err(&spi->dev, "Register netdev failed (ret = %d)", ret);
> +		goto oa_tc6_exit;
> +	}
> +
> +	return 0;
> +
> +oa_tc6_exit:
> +	oa_tc6_exit(priv->tc6);
> +free_netdev:
> +	free_netdev(priv->netdev);
> +	return ret;
> +}
> +
> +static void lan865x_remove(struct spi_device *spi)
> +{
> +	struct lan865x_priv *priv = spi_get_drvdata(spi);
> +
> +	cancel_work_sync(&priv->multicast_work);
> +	unregister_netdev(priv->netdev);
> +	oa_tc6_exit(priv->tc6);
> +	free_netdev(priv->netdev);
> +}
> +
> +static const struct of_device_id lan865x_dt_ids[] = {
> +	{ .compatible = "microchip,lan8651", "microchip,lan8650" },
> +	{ /* Sentinel */ }
> +};
> +MODULE_DEVICE_TABLE(of, lan865x_dt_ids);
> +
> +static struct spi_driver lan865x_driver = {
> +	.driver = {
> +		.name = DRV_NAME,
> +		.of_match_table = lan865x_dt_ids,
> +	 },
> +	.probe = lan865x_probe,
> +	.remove = lan865x_remove,
> +};
> +module_spi_driver(lan865x_driver);
> +
> +MODULE_DESCRIPTION(DRV_NAME " 10Base-T1S MACPHY Ethernet Driver");
> +MODULE_AUTHOR("Parthiban Veerasooran <parthiban.veerasooran@microchip.com>");
> +MODULE_LICENSE("GPL");
> -- 
> 2.34.1
Parthiban Veerasooran July 23, 2024, 11:04 a.m. UTC | #107
Hi Stefan,

Thanks a lot for finding this. Sure I will include the below fix in the 
v5 patch series. Unfortunately the preparation of the v5 patch series 
takes little more time. Will try to post v5 patch series as soon as 
possible. Please keep supporting.

Best regards,
Parthiban V

On 16/07/24 11:05 am, Stefan Bigler wrote:
> [You don't often get email from linux@bigler.io. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Hi Parthiban
> 
> I'm using v4 of the driver an switched from ipv4 to ipv6.
> I recognized, that the Neighbor Discovery is not working as expected.
> The reason for this is, that the Neighbor Solicitation packet is not recived.
> This packet is sent with a Multicast MAC Address.
> When I checked the code and compared it to the documentaton the calculation of the Address Hash is not ok.
> See Chapter 6.4.6 Hash Addressing
> 
> Changing the function lan865x_hash fixed the problem
> 
> +static inline u32 getAddrBit(u8 addr[ETH_ALEN], u32 bit)
> +{
> +       return ((addr[bit/8]) >> (bit % 8)) & 1;
> +}
> +
>   static u32 lan865x_hash(u8 addr[ETH_ALEN])
>   {
> -       return (ether_crc(ETH_ALEN, addr) >> 26) & GENMASK(5, 0);
> +       u32 hash_index = 0;
> +       for (int i=0; i<6; i++)
> +       {
> +               u32 hash = 0;
> +               for (int j=0; j<8; j++) {
> +                       hash ^= getAddrBit(addr, (j*6)+i);
> +               }
> +               hash_index |= (hash << i);
> +       }
> +       return hash_index;
>   }
> 
> also the fuction lan865x_set_specific_multicast_addr() must be fixed due to the overflow of the mask variable
> 
>    static void lan865x_set_specific_multicast_addr(struct net_device *netdev)
> @@ -301,15 +315,11 @@ static void lan865x_set_specific_multicast_addr(struct net_device *netdev)
> 
>          netdev_for_each_mc_addr(ha, netdev) {
>                  u32 bit_num = lan865x_hash(ha->addr);
> -               u32 mask = BIT(bit_num);
> 
> -               /* 5th bit of the 6 bits hash value is used to determine which
> -                * bit to set in either a high or low hash register.
> -                */
> -               if (bit_num & BIT(5))
> -                       hash_hi |= mask;
> +               if (bit_num >= 32)
> +                       hash_hi |= (1 << (bit_num-32));
>                  else
> -                       hash_lo |= mask;
> +                       hash_lo |= (1 << bit_num);
>          }
> 
>          /* Enabling specific multicast addresses */
> 
> I would be great if the fix can be included in the new version 5 of your driver.
> 
> Thanks a lot and best regards
> Stefan
> 
> 
> Am 2024-04-18T14:56:47.000+02:00 hat Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com> geschrieben:
> 
>>   The LAN8650/1 is designed to conform to the OPEN Alliance 10BASE-T1x
>> MAC-PHY Serial Interface specification, Version 1.1. The IEEE Clause 4
>> MAC integration provides the low pin count standard SPI interface to any
>> microcontroller therefore providing Ethernet functionality without
>> requiring MAC integration within the microcontroller. The LAN8650/1
>> operates as an SPI client supporting SCLK clock rates up to a maximum of
>> 25 MHz. This SPI interface supports the transfer of both data (Ethernet
>> frames) and control (register access).
>>
>> By default, the chunk data payload is 64 bytes in size. The Ethernet
>> Media Access Controller (MAC) module implements a 10 Mbps half duplex
>> Ethernet MAC, compatible with the IEEE 802.3 standard. 10BASE-T1S
>> physical layer transceiver integrated is into the LAN8650/1. The PHY and
>> MAC are connected via an internal Media Independent Interface (MII).
>>
>> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
>> ---
>>   MAINTAINERS                                   |   6 +
>>   drivers/net/ethernet/microchip/Kconfig        |   1 +
>>   drivers/net/ethernet/microchip/Makefile       |   1 +
>>   .../net/ethernet/microchip/lan865x/Kconfig    |  19 +
>>   .../net/ethernet/microchip/lan865x/Makefile   |   6 +
>>   .../net/ethernet/microchip/lan865x/lan865x.c  | 384 ++++++++++++++++++
>>   6 files changed, 417 insertions(+)
>>   create mode 100644 drivers/net/ethernet/microchip/lan865x/Kconfig
>>   create mode 100644 drivers/net/ethernet/microchip/lan865x/Makefile
>>   create mode 100644 drivers/net/ethernet/microchip/lan865x/lan865x.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 603528948f61..f41b7f2257d2 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -14374,6 +14374,12 @@ L:   netdev@vger.kernel.org
>>   S:   Maintained
>>   F:   drivers/net/ethernet/microchip/lan743x_*
>>
>> +MICROCHIP LAN8650/1 10BASE-T1S MACPHY ETHERNET DRIVER
>> +M:   Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
>> +L:   netdev@vger.kernel.org
>> +S:   Maintained
>> +F:   drivers/net/ethernet/microchip/lan865x/lan865x.c
>> +
>>   MICROCHIP LAN87xx/LAN937x T1 PHY DRIVER
>>   M:   Arun Ramadoss <arun.ramadoss@microchip.com>
>>   R:   UNGLinuxDriver@microchip.com
>> diff --git a/drivers/net/ethernet/microchip/Kconfig b/drivers/net/ethernet/microchip/Kconfig
>> index 43ba71e82260..06ca79669053 100644
>> --- a/drivers/net/ethernet/microchip/Kconfig
>> +++ b/drivers/net/ethernet/microchip/Kconfig
>> @@ -56,6 +56,7 @@ config LAN743X
>>          To compile this driver as a module, choose M here. The module will be
>>          called lan743x.
>>
>> +source "drivers/net/ethernet/microchip/lan865x/Kconfig"
>>   source "drivers/net/ethernet/microchip/lan966x/Kconfig"
>>   source "drivers/net/ethernet/microchip/sparx5/Kconfig"
>>   source "drivers/net/ethernet/microchip/vcap/Kconfig"
>> diff --git a/drivers/net/ethernet/microchip/Makefile b/drivers/net/ethernet/microchip/Makefile
>> index bbd349264e6f..15dfbb321057 100644
>> --- a/drivers/net/ethernet/microchip/Makefile
>> +++ b/drivers/net/ethernet/microchip/Makefile
>> @@ -9,6 +9,7 @@ obj-$(CONFIG_LAN743X) += lan743x.o
>>
>>   lan743x-objs := lan743x_main.o lan743x_ethtool.o lan743x_ptp.o
>>
>> +obj-$(CONFIG_LAN865X) += lan865x/
>>   obj-$(CONFIG_LAN966X_SWITCH) += lan966x/
>>   obj-$(CONFIG_SPARX5_SWITCH) += sparx5/
>>   obj-$(CONFIG_VCAP) += vcap/
>> diff --git a/drivers/net/ethernet/microchip/lan865x/Kconfig b/drivers/net/ethernet/microchip/lan865x/Kconfig
>> new file mode 100644
>> index 000000000000..f3d60d14e202
>> --- /dev/null
>> +++ b/drivers/net/ethernet/microchip/lan865x/Kconfig
>> @@ -0,0 +1,19 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +#
>> +# Microchip LAN865x Driver Support
>> +#
>> +
>> +if NET_VENDOR_MICROCHIP
>> +
>> +config LAN865X
>> +     tristate "LAN865x support"
>> +     depends on SPI
>> +     depends on OA_TC6
>> +     help
>> +       Support for the Microchip LAN8650/1 Rev.B1 MACPHY Ethernet chip. It
>> +       uses OPEN Alliance 10BASE-T1x Serial Interface specification.
>> +
>> +       To compile this driver as a module, choose M here. The module will be
>> +       called lan865x.
>> +
>> +endif # NET_VENDOR_MICROCHIP
>> diff --git a/drivers/net/ethernet/microchip/lan865x/Makefile b/drivers/net/ethernet/microchip/lan865x/Makefile
>> new file mode 100644
>> index 000000000000..9f5dd89c1eb8
>> --- /dev/null
>> +++ b/drivers/net/ethernet/microchip/lan865x/Makefile
>> @@ -0,0 +1,6 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +#
>> +# Makefile for the Microchip LAN865x Driver
>> +#
>> +
>> +obj-$(CONFIG_LAN865X) += lan865x.o
>> diff --git a/drivers/net/ethernet/microchip/lan865x/lan865x.c b/drivers/net/ethernet/microchip/lan865x/lan865x.c
>> new file mode 100644
>> index 000000000000..9abefa8b9d9f
>> --- /dev/null
>> +++ b/drivers/net/ethernet/microchip/lan865x/lan865x.c
>> @@ -0,0 +1,384 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +/*
>> + * Microchip's LAN865x 10BASE-T1S MAC-PHY driver
>> + *
>> + * Author: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/kernel.h>
>> +#include <linux/phy.h>
>> +#include <linux/oa_tc6.h>
>> +
>> +#define DRV_NAME                     "lan865x"
>> +
>> +/* MAC Network Control Register */
>> +#define LAN865X_REG_MAC_NET_CTL              0x00010000
>> +#define MAC_NET_CTL_TXEN             BIT(3) /* Transmit Enable */
>> +#define MAC_NET_CTL_RXEN             BIT(2) /* Receive Enable */
>> +
>> +#define LAN865X_REG_MAC_NET_CFG              0x00010001 /* MAC Network Configuration Reg */
>> +#define MAC_NET_CFG_PROMISCUOUS_MODE BIT(4)
>> +#define MAC_NET_CFG_MULTICAST_MODE   BIT(6)
>> +#define MAC_NET_CFG_UNICAST_MODE     BIT(7)
>> +
>> +#define LAN865X_REG_MAC_L_HASH               0x00010020 /* MAC Hash Register Bottom */
>> +#define LAN865X_REG_MAC_H_HASH               0x00010021 /* MAC Hash Register Top */
>> +#define LAN865X_REG_MAC_L_SADDR1     0x00010022 /* MAC Specific Addr 1 Bottom Reg */
>> +#define LAN865X_REG_MAC_H_SADDR1     0x00010023 /* MAC Specific Addr 1 Top Reg */
>> +
>> +/* OPEN Alliance Configuration Register #0 */
>> +#define OA_TC6_REG_CONFIG0           0x0004
>> +#define CONFIG0_ZARFE_ENABLE         BIT(12)
>> +
>> +struct lan865x_priv {
>> +     struct work_struct multicast_work;
>> +     struct net_device *netdev;
>> +     struct spi_device *spi;
>> +     struct oa_tc6 *tc6;
>> +};
>> +
>> +static int lan865x_set_hw_macaddr_low_bytes(struct oa_tc6 *tc6, const u8 *mac)
>> +{
>> +     u32 regval;
>> +
>> +     regval = (mac[3] << 24) | (mac[2] << 16) | (mac[1] << 8) | mac[0];
>> +
>> +     return oa_tc6_write_register(tc6, LAN865X_REG_MAC_L_SADDR1, regval);
>> +}
>> +
>> +static int lan865x_set_hw_macaddr(struct lan865x_priv *priv, const u8 *mac)
>> +{
>> +     int restore_ret;
>> +     u32 regval;
>> +     int ret;
>> +
>> +     /* Configure MAC address low bytes */
>> +     ret = lan865x_set_hw_macaddr_low_bytes(priv->tc6, mac);
>> +     if (ret)
>> +             return ret;
>> +
>> +     /* Prepare and configure MAC address high bytes */
>> +     regval = (mac[5] << 8) | mac[4];
>> +     ret = oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_H_SADDR1, regval);
>> +     if (!ret)
>> +             return 0;
>> +
>> +     /* Restore the old MAC address low bytes from netdev if the new MAC
>> +      * address high bytes setting failed.
>> +      */
>> +     restore_ret = lan865x_set_hw_macaddr_low_bytes(priv->tc6,
>> +                                                    priv->netdev->dev_addr);
>> +     if (restore_ret)
>> +             return restore_ret;
>> +
>> +     return ret;
>> +}
>> +
>> +static void
>> +lan865x_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *info)
>> +{
>> +     strscpy(info->driver, DRV_NAME, sizeof(info->driver));
>> +     strscpy(info->bus_info, dev_name(netdev->dev.parent),
>> +             sizeof(info->bus_info));
>> +}
>> +
>> +static const struct ethtool_ops lan865x_ethtool_ops = {
>> +     .get_drvinfo        = lan865x_get_drvinfo,
>> +     .get_link_ksettings = phy_ethtool_get_link_ksettings,
>> +     .set_link_ksettings = phy_ethtool_set_link_ksettings,
>> +};
>> +
>> +static int lan865x_set_mac_address(struct net_device *netdev, void *addr)
>> +{
>> +     struct lan865x_priv *priv = netdev_priv(netdev);
>> +     struct sockaddr *address = addr;
>> +     int ret;
>> +
>> +     ret = eth_prepare_mac_addr_change(netdev, addr);
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     if (ether_addr_equal(address->sa_data, netdev->dev_addr))
>> +             return 0;
>> +
>> +     ret = lan865x_set_hw_macaddr(priv, address->sa_data);
>> +     if (ret)
>> +             return ret;
>> +
>> +     eth_hw_addr_set(netdev, address->sa_data);
>> +
>> +     return 0;
>> +}
>> +
>> +static u32 lan865x_hash(u8 addr[ETH_ALEN])
>> +{
>> +     return (ether_crc(ETH_ALEN, addr) >> 26) & GENMASK(5, 0);
>> +}
>> +
>> +static void lan865x_set_specific_multicast_addr(struct net_device *netdev)
>> +{
>> +     struct lan865x_priv *priv = netdev_priv(netdev);
>> +     struct netdev_hw_addr *ha;
>> +     u32 hash_lo = 0;
>> +     u32 hash_hi = 0;
>> +
>> +     netdev_for_each_mc_addr(ha, netdev) {
>> +             u32 bit_num = lan865x_hash(ha->addr);
>> +             u32 mask = BIT(bit_num);
>> +
>> +             /* 5th bit of the 6 bits hash value is used to determine which
>> +              * bit to set in either a high or low hash register.
>> +              */
>> +             if (bit_num & BIT(5))
>> +                     hash_hi |= mask;
>> +             else
>> +                     hash_lo |= mask;
>> +     }
>> +
>> +     /* Enabling specific multicast addresses */
>> +     if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_H_HASH, hash_hi)) {
>> +             netdev_err(netdev, "Failed to write reg_hashh");
>> +             return;
>> +     }
>> +
>> +     if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_L_HASH, hash_lo))
>> +             netdev_err(netdev, "Failed to write reg_hashl");
>> +}
>> +
>> +static void lan865x_multicast_work_handler(struct work_struct *work)
>> +{
>> +     struct lan865x_priv *priv = container_of(work, struct lan865x_priv,
>> +                                              multicast_work);
>> +     u32 regval = 0;
>> +
>> +     if (priv->netdev->flags & IFF_PROMISC) {
>> +             /* Enabling promiscuous mode */
>> +             regval |= MAC_NET_CFG_PROMISCUOUS_MODE;
>> +             regval &= (~MAC_NET_CFG_MULTICAST_MODE);
>> +             regval &= (~MAC_NET_CFG_UNICAST_MODE);
>> +     } else if (priv->netdev->flags & IFF_ALLMULTI) {
>> +             /* Enabling all multicast mode */
>> +             regval &= (~MAC_NET_CFG_PROMISCUOUS_MODE);
>> +             regval |= MAC_NET_CFG_MULTICAST_MODE;
>> +             regval &= (~MAC_NET_CFG_UNICAST_MODE);
>> +     } else if (!netdev_mc_empty(priv->netdev)) {
>> +             lan865x_set_specific_multicast_addr(priv->netdev);
>> +             regval &= (~MAC_NET_CFG_PROMISCUOUS_MODE);
>> +             regval &= (~MAC_NET_CFG_MULTICAST_MODE);
>> +             regval |= MAC_NET_CFG_UNICAST_MODE;
>> +     } else {
>> +             /* enabling local mac address only */
>> +             if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_H_HASH, 0)) {
>> +                     netdev_err(priv->netdev, "Failed to write reg_hashh");
>> +                     return;
>> +             }
>> +             if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_L_HASH, 0)) {
>> +                     netdev_err(priv->netdev, "Failed to write reg_hashl");
>> +                     return;
>> +             }
>> +     }
>> +     if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_NET_CFG, regval))
>> +             netdev_err(priv->netdev,
>> +                        "Failed to enable promiscuous/multicast/normal mode");
>> +}
>> +
>> +static void lan865x_set_multicast_list(struct net_device *netdev)
>> +{
>> +     struct lan865x_priv *priv = netdev_priv(netdev);
>> +
>> +     schedule_work(&priv->multicast_work);
>> +}
>> +
>> +static netdev_tx_t lan865x_send_packet(struct sk_buff *skb,
>> +                                    struct net_device *netdev)
>> +{
>> +     struct lan865x_priv *priv = netdev_priv(netdev);
>> +
>> +     return oa_tc6_start_xmit(priv->tc6, skb);
>> +}
>> +
>> +static int lan865x_hw_disable(struct lan865x_priv *priv)
>> +{
>> +     u32 regval;
>> +
>> +     if (oa_tc6_read_register(priv->tc6, LAN865X_REG_MAC_NET_CTL, ®val))
>> +             return -ENODEV;
>> +
>> +     regval &= ~(MAC_NET_CTL_TXEN | MAC_NET_CTL_RXEN);
>> +
>> +     if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_NET_CTL, regval))
>> +             return -ENODEV;
>> +
>> +     return 0;
>> +}
>> +
>> +static int lan865x_net_close(struct net_device *netdev)
>> +{
>> +     struct lan865x_priv *priv = netdev_priv(netdev);
>> +     int ret;
>> +
>> +     netif_stop_queue(netdev);
>> +     phy_stop(netdev->phydev);
>> +     ret = lan865x_hw_disable(priv);
>> +     if (ret) {
>> +             netdev_err(netdev, "Failed to disable the hardware: %d\n", ret);
>> +             return ret;
>> +     }
>> +
>> +     return 0;
>> +}
>> +
>> +static int lan865x_hw_enable(struct lan865x_priv *priv)
>> +{
>> +     u32 regval;
>> +
>> +     if (oa_tc6_read_register(priv->tc6, LAN865X_REG_MAC_NET_CTL, ®val))
>> +             return -ENODEV;
>> +
>> +     regval |= MAC_NET_CTL_TXEN | MAC_NET_CTL_RXEN;
>> +
>> +     if (oa_tc6_write_register(priv->tc6, LAN865X_REG_MAC_NET_CTL, regval))
>> +             return -ENODEV;
>> +
>> +     return 0;
>> +}
>> +
>> +static int lan865x_net_open(struct net_device *netdev)
>> +{
>> +     struct lan865x_priv *priv = netdev_priv(netdev);
>> +     int ret;
>> +
>> +     ret = lan865x_hw_enable(priv);
>> +     if (ret) {
>> +             netdev_err(netdev, "Failed to enable hardware: %d\n", ret);
>> +             return ret;
>> +     }
>> +
>> +     phy_start(netdev->phydev);
>> +
>> +     return 0;
>> +}
>> +
>> +static const struct net_device_ops lan865x_netdev_ops = {
>> +     .ndo_open               = lan865x_net_open,
>> +     .ndo_stop               = lan865x_net_close,
>> +     .ndo_start_xmit         = lan865x_send_packet,
>> +     .ndo_set_rx_mode        = lan865x_set_multicast_list,
>> +     .ndo_set_mac_address    = lan865x_set_mac_address,
>> +};
>> +
>> +static int lan865x_set_zarfe(struct lan865x_priv *priv)
>> +{
>> +     u32 regval;
>> +     int ret;
>> +
>> +     ret = oa_tc6_read_register(priv->tc6, OA_TC6_REG_CONFIG0, ®val);
>> +     if (ret)
>> +             return ret;
>> +
>> +     /* Set Zero-Align Receive Frame Enable */
>> +     regval |= CONFIG0_ZARFE_ENABLE;
>> +
>> +     return oa_tc6_write_register(priv->tc6, OA_TC6_REG_CONFIG0, regval);
>> +}
>> +
>> +static int lan865x_probe(struct spi_device *spi)
>> +{
>> +     struct net_device *netdev;
>> +     struct lan865x_priv *priv;
>> +     int ret;
>> +
>> +     netdev = alloc_etherdev(sizeof(struct lan865x_priv));
>> +     if (!netdev)
>> +             return -ENOMEM;
>> +
>> +     priv = netdev_priv(netdev);
>> +     priv->netdev = netdev;
>> +     priv->spi = spi;
>> +     spi_set_drvdata(spi, priv);
>> +     INIT_WORK(&priv->multicast_work, lan865x_multicast_work_handler);
>> +
>> +     priv->tc6 = oa_tc6_init(spi, netdev);
>> +     if (!priv->tc6) {
>> +             ret = -ENODEV;
>> +             goto free_netdev;
>> +     }
>> +
>> +     /* As per the point s3 in the below errata, SPI receive Ethernet frame
>> +      * transfer may halt when starting the next frame in the same data block
>> +      * (chunk) as the end of a previous frame. The RFA field should be
>> +      * configured to 01b or 10b for proper operation. In these modes, only
>> +      * one receive Ethernet frame will be placed in a single data block.
>> +      * When the RFA field is written to 01b, received frames will be forced
>> +      * to only start in the first word of the data block payload (SWO=0). As
>> +      * recommended, ZARFE bit in the OPEN Alliance CONFIG0 register is set
>> +      * to 1 for proper operation.
>> +      *
>> +      * https://ww1.microchip.com/downloads/aemDocuments/documents/AIS/ProductDocuments/Errata/LAN8650-1-Errata-80001075.pdf
>> +      */
>> +     ret = lan865x_set_zarfe(priv);
>> +     if (ret) {
>> +             dev_err(&spi->dev, "Failed to set ZARFE: %d\n", ret);
>> +             goto oa_tc6_exit;
>> +     }
>> +
>> +     /* Get the MAC address from the SPI device tree node */
>> +     if (device_get_ethdev_address(&spi->dev, netdev))
>> +             eth_hw_addr_random(netdev);
>> +
>> +     ret = lan865x_set_hw_macaddr(priv, netdev->dev_addr);
>> +     if (ret) {
>> +             dev_err(&spi->dev, "Failed to configure MAC: %d\n", ret);
>> +             goto oa_tc6_exit;
>> +     }
>> +
>> +     netdev->if_port = IF_PORT_10BASET;
>> +     netdev->irq = spi->irq;
>> +     netdev->netdev_ops = &lan865x_netdev_ops;
>> +     netdev->ethtool_ops = &lan865x_ethtool_ops;
>> +
>> +     ret = register_netdev(netdev);
>> +     if (ret) {
>> +             dev_err(&spi->dev, "Register netdev failed (ret = %d)", ret);
>> +             goto oa_tc6_exit;
>> +     }
>> +
>> +     return 0;
>> +
>> +oa_tc6_exit:
>> +     oa_tc6_exit(priv->tc6);
>> +free_netdev:
>> +     free_netdev(priv->netdev);
>> +     return ret;
>> +}
>> +
>> +static void lan865x_remove(struct spi_device *spi)
>> +{
>> +     struct lan865x_priv *priv = spi_get_drvdata(spi);
>> +
>> +     cancel_work_sync(&priv->multicast_work);
>> +     unregister_netdev(priv->netdev);
>> +     oa_tc6_exit(priv->tc6);
>> +     free_netdev(priv->netdev);
>> +}
>> +
>> +static const struct of_device_id lan865x_dt_ids[] = {
>> +     { .compatible = "microchip,lan8651", "microchip,lan8650" },
>> +     { /* Sentinel */ }
>> +};
>> +MODULE_DEVICE_TABLE(of, lan865x_dt_ids);
>> +
>> +static struct spi_driver lan865x_driver = {
>> +     .driver = {
>> +             .name = DRV_NAME,
>> +             .of_match_table = lan865x_dt_ids,
>> +      },
>> +     .probe = lan865x_probe,
>> +     .remove = lan865x_remove,
>> +};
>> +module_spi_driver(lan865x_driver);
>> +
>> +MODULE_DESCRIPTION(DRV_NAME " 10Base-T1S MACPHY Ethernet Driver");
>> +MODULE_AUTHOR("Parthiban Veerasooran <parthiban.veerasooran@microchip.com>");
>> +MODULE_LICENSE("GPL");
>> --
>> 2.34.1
>