Message ID | 20200328003148.498021-1-marex@denx.de |
---|---|
Headers | show |
Series | net: ks8851: Unify KS8851 SPI and MLL drivers | expand |
On 3/28/20 1:31 AM, Marek Vasut wrote: > The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces > of silicon, except the former has an SPI interface, while the later has a > parallel bus interface. Thus far, Linux has two separate drivers for each > and they are diverging considerably. > > This series unifies them into a single driver with small SPI and parallel > bus specific parts. The approach here is to first separate out the SPI > specific parts into a separate file, then add parallel bus accessors in > another separate file and then finally remove the old parallel bus driver. > The reason for replacing the old parallel bus driver is because the SPI > bus driver is much higher quality. > > NOTE: This series depends on "net: ks8851-ml: Fix IO operations, again" > > NOTE: The performance regression on KS8851-16MLL is now fixes, the TX > throughput is back to ~75 Mbit/s , RX is still 50 Mbit/s . I also managed to implement RX NAPI (see attached demo patch if you want to measure something on the SPI variant), but the RX performance didn't improve dramatically. It's been 52 Mbit/s, with the RX NAPI it is 56 Mbit/s, on the parallel variant. [...]
On Sat, Mar 28, 2020 at 01:31:30AM +0100, Marek Vasut wrote: > The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces > of silicon, except the former has an SPI interface, while the later has a > parallel bus interface. Thus far, Linux has two separate drivers for each > and they are diverging considerably. > > This series unifies them into a single driver with small SPI and parallel > bus specific parts. The approach here is to first separate out the SPI > specific parts into a separate file, then add parallel bus accessors in > another separate file and then finally remove the old parallel bus driver. > The reason for replacing the old parallel bus driver is because the SPI > bus driver is much higher quality. Sorry for the delay Marek. ks8851.ko (SPI variant) no longer compiles with this series. The attached 0001 patch fixes it. Both drivers can only be compiled as modules with this series: If only one of them is built-in, there's a linker error because of the module_param_named() for msg_enable. If both are built-in, the symbol collisions you've mentioned occur. It seems Kbuild can't support building a .o file with a different name than the corresponding .c file because of the implicit rules used everywhere. However, ks8851_common.c can be renamed to be a header file (a library of sorts) which is included by the two .c files. I've renamed ks8851_spi.c back to the original ks8851.c and ks8851_par.c back to ks8851_mll.c. The result is the attached 0002 patch. Compiles without any errors regardless if one or both drivers are built-in or modules. I'll be back at the office this week and will conduct performance measurements with this version. Thanks, Lukas
On 4/6/20 5:16 AM, Lukas Wunner wrote: > On Sat, Mar 28, 2020 at 01:31:30AM +0100, Marek Vasut wrote: >> The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces >> of silicon, except the former has an SPI interface, while the later has a >> parallel bus interface. Thus far, Linux has two separate drivers for each >> and they are diverging considerably. >> >> This series unifies them into a single driver with small SPI and parallel >> bus specific parts. The approach here is to first separate out the SPI >> specific parts into a separate file, then add parallel bus accessors in >> another separate file and then finally remove the old parallel bus driver. >> The reason for replacing the old parallel bus driver is because the SPI >> bus driver is much higher quality. > > Sorry for the delay Marek. > > ks8851.ko (SPI variant) no longer compiles with this series. > The attached 0001 patch fixes it. > > Both drivers can only be compiled as modules with this series: > If only one of them is built-in, there's a linker error because of > the module_param_named() for msg_enable. > If both are built-in, the symbol collisions you've mentioned occur. > > It seems Kbuild can't support building a .o file with a different name > than the corresponding .c file because of the implicit rules used > everywhere. However, ks8851_common.c can be renamed to be a header > file (a library of sorts) which is included by the two .c files. > I've renamed ks8851_spi.c back to the original ks8851.c and > ks8851_par.c back to ks8851_mll.c. The result is the attached 0002 patch. > Compiles without any errors regardless if one or both drivers are > built-in or modules. > > I'll be back at the office this week and will conduct performance > measurements with this version. This looks like a hack, I'm more inclined to go back to using callbacks for the various functions, since I don't see any performance problems there. We're still talking about 25 MHz SPI bus here and the SPI subsystem is full of such indirection anyway, so I'm not even sure what you're hoping to gain here ; it seems to me like a premature optimization which only causes trouble.
On 4/6/20 1:20 PM, Marek Vasut wrote: > On 4/6/20 5:16 AM, Lukas Wunner wrote: >> On Sat, Mar 28, 2020 at 01:31:30AM +0100, Marek Vasut wrote: >>> The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces >>> of silicon, except the former has an SPI interface, while the later has a >>> parallel bus interface. Thus far, Linux has two separate drivers for each >>> and they are diverging considerably. >>> >>> This series unifies them into a single driver with small SPI and parallel >>> bus specific parts. The approach here is to first separate out the SPI >>> specific parts into a separate file, then add parallel bus accessors in >>> another separate file and then finally remove the old parallel bus driver. >>> The reason for replacing the old parallel bus driver is because the SPI >>> bus driver is much higher quality. >> >> Sorry for the delay Marek. >> >> ks8851.ko (SPI variant) no longer compiles with this series. >> The attached 0001 patch fixes it. >> >> Both drivers can only be compiled as modules with this series: >> If only one of them is built-in, there's a linker error because of >> the module_param_named() for msg_enable. >> If both are built-in, the symbol collisions you've mentioned occur. >> >> It seems Kbuild can't support building a .o file with a different name >> than the corresponding .c file because of the implicit rules used >> everywhere. However, ks8851_common.c can be renamed to be a header >> file (a library of sorts) which is included by the two .c files. >> I've renamed ks8851_spi.c back to the original ks8851.c and >> ks8851_par.c back to ks8851_mll.c. The result is the attached 0002 patch. >> Compiles without any errors regardless if one or both drivers are >> built-in or modules. >> >> I'll be back at the office this week and will conduct performance >> measurements with this version. > > This looks like a hack, I'm more inclined to go back to using callbacks > for the various functions, since I don't see any performance problems > there. We're still talking about 25 MHz SPI bus here and the SPI > subsystem is full of such indirection anyway, so I'm not even sure what > you're hoping to gain here ; it seems to me like a premature > optimization which only causes trouble. So I got the KS8851SNL development kit to test the SPI option. The current driver in linux-next gives me SPI transfer timeouts at 1 MHz (I also tried 6 MHz, 10 MHz, same problem, I also checked the signal quality which is OK) both on iMX6Q and on STM32MP1 with ~2 cm long wiring between the SoM and the KS8851SNL devkit, so that's where my testing ends, sadly. Unless you have some idea what the problem might be ? That said, I at least did test of the impact of the pointer indirection on the KS8851-16MLL with [1] (direct access test is with the top 7 patches removed, indirect access test is with the branch as-is). The KS8851-16MLL also has higher bandwidth, so it calls the IO accessors more often than the SPI variant, and so it's likely more relevant when testing whether their overhead matters. - SoC is stm32mp157c at 650 MHz - link partner Intel Corporation 82579LM Gigabit Network Connection (Lewisville) (rev 06) - Test commands (repeated thrice, averaged, after three discarded runs to prime the system): $ iperf -sei 1 # RX $ iperf -c 192.168.1.1 -ei 1 # TX | accessor | RX | TX KS8851-16MLL | indirect | 52.86 Mbit/s | 74.03 Mbit/s KS8851-16MLL | direct | 53.80 Mbit/s | 73.90 Mbit/s There is ~1 Mbit/s throughput increase on the RX average, however the throughput fluctuates by a few Mbit/s during the iperf RX tests between 50.7 Mbit/s and 54.1 Mbit/s. There is also some decrease on TX, but it is marginal. Interestingly enough, the TX test throughput fluctuates less than RX, only by small hundreds of kbit/s. I'd say that the pointer indirection doesn't play any role here and it is better to keep the driver simple and go with it, unless the SPI option is affected, which I doubt it is. [1] https://git.kernel.org/pub/scm/linux/kernel/git/marex/linux-2.6.git/log/?h=ks8851-v4
On Wed, Apr 08, 2020 at 09:49:14PM +0200, Marek Vasut wrote: > On 4/6/20 1:20 PM, Marek Vasut wrote: > > On 4/6/20 5:16 AM, Lukas Wunner wrote: > > > I'll be back at the office this week and will conduct performance > > > measurements with this version. Latency without this series (ping -A -c 100000): 100000 packets transmitted, 100000 received, 0% packet loss, time 205ms rtt min/avg/max/mdev = 0.790/1.695/2.624/0.046 ms, ipg/ewma 2.000/1.693 ms ^^^^^ Latency with this series: 100000 packets transmitted, 100000 received, 0% packet loss, time 220ms rtt min/avg/max/mdev = 0.880/1.717/2.428/0.059 ms, ipg/ewma 2.000/1.710 ms Latency with this series + my patch to inline accessors: 100000 packets transmitted, 100000 received, 0% packet loss, time 219ms rtt min/avg/max/mdev = 0.874/1.716/3.296/0.058 ms, ipg/ewma 2.000/1.719 ms RX throughput without this series (iperf3 -f k -i 0 -c): [ 5] 0.00-10.00 sec 19.0 MBytes 15958 Kbits/sec receiver RX throughput with this series: [ 5] 0.00-10.00 sec 18.4 MBytes 15452 Kbits/sec receiver RX throughput with this series + my patch to inline accessors: [ 5] 0.00-10.00 sec 18.5 MBytes 15506 Kbits/sec receiver TX throughput without this series (iperf3 -R -f k -i 0 -c): [ 5] 0.00-10.00 sec 18.6 MBytes 15604 Kbits/sec receiver TX throughput with this series: [ 5] 0.00-10.00 sec 18.3 MBytes 15314 Kbits/sec receiver TX throughput with this series + my patch to inline accessors: [ 5] 0.00-10.00 sec 18.3 MBytes 15349 Kbits/sec receiver The commands were invoked from a machine with a Broadcom tg3 Gigabit Ethernet controller. Conclusion: The series does incur a measurable performance penalty which should be fixed before it gets merged. Inlining the accessors only yields a very small improvement. I'm wondering where the performance penalty is originating from: Perhaps because of the 16-bit read of RXFC in ks8851_rx_pkts()? > So I got the KS8851SNL development kit to test the SPI option. The > current driver in linux-next gives me SPI transfer timeouts at 1 MHz (I > also tried 6 MHz, 10 MHz, same problem, I also checked the signal > quality which is OK) both on iMX6Q and on STM32MP1 with ~2 cm long > wiring between the SoM and the KS8851SNL devkit, so that's where my > testing ends, sadly. Unless you have some idea what the problem might be ? Check the signal quality with an oscilloscope. Crank up drive strength of the SPI pins if supported by the pin controller. Check if the driver for the SPI controller is buggy or somehow limits the speed. We use the KSZ8851SNL both with STM32-based products (at 20 MHz SPI clock, but without an operating system) and with Raspberry Pi based products (at up to 25 MHz SPI clock, with Linux). It is only the latter that I'm really familiar with. We've invested considerable resources to fix bugs and speed up the Raspberry Pi's SPI driver as much as possible. By now it works pretty well with the ks8851. A Raspberry Pi 3 is readily available for just a few Euros, so you may want to pick up one of those. My most recent speed improvements for the Raspberry Pi's SPI and DMA drivers went into v5.4, so be sure to use at least that. Should signal quality be a problem, it's possible to increase GPIO drive strength by putting a dt-blob.bin in the /boot partition: https://www.raspberrypi.org/documentation/configuration/pin-configuration.md I forgot to test whether reading the MAC address from the external EEPROM works with v3 of your series, but I'll be back in the office on Tuesday to take another look. Thanks, Lukas
On 4/10/20 1:01 PM, Lukas Wunner wrote: > On Wed, Apr 08, 2020 at 09:49:14PM +0200, Marek Vasut wrote: >> On 4/6/20 1:20 PM, Marek Vasut wrote: >>> On 4/6/20 5:16 AM, Lukas Wunner wrote: >>>> I'll be back at the office this week and will conduct performance >>>> measurements with this version. > > Latency without this series (ping -A -c 100000): > > 100000 packets transmitted, 100000 received, 0% packet loss, time 205ms > rtt min/avg/max/mdev = 0.790/1.695/2.624/0.046 ms, ipg/ewma 2.000/1.693 ms > ^^^^^ > Latency with this series: > > 100000 packets transmitted, 100000 received, 0% packet loss, time 220ms > rtt min/avg/max/mdev = 0.880/1.717/2.428/0.059 ms, ipg/ewma 2.000/1.710 ms > > Latency with this series + my patch to inline accessors: > > 100000 packets transmitted, 100000 received, 0% packet loss, time 219ms > rtt min/avg/max/mdev = 0.874/1.716/3.296/0.058 ms, ipg/ewma 2.000/1.719 ms > > > RX throughput without this series (iperf3 -f k -i 0 -c): > > [ 5] 0.00-10.00 sec 19.0 MBytes 15958 Kbits/sec receiver > > RX throughput with this series: > > [ 5] 0.00-10.00 sec 18.4 MBytes 15452 Kbits/sec receiver > > RX throughput with this series + my patch to inline accessors: > > [ 5] 0.00-10.00 sec 18.5 MBytes 15506 Kbits/sec receiver > > > TX throughput without this series (iperf3 -R -f k -i 0 -c): > > [ 5] 0.00-10.00 sec 18.6 MBytes 15604 Kbits/sec receiver > > TX throughput with this series: > > [ 5] 0.00-10.00 sec 18.3 MBytes 15314 Kbits/sec receiver > > TX throughput with this series + my patch to inline accessors: > > [ 5] 0.00-10.00 sec 18.3 MBytes 15349 Kbits/sec receiver > > > The commands were invoked from a machine with a Broadcom tg3 > Gigabit Ethernet controller. It would be helpful if we could at least run the same test, so the results are comparable. > Conclusion: The series does incur a measurable performance penalty > which should be fixed before it gets merged. Inlining the accessors > only yields a very small improvement. OK, so we finally agree on this, thanks. > I'm wondering where the > performance penalty is originating from: Perhaps because of the > 16-bit read of RXFC in ks8851_rx_pkts()? Can you patch that part away to see whether that's the case ? >> So I got the KS8851SNL development kit to test the SPI option. The >> current driver in linux-next gives me SPI transfer timeouts at 1 MHz (I >> also tried 6 MHz, 10 MHz, same problem, I also checked the signal >> quality which is OK) both on iMX6Q and on STM32MP1 with ~2 cm long >> wiring between the SoM and the KS8851SNL devkit, so that's where my >> testing ends, sadly. Unless you have some idea what the problem might be ? > > Check the signal quality with an oscilloscope. > Crank up drive strength of the SPI pins if supported by the pin controller. I already tried both, got nothing useful. > Check if the driver for the SPI controller is buggy or somehow limits the > speed. I used two different drivers -- the iMX SPI and the STM32 SPI -- I would say that if both show the same behavior, it's unlikely to be the driver. > We use the KSZ8851SNL both with STM32-based products (at 20 MHz SPI clock, > but without an operating system) and with Raspberry Pi based products > (at up to 25 MHz SPI clock, with Linux). It is only the latter that > I'm really familiar with. We've invested considerable resources to > fix bugs and speed up the Raspberry Pi's SPI driver as much as possible. > By now it works pretty well with the ks8851. > > A Raspberry Pi 3 is readily available for just a few Euros, so you may > want to pick up one of those. My most recent speed improvements for the > Raspberry Pi's SPI and DMA drivers went into v5.4, so be sure to use at > least that. This is based off linux-next, so that shouldn't be a problem. I'll try to set up the RPi3 and see if that's any better. > Should signal quality be a problem, it's possible to > increase GPIO drive strength by putting a dt-blob.bin in the /boot > partition: > > https://www.raspberrypi.org/documentation/configuration/pin-configuration.md > > I forgot to test whether reading the MAC address from the external > EEPROM works with v3 of your series, but I'll be back in the office > on Tuesday to take another look. While you're at it, can you take a look at the latency ? I think you should be able to git-bisect it rather easily. Thanks
The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces of silicon, except the former has an SPI interface, while the later has a parallel bus interface. Thus far, Linux has two separate drivers for each and they are diverging considerably. This series unifies them into a single driver with small SPI and parallel bus specific parts. The approach here is to first separate out the SPI specific parts into a separate file, then add parallel bus accessors in another separate file and then finally remove the old parallel bus driver. The reason for replacing the old parallel bus driver is because the SPI bus driver is much higher quality. NOTE: This series depends on "net: ks8851-ml: Fix IO operations, again" NOTE: The performance regression on KS8851-16MLL is now fixes, the TX throughput is back to ~75 Mbit/s , RX is still 50 Mbit/s . Marek Vasut (18): net: ks8851: Factor out spi->dev in probe()/remove() net: ks8851: Rename ndev to netdev in probe net: ks8851: Replace dev_err() with netdev_err() in IRQ handler net: ks8851: Pass device node into ks8851_init_mac() net: ks8851: Use devm_alloc_etherdev() net: ks8851: Use dev_{get,set}_drvdata() net: ks8851: Remove ks8851_rdreg32() net: ks8851: Use 16-bit writes to program MAC address net: ks8851: Use 16-bit read of RXFC register net: ks8851: Factor out bus lock handling net: ks8851: Factor out SKB receive function net: ks8851: Split out SPI specific entries in struct ks8851_net net: ks8851: Split out SPI specific code from probe() and remove() net: ks8851: Factor out TX work flush function net: ks8851: Permit overridding interrupt enable register net: ks8851: Separate SPI operations into separate file net: ks8851: Implement Parallel bus operations net: ks8851: Remove ks8851_mll.c drivers/net/ethernet/micrel/Kconfig | 2 + drivers/net/ethernet/micrel/Makefile | 2 + drivers/net/ethernet/micrel/ks8851.h | 142 +- .../micrel/{ks8851.c => ks8851_common.c} | 688 ++------ drivers/net/ethernet/micrel/ks8851_mll.c | 1393 ----------------- drivers/net/ethernet/micrel/ks8851_par.c | 337 ++++ drivers/net/ethernet/micrel/ks8851_spi.c | 448 ++++++ 7 files changed, 1027 insertions(+), 1985 deletions(-) rename drivers/net/ethernet/micrel/{ks8851.c => ks8851_common.c} (60%) delete mode 100644 drivers/net/ethernet/micrel/ks8851_mll.c create mode 100644 drivers/net/ethernet/micrel/ks8851_par.c create mode 100644 drivers/net/ethernet/micrel/ks8851_spi.c Signed-off-by: Marek Vasut <marex@denx.de> Cc: David S. Miller <davem@davemloft.net> Cc: Lukas Wunner <lukas@wunner.de> Cc: Petr Stetiar <ynezz@true.cz> Cc: YueHaibing <yuehaibing@huawei.com>