Message ID | 20081029202027.GH12879@ovro.caltech.edu (mailing list archive) |
---|---|
State | Superseded, archived |
On Wed, 29 Oct 2008 13:20:27 -0700 Ira Snyder <iws@ovro.caltech.edu> wrote: > This adds support to Linux for a virtual ethernet interface which uses the > PCI bus as its transport mechanism. It creates a simple, familiar, and fast > method of communication for two devices connected by a PCI interface. > > I have implemented client support for the Freescale MPC8349EMDS board, > which is capable of running in PCI Agent mode (It acts like a PCI card, but > is a complete PowerPC computer, running Linux). It is almost certainly > trivially ported to any MPC83xx system. It should be a relatively small > effort to port it to any chip that can generate PCI interrupts and has at > least one PCI accessible scratch register. > > It was developed to work in a CompactPCI crate of computers, one of which > is a relatively standard x86 system (acting as the host) and many PowerPC > systems (acting as clients). > > RFC v1 -> RFC v2: > * remove vim modelines > * use net_device->name in request_irq(), for irqbalance > * remove unnecessary wqt_get_stats(), use default get_stats() instead > * use dev_printk() and friends > * add message unit to MPC8349EMDS dts file > > Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> > --- > > This is the third posting of this driver. I got some feedback, and have > corrected the problems. Stephen, thanks for the review! I also got some > off-list feedback from a potential user, so I believe this is worth getting > into mainline. > > I'll post up a revised version about once a week as long as the changes are > minor. If they are more substantial, I'll post them as needed. > > GregKH: is this worth putting into the staging tree? (I left you out of the > CC list to keep your email traffic down) > > The remaining issues I see in this driver: > 1) ==== Naming ==== > The name wqt originally stood for "workqueue-test" and somewhat evolved > over time into the current driver. I'm looking for suggestions for a > better name. 
It should be the same between the host and client drivers, > to make porting the code between them easier. The drivers are /very/ > similar other than the setup code. > 2) ==== IMMR Usage ==== > In the Freescale client driver, I use the whole set of board control > registers (AKA IMMR registers). I only need a very small subset of them, > during startup to set up the DMA window. I used the full set of > registers so that I could share the register offsets between the drivers > (in pcinet_hw.h) > 3) ==== ioremap() of a physical address ==== > In the Freescale client driver, I called ioremap() with the physical > address of the IMMR registers, 0xe0000000. I don't know a better way to > get them. They are somewhat exposed in the Flat Device Tree. Suggestions > on how to extract them are welcome. > 4) ==== Hardcoded DMA Window Address ==== > In the Freescale client driver, I just hardcoded the address of the > outbound PCI window into the DMA transfer code. It is 0x80000000. > Suggestions on how to get this value at runtime are welcome. > > > Rationale behind some decisions: > 1) ==== Usage of the PCINET_NET_REGISTERS_VALID bit ==== > I want to be able to use this driver from U-Boot to tftp a kernel over > the PCI backplane, and then boot up the board. This means that the > device descriptor memory, which lives in the client RAM, becomes invalid > during boot. > 2) ==== Buffer Descriptors in client memory ==== > I chose to put the buffer descriptors in client memory rather than host > memory. It seemed more logical to me at the time. In my application, > I'll have 19 boards + 1 host per cPCI chassis. The client -> host > direction will see most of the traffic, and so I thought I would cut > down on the number of PCI accesses needed. I'm willing to change this. > 3) ==== Usage of client DMA controller for all data transfer ==== > This was done purely for speed. I tried using the CPU to transfer all > data, and it is very slow: ~3MB/sec. 
Using the DMA controller gets me to > ~40MB/sec (as tested with netperf). > 4) ==== Static 1GB DMA window ==== > Maintaining a window while DMA's in flight, and then changing it seemed > too complicated. Also, testing showed that using a static window gave me > a ~10MB/sec speedup compared to moving the window for each skb. > 5) ==== The serial driver ==== > Yes, there are two essentially separate drivers here. I needed a method > to communicate with the U-Boot bootloader on these boards without > plugging in a serial cable. With 19 clients + 1 host per chassis, the > cable clutter is worth avoiding. Since everything is connected via the > PCI bus anyway, I used that. A virtual serial port was simple to > implement using the messaging unit hardware that I used for the network > driver. > > I'll post both U-Boot drivers to their mailing list once this driver is > finalized. > > Thanks, > Ira > > > arch/powerpc/boot/dts/mpc834x_mds.dts | 7 + > drivers/net/Kconfig | 34 + > drivers/net/Makefile | 3 + > drivers/net/pcinet.h | 75 ++ > drivers/net/pcinet_fsl.c | 1351 ++++++++++++++++++++++++++++++++ > drivers/net/pcinet_host.c | 1383 +++++++++++++++++++++++++++++++++ > drivers/net/pcinet_hw.h | 77 ++ > 7 files changed, 2930 insertions(+), 0 deletions(-) > create mode 100644 drivers/net/pcinet.h > create mode 100644 drivers/net/pcinet_fsl.c > create mode 100644 drivers/net/pcinet_host.c > create mode 100644 drivers/net/pcinet_hw.h > > diff --git a/arch/powerpc/boot/dts/mpc834x_mds.dts b/arch/powerpc/boot/dts/mpc834x_mds.dts > index c986c54..3bc8975 100644 > --- a/arch/powerpc/boot/dts/mpc834x_mds.dts > +++ b/arch/powerpc/boot/dts/mpc834x_mds.dts > @@ -104,6 +104,13 @@ > mode = "cpu"; > }; > > + message-unit@8030 { > + compatible = "fsl,mpc8349-mu"; > + reg = <0x8030 0xd0>; > + interrupts = <69 0x8>; > + interrupt-parent = <&ipic>; > + }; > + > dma@82a8 { > #address-cells = <1>; > #size-cells = <1>; > diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig > index 
f749b40..eef7af7 100644 > --- a/drivers/net/Kconfig > +++ b/drivers/net/Kconfig > @@ -2279,6 +2279,40 @@ config UGETH_TX_ON_DEMAND > bool "Transmit on Demand support" > depends on UCC_GETH > > +config PCINET_FSL > + tristate "PCINet Virtual Ethernet over PCI support (Freescale)" > + depends on MPC834x_MDS && !PCI > + select DMA_ENGINE > + select FSL_DMA > + help > + When running as a PCI Agent, this driver will create a virtual > + ethernet link running over the PCI bus, allowing simplified > + communication with the host system. The host system will need > + to use the corresponding driver. > + > + If in doubt, say N. > + > +config PCINET_HOST > + tristate "PCINet Virtual Ethernet over PCI support (Host)" > + depends on PCI > + help > + This driver will let you communicate with a PCINet client device > + using a virtual ethernet link running over the PCI bus. This > + allows simplified communication with the client system. > + > + This is intended for use in a system that has a crate full of > + computers running Linux, all connected by a PCI backplane. > + > + If in doubt, say N. > + > +config PCINET_DISABLE_CHECKSUM > + bool "Disable packet checksumming" > + depends on PCINET_FSL || PCINET_HOST > + default n > + help > + Disable packet checksumming on packets received by the PCINet > + driver. This gives a possible speed boost.
> + > config MV643XX_ETH > tristate "Marvell Discovery (643XX) and Orion ethernet support" > depends on MV64360 || MV64X60 || (PPC_MULTIPLATFORM && PPC32) || PLAT_ORION > diff --git a/drivers/net/Makefile b/drivers/net/Makefile > index f19acf8..c6fbafc 100644 > --- a/drivers/net/Makefile > +++ b/drivers/net/Makefile > @@ -30,6 +30,9 @@ gianfar_driver-objs := gianfar.o \ > obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o > ucc_geth_driver-objs := ucc_geth.o ucc_geth_mii.o ucc_geth_ethtool.o > > +obj-$(CONFIG_PCINET_FSL) += pcinet_fsl.o > +obj-$(CONFIG_PCINET_HOST) += pcinet_host.o > + > # > # link order important here > # > diff --git a/drivers/net/pcinet.h b/drivers/net/pcinet.h > new file mode 100644 > index 0000000..66d2cba > --- /dev/null > +++ b/drivers/net/pcinet.h > @@ -0,0 +1,75 @@ > +/* > + * Shared Definitions for the PCINet / PCISerial drivers > + * > + * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu> > + * > + * Heavily inspired by the drivers/net/fs_enet driver > + * > + * This file is licensed under the terms of the GNU General Public License > + * version 2. This program is licensed "as is" without any warranty of any > + * kind, whether express or implied. > + */ > + > +#ifndef PCINET_H > +#define PCINET_H > + > +#include <linux/kernel.h> > +#include <linux/if_ether.h> > + > +/* Ring and Frame size -- these must match between the drivers */ > +#define PH_RING_SIZE (64) > +#define PH_MAX_FRSIZE (64 * 1024) > +#define PH_MAX_MTU (PH_MAX_FRSIZE - ETH_HLEN) > + > +struct circ_buf_desc { > + __le32 sc; > + __le32 len; > + __le32 addr; > +} __attribute__((__packed__)); > +typedef struct circ_buf_desc cbd_t; Don't use typedef. See chapter 5 of Documentation/CodingStyle Do you really need packed here, sometime gcc generates much worse code on needlessly packed structures?
On Wed, Oct 29, 2008 at 01:25:06PM -0700, Stephen Hemminger wrote: > On Wed, 29 Oct 2008 13:20:27 -0700 > Ira Snyder <iws@ovro.caltech.edu> wrote: Big snip... > > diff --git a/drivers/net/pcinet.h b/drivers/net/pcinet.h > > new file mode 100644 > > index 0000000..66d2cba > > --- /dev/null > > +++ b/drivers/net/pcinet.h > > @@ -0,0 +1,75 @@ > > +/* > > + * Shared Definitions for the PCINet / PCISerial drivers > > + * > > + * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu> > > + * > > + * Heavily inspired by the drivers/net/fs_enet driver > > + * > > + * This file is licensed under the terms of the GNU General Public License > > + * version 2. This program is licensed "as is" without any warranty of any > > + * kind, whether express or implied. > > + */ > > + > > +#ifndef PCINET_H > > +#define PCINET_H > > + > > +#include <linux/kernel.h> > > +#include <linux/if_ether.h> > > + > > +/* Ring and Frame size -- these must match between the drivers */ > > +#define PH_RING_SIZE (64) > > +#define PH_MAX_FRSIZE (64 * 1024) > > +#define PH_MAX_MTU (PH_MAX_FRSIZE - ETH_HLEN) > > + > > +struct circ_buf_desc { > > + __le32 sc; > > + __le32 len; > > + __le32 addr; > > +} __attribute__((__packed__)); > > +typedef struct circ_buf_desc cbd_t; > > Don't use typedef. See chapter 5 of Documentation/CodingStyle > I know about this. I was following the example set forth in drivers/net/fs_enet. I tried to make this driver somewhat similar to that driver, but I'm not against changing this to a struct everywhere. > Do you really need packed here, sometime gcc generates much worse code > on needlessly packed structures? I'm pretty sure I do. I share this structure between both devices over the PCI bus. This is where the physical addresses of the skb's go so that the DMA controller can transfer them. As an example, let's say I have a machine that places 32bit values on 64bit boundaries. AFAIK, the Freescale places 32bit values on 32bit boundaries. 
Now the two of them disagree about where the fields of the buffer descriptor are. Of course, I may be wrong. :) Comments? Thanks for the review, Ira
Ira Snyder wrote: > I know about this. I was following the example set forth in > drivers/net/fs_enet. I recommend against that. :-) -Scott
On Wed, Oct 29, 2008 at 03:54:46PM -0500, Scott Wood wrote: > Ira Snyder wrote: >> I know about this. I was following the example set forth in >> drivers/net/fs_enet. > > I recommend against that. :-) Great, now you tell me :) I'll go ahead change the typedef to a struct. Hopefully some of the other inspiration I took wasn't too bad. I'm totally open to suggestions on improvements for this driver. Ira
Ira Snyder wrote: > > I'll go ahead change the typedef to a struct. Hopefully some of the > other inspiration I took wasn't too bad. I'm totally open to suggestions > on improvements for this driver. Not a suggestion for improvement, but I have to say this is by far the single coolest thing I have EVER seen submitted to mainline :)
On Wednesday 29 October 2008, Ira Snyder wrote: > This adds support to Linux for a virtual ethernet interface which uses the > PCI bus as its transport mechanism. It creates a simple, familiar, and fast > method of communication for two devices connected by a PCI interface. Very interesting. I have seen multiple such drivers, and still plan to have one as well for the cell/axon platform. > This is the third posting of this driver. I got some feedback, and have > corrected the problems. Stephen, thanks for the review! I also got some > off-list feedback from a potential user, so I believe this is worth getting > into mainline. Sorry for not having replied earlier, I did not see first postings. While I did not see any major problems with your code, I think we should actually do it in a different way, which I have talked about at the Linux plumbers conference, and will again at the UKUUG Linux conference this weekend. You can find my slides from the previous talk at http://userweb.kernel.org/%7Earnd/papers/plumbers08/plumbers-slides.pdf Since you have code and I only have plans, I suppose I can't make you wait for me to get a driver for your platform, but I hope we can join forces in the future and share code in the future. My fundamental criticism of your code is that it is not flexible enough. We have a number of platforms that act as a PCI, PCIe or similar kind of device and want to transfer network data. At the same time, some of these want to use the same infrastructure for non-network data like MPI (ibverbs), block device or file system (9p). Even in your limited setup, it seems that you should be able to share more code between the two implementations you have posted here. When looking at the code, I found a lot of duplication between the drivers that you should be able to avoid. My suggestion is to base this on the virtio infrastructure, and some of my colleagues at IBM already implemented a driver that uses virtio-net on a PCIe connection. 
Arnd <>< Now a few comments on your code: > diff --git a/arch/powerpc/boot/dts/mpc834x_mds.dts b/arch/powerpc/boot/dts/mpc834x_mds.dts > index c986c54..3bc8975 100644 > --- a/arch/powerpc/boot/dts/mpc834x_mds.dts > +++ b/arch/powerpc/boot/dts/mpc834x_mds.dts > @@ -104,6 +104,13 @@ > mode = "cpu"; > }; > > + message-unit@8030 { > + compatible = "fsl,mpc8349-mu"; > + reg = <0x8030 0xd0>; > + interrupts = <69 0x8>; > + interrupt-parent = <&ipic>; > + }; What is a message unit? Is this the mailbox you use in the driver? We are facing a similar problem on Axon regarding probing of the virtual device because you can't easily tell from the device tree whether the machine is connected to something that is trying to communicate. The message-unit does not seem to be network-specific, so it's not particularly nice to have the network unit grab that of_device. > +config PCINET_FSL > + tristate "PCINet Virtual Ethernet over PCI support (Freescale)" > + depends on MPC834x_MDS && !PCI > + select DMA_ENGINE > + select FSL_DMA > + help > + When running as a PCI Agent, this driver will create a virtual > + ethernet link running over the PCI bus, allowing simplified > + communication with the host system. The host system will need > + to use the corresponding driver. > + > + If in doubt, say N. Why 'depends on !PCI'? This means that you cannot build a kernel that is able to run both as host and endpoint for PCInet, right? > +config PCINET_HOST > + tristate "PCINet Virtual Ethernet over PCI support (Host)" > + depends on PCI > + help > + This driver will let you communicate with a PCINet client device > + using a virtual ethernet link running over the PCI bus. This > + allows simplified communication with the client system. > + > + This is inteded for use in a system that has a crate full of > + computers running Linux, all connected by a PCI backplane. > + > + If in doubt, say N. Is this meant to be portable across different hardware implementations? 
If the driver can only work with PCINET_FSL on the other side, you should probably mention that in the description. > +config PCINET_DISABLE_CHECKSUM > + bool "Disable packet checksumming" > + depends on PCINET_FSL || PCINET_HOST > + default n > + help > + Disable packet checksumming on packets received by the PCINet > + driver. This gives a possible speed boost. Why make this one optional? Is there ever a reason to enable checksumming? If there is, how about making it tunable with ethtool? > +struct circ_buf_desc { > + __le32 sc; > + __le32 len; > + __le32 addr; > +} __attribute__((__packed__)); It would be useful to always force aligning the descriptors to the whole 32 bit and avoid the packing here. Unaligned accesses are inefficient on many systems. > +typedef struct circ_buf_desc cbd_t; Also, don't pass structures by value if they don't fit into one or two registers. > +/* Buffer Descriptor Accessors */ > +#define CBDW_SC(_cbd, _sc) iowrite32((_sc), &(_cbd)->sc) > +#define CBDW_LEN(_cbd, _len) iowrite32((_len), &(_cbd)->len) > +#define CBDW_ADDR(_cbd, _addr) iowrite32((_addr), &(_cbd)->addr) > + > +#define CBDR_SC(_cbd) ioread32(&(_cbd)->sc) > +#define CBDR_LEN(_cbd) ioread32(&(_cbd)->len) > +#define CBDR_ADDR(_cbd) ioread32(&(_cbd)->addr) We have found that accessing remote descriptors using mmio read is rather slow, and changed the code to always do local reads and remote writes. On the host side, I would argue that you should use out_le32 rather than iowrite32, because you are not operating on a PCI device but an OF device. > +/* IMMR Accessor Helpers */ > +#define IMMR_R32(_off) ioread32(priv->immr+(_off)) > +#define IMMR_W32(_off, _val) iowrite32((_val), priv->immr+(_off)) > +#define IMMR_R32BE(_off) ioread32be(priv->immr+(_off)) > +#define IMMR_W32BE(_off, _val) iowrite32be((_val), priv->immr+(_off)) IIRC, the IMMR is a platform resource, so the system should map those registers already and provide accessor functions for the registers you need.
Simply allowing to pass an offset in here does not look clean. > +static void wqtuart_rx_char(struct uart_port *port, const char ch); > +static void wqtuart_stop_tx(struct uart_port *port); You should try to avoid forward declarations for static functions. If you order the function implementation correctly, that will also give you the expected reading order in the driver. > +struct wqt_dev; > +typedef void (*wqt_irqhandler_t)(struct wqt_dev *); Much more importantly, never do forward declarations for global functions! These belong into a header that is included by both the user and the definition. > +struct wqt_dev { > + /*--------------------------------------------------------------------*/ > + /* OpenFirmware Infrastructure */ > + /*--------------------------------------------------------------------*/ > + struct of_device *op; > + struct device *dev; Why the dev? You can always get that from the of_device, right? > + int irq; > + void __iomem *immr; I don't think immr is device specific. > + struct mutex irq_mutex; > + int interrupt_count; > + > + spinlock_t irq_lock; > + struct wqt_irqhandlers handlers; > + > + /*--------------------------------------------------------------------*/ > + /* UART Device Infrastructure */ > + /*--------------------------------------------------------------------*/ > + struct uart_port port; > + bool uart_rx_enabled; > + bool uart_open; hmm, if you need a uart, that sounds like it should be a separate driver altogether. What is the relation between the network interface and the UART? > + struct workqueue_struct *wq; A per-device pointer to a workqueue_struct is unusual. What are you doing this for? How about using the system work queue? > + /*--------------------------------------------------------------------*/ > + /* Ethernet Device Infrastructure */ > + /*--------------------------------------------------------------------*/ > + struct net_device *ndev; Why make this a separate structure? 
If you have one of these per net_device, you should embed the net_device into your own structure. > + struct tasklet_struct tx_complete_tasklet; Using a tasklet for tx processing sounds fishy because most of the network code already runs at softirq time. You do not gain anything by another softirq context. > +/*----------------------------------------------------------------------------*/ > +/* Status Register Helper Operations */ > +/*----------------------------------------------------------------------------*/ > + > +static DEFINE_SPINLOCK(status_lock); > + > +static void wqtstatus_setbit(struct wqt_dev *priv, u32 bit) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&status_lock, flags); > + IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) | bit); > + spin_unlock_irqrestore(&status_lock, flags); > +} > + > +static void wqtstatus_clrbit(struct wqt_dev *priv, u32 bit) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&status_lock, flags); > + IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) & ~bit); > + spin_unlock_irqrestore(&status_lock, flags); > +} This looks misplaced, as mentioned before. > +static int wqtstatus_remote_testbit(struct wqt_dev *priv, u32 bit) > +{ > + return IMMR_R32(IMR1_OFFSET) & bit; > +} Are you sure that you do not need a spinlock here? The lock also prevents reordering the device access around the code using the mmio data.
> +/*----------------------------------------------------------------------------*/ > +/* Message Sending and Processing Operations */ > +/*----------------------------------------------------------------------------*/ > + > +static irqreturn_t wqt_interrupt(int irq, void *dev_id) > +{ > + struct wqt_dev *priv = dev_id; > + u32 imisr, idr; > + unsigned long flags; > + > + imisr = IMMR_R32(IMISR_OFFSET); > + idr = IMMR_R32(IDR_OFFSET); > + > + if (!(imisr & 0x8)) > + return IRQ_NONE; > + > + /* Clear all of the interrupt sources, we'll handle them next */ > + IMMR_W32(IDR_OFFSET, idr); > + > + /* Lock over all of the handlers, so they cannot get called when > + * the code doesn't expect them to be called */ > + spin_lock_irqsave(&priv->irq_lock, flags); If this is in an interrupt handler, why disable the interrupts again? The same comment applies to many of the other places where you use spin_lock_irqsave rather than spin_lock or spin_lock_irq. > +/* NOTE: All handlers are called with priv->irq_lock held */ > + > +static void empty_handler(struct wqt_dev *priv) > +{ > + /* Intentionally left empty */ > +} > + > +static void net_start_req_handler(struct wqt_dev *priv) > +{ > + schedule_work(&priv->net_start_work); > +} > + > +static void net_start_ack_handler(struct wqt_dev *priv) > +{ > + complete(&priv->net_start_completion); > +} > + > +static void net_stop_req_handler(struct wqt_dev *priv) > +{ > + schedule_work(&priv->net_stop_work); > +} > + > +static void net_stop_ack_handler(struct wqt_dev *priv) > +{ > + complete(&priv->net_stop_completion); > +} > + > +static void net_tx_complete_handler(struct wqt_dev *priv) > +{ > + tasklet_schedule(&priv->tx_complete_tasklet); > +} > + > +static void net_rx_packet_handler(struct wqt_dev *priv) > +{ > + wqtstatus_setbit(priv, PCINET_NET_RXINT_OFF); > + netif_rx_schedule(priv->ndev, &priv->napi); > +} > + > +static void uart_rx_ready_handler(struct wqt_dev *priv) > +{ > + wqtuart_rx_char(&priv->port, IMMR_R32(IMR0_OFFSET) 
& 0xff); > + IMMR_W32(ODR_OFFSET, UART_TX_EMPTY_DBELL); > +} > + > +static void uart_tx_empty_handler(struct wqt_dev *priv) > +{ > + priv->uart_tx_ready = true; > + wake_up(&priv->uart_tx_wait); > +} Since these all seem to be trivial functions, why go through the indirect function pointers at all? It seems that wqt_interrupt could do this just as well. > + > +static int wqt_request_irq(struct wqt_dev *priv) > +{ > + int ret = 0; no point initializing ret. > + mutex_lock(&priv->irq_mutex); What data does the irq_mutex protect? Arnd <><
On Tue, Nov 04, 2008 at 01:09:25PM +0100, Arnd Bergmann wrote: > On Wednesday 29 October 2008, Ira Snyder wrote: > > This adds support to Linux for a virtual ethernet interface which uses the > > PCI bus as its transport mechanism. It creates a simple, familiar, and fast > > method of communication for two devices connected by a PCI interface. > > Very interesting. I have seen multiple such drivers, and still plan to > have one as well for the cell/axon platform. > > > This is the third posting of this driver. I got some feedback, and have > > corrected the problems. Stephen, thanks for the review! I also got some > > off-list feedback from a potential user, so I believe this is worth getting > > into mainline. > > Sorry for not having replied earlier, I did not see first postings. > While I did not see any major problems with your code, I think we should > actually do it in a different way, which I have talked about at the > Linux plumbers conference, and will again at the UKUUG Linux conference > this weekend. You can find my slides from the previous talk at > http://userweb.kernel.org/%7Earnd/papers/plumbers08/plumbers-slides.pdf > I'll read through these slides as soon as I'm finished replying to this email :) > Since you have code and I only have plans, I suppose I can't make you > wait for me to get a driver for your platform, but I hope we can join > forces in the future and share code in the future. > I hope so too. I wrote this because I needed something working. I'm completely happy to help write code for another implementation. > My fundamental criticism of your code is that it is not flexible enough. > We have a number of platforms that act as a PCI, PCIe or similar kind > of device and want to transfer network data. At the same time, some of > these want to use the same infrastructure for non-network data like > MPI (ibverbs), block device or file system (9p). 
Even in your limited > setup, it seems that you should be able to share more code between the > two implementations you have posted here. When looking at the code, I > found a lot of duplication between the drivers that you should be able > to avoid. > I completely agree. I made the host and client drivers as similar as possible on purpose, to increase my own code sharing. In fact, if you diff them, you'll see that they are almost identical. I decided that it would be better to post them up for review than try to fix every little problem. > My suggestion is to base this on the virtio infrastructure, and some > of my colleagues at IBM already implemented a driver that uses > virtio-net on a PCIe connection. > To be honest, I couldn't get that figured out. I looked at that code for a couple of weeks and couldn't make a lot of sense out of it. I couldn't find much really good documentation, either. I'd love to re-use more code than I have. > Arnd <>< > > Now a few comments on your code: > > > > diff --git a/arch/powerpc/boot/dts/mpc834x_mds.dts b/arch/powerpc/boot/dts/mpc834x_mds.dts > > index c986c54..3bc8975 100644 > > --- a/arch/powerpc/boot/dts/mpc834x_mds.dts > > +++ b/arch/powerpc/boot/dts/mpc834x_mds.dts > > @@ -104,6 +104,13 @@ > > mode = "cpu"; > > }; > > > > + message-unit@8030 { > > + compatible = "fsl,mpc8349-mu"; > > + reg = <0x8030 0xd0>; > > + interrupts = <69 0x8>; > > + interrupt-parent = <&ipic>; > > + }; > > What is a message unit? Is this the mailbox you use in the driver? > We are facing a similar problem on Axon regarding probing of > the virtual device because you can't easily tell from the device > tree whether the machine is connected to something that is trying > to communicate. > > The message-unit does not seem to be network-specific, so it's not > particularly nice to have the network unit grab that of_device. > Yes, the message unit is how the Freescale reference manual describes this hardware. 
It is the mailbox hardware to generate interrupts, plus a few registers to transfer data. The registers aren't network specific, but they are needed for this driver. That is how the driver generates PCI interrupts to notify the other side of the connection it has new data, etc. > > +config PCINET_FSL > > + tristate "PCINet Virtual Ethernet over PCI support (Freescale)" > > + depends on MPC834x_MDS && !PCI > > + select DMA_ENGINE > > + select FSL_DMA > > + help > > + When running as a PCI Agent, this driver will create a virtual > > + ethernet link running over the PCI bus, allowing simplified > > + communication with the host system. The host system will need > > + to use the corresponding driver. > > + > > + If in doubt, say N. > > Why 'depends on !PCI'? This means that you cannot build a kernel that > is able to run both as host and endpoint for PCInet, right? > Yes, that is correct. I did this because the Linux PCI code does some relatively nasty things in agent mode. One thing that consistently crashed my box was running through the quirks list and trying to re-initialize an e100 that was in my box. Remember, this is a PCI agent. It shouldn't be re-initializing the other hardware on the PCI bus. > > +config PCINET_HOST > > + tristate "PCINet Virtual Ethernet over PCI support (Host)" > > + depends on PCI > > + help > > + This driver will let you communicate with a PCINet client device > > + using a virtual ethernet link running over the PCI bus. This > > + allows simplified communication with the client system. > > + > > + This is inteded for use in a system that has a crate full of > > + computers running Linux, all connected by a PCI backplane. > > + > > + If in doubt, say N. > > Is this meant to be portable across different hardware implementations? > If the driver can only work with PCINET_FSL on the other side, you > should probably mention that in the description. 
> I had hoped other people would come along and help port it to other hardware, like you yourself are suggesting. :) I'll add to the description. > > +config PCINET_DISABLE_CHECKSUM > > + bool "Disable packet checksumming" > > + depends on PCINET_FSL || PCINET_HOST > > + default n > > + help > > + Disable packet checksumming on packets received by the PCINet > > + driver. This gives a possible speed boost. > > Why make this one optional? Is there ever a reason to enable checksumming? > If there is, how about making it tunable with ethtool? > I left it optional so I could turn it on and off easily. I have no strong feelings on keeping it optional. Does the PCI bus reliably transfer data? I'm not sure. I left it there so that we could at least turn on checksumming if there was a problem. > > +struct circ_buf_desc { > > + __le32 sc; > > + __le32 len; > > + __le32 addr; > > +} __attribute__((__packed__)); > > It would be useful to always force aligning the desciptors to the whole > 32 bit and avoid the packing here. Unaligned accesses are inefficient on > many systems. > I don't really know how to do that. I got a warning here from sparse telling me something about expensive pointer subtraction. Adding a dummy 32bit padding variable got rid of the warning, but I didn't change the driver. > > +typedef struct circ_buf_desc cbd_t; > > Also, don't pass structures by value if they don't fit into one or > two registers. > These are only used for pointers to the buffer descriptors (in RAM on the Freescale) that hold packet information. I never copy them directly. 
> > +/* Buffer Descriptor Accessors */ > > +#define CBDW_SC(_cbd, _sc) iowrite32((_sc), &(_cbd)->sc) > > +#define CBDW_LEN(_cbd, _len) iowrite32((_len), &(_cbd)->len) > > +#define CBDW_ADDR(_cbd, _addr) iowrite32((_addr), &(_cbd)->addr) > > + > > +#define CBDR_SC(_cbd) ioread32(&(_cbd)->sc) > > +#define CBDR_LEN(_cbd) ioread32(&(_cbd)->len) > > +#define CBDR_ADDR(_cbd) ioread32(&(_cbd)->addr) > > We have found that accessing remote descriptors using mmio read is > > rather slow, and changed the code to always do local reads and > > remote writes. > Interesting. I don't know how you would get network speed doing this. x86 systems don't have a DMA controller. The entire purpose of making the Freescale do all the copying was to use its DMA controller. Using the DMA controller to transfer all of the data took my transfer speed from ~3MB/sec to ~45MB/sec. While that is a good increase, it could be better. I should be able to hit close to 133MB/sec (the limit of PCI). > On the host side, I would argue that you should use out_le32 rather > than iowrite32, because you are not operating on a PCI device but > an OF device. > Yeah, you're probably correct. I can't remember if I tried this and had problems or not. > > +/* IMMR Accessor Helpers */ > > +#define IMMR_R32(_off) ioread32(priv->immr+(_off)) > > +#define IMMR_W32(_off, _val) iowrite32((_val), priv->immr+(_off)) > > +#define IMMR_R32BE(_off) ioread32be(priv->immr+(_off)) > > +#define IMMR_W32BE(_off, _val) iowrite32be((_val), priv->immr+(_off)) > > IIRC, the IMMR is a platform resource, so the system should map those > registers already and provide accessor functions for the registers > you need. Simply allowing to pass an offset in here does not look > clean. > > Correct. This was done to make both sides as identical as possible. The Freescale exports the entire 1MB block of IMMR registers at PCI BAR0. So I have to use the offsets on the host side.
From the client side, I could just map what I need, but that would make the two drivers diverge. I was trying to keep them the same. > > +static void wqtuart_rx_char(struct uart_port *port, const char ch); > > +static void wqtuart_stop_tx(struct uart_port *port); > > You should try to avoid forward declarations for static functions. > If you order the function implementation correctly, that will > also give you the expected reading order in the driver. > Yep, I tried to do this. I couldn't figure out a sane ordering that would work. I tried to keep the network and uart as separate as possible in the code. > > +struct wqt_dev; > > +typedef void (*wqt_irqhandler_t)(struct wqt_dev *); > > Much more importantly, never do forward declarations for > global functions! > > These belong into a header that is included by both the > user and the definition. > > Ok. > > +struct wqt_dev { > > + /*--------------------------------------------------------------------*/ > > + /* OpenFirmware Infrastructure */ > > + /*--------------------------------------------------------------------*/ > > + struct of_device *op; > > + struct device *dev; > > Why the dev? You can always get that from the of_device, right? > Yes. I stored it there to make it identical to the host driver. By doing this, both drivers have code that says "dev_debug(priv->dev, ...)" rather than: Host: dev_debug(&priv->pdev->dev, ...) Freescale: dev_debug(&priv->op->dev, ...) > > + int irq; > > + void __iomem *immr; > > I don't think immr is device specific. > Correct, it is not. Both the PCI subsystem and the OF subsystem will call probe() when a new device is detected. I stored it here so I could ioremap() them at probe() time, and to keep both drivers' code the same. 
> > + struct mutex irq_mutex; > > + int interrupt_count; > > + > > + spinlock_t irq_lock; > > + struct wqt_irqhandlers handlers; > > + > > + /*--------------------------------------------------------------------*/ > > + /* UART Device Infrastructure */ > > + /*--------------------------------------------------------------------*/ > > + struct uart_port port; > > + bool uart_rx_enabled; > > + bool uart_open; > > hmm, if you need a uart, that sounds like it should be a separate driver > altogether. What is the relation between the network interface and the UART? > Yes, I agree. How do you make two Linux drivers that can be loaded for the same hardware at the same time? :) AFAIK, you cannot. I NEED two functions accessible at the same time, network (to transfer data) and uart (to control my bootloader). I use the uart to interact with the bootloader (U-Boot) and tell it where to tftp a kernel. I use the network to transfer the kernel. So you see, I really do need them both at the same time. If you know a better way to do this, please let me know! It was possible to write separate U-Boot drivers, but only by being careful to not conflict in my usage of the hardware. > > + struct workqueue_struct *wq; > > A per-device pointer to a workqueue_struct is unusual. What are you doing this for? How about using the system work queue? > Look through the code carefully, especially at uart_tx_work_fn(), which calls do_send_message(). do_send_message() can sleep for a long time. If I use the shared workqueue, I block ALL OTHER USERS of the queue for the time I am asleep. They cannot run. The LDD3 book says not to use the shared workqueue if you are going to sleep for long periods. I followed their advice. > > + /*--------------------------------------------------------------------*/ > > + /* Ethernet Device Infrastructure */ > > + /*--------------------------------------------------------------------*/ > > + struct net_device *ndev; > > Why make this a separate structure? 
If you have one of these per net_device, > you should embed the net_device into your own structure. > This structure is embedded in struct net_device! Look at how alloc_etherdev() works. You pass it the size of your private data structure and it allocates the space for you. > > + struct tasklet_struct tx_complete_tasklet; > > Using a tasklet for tx processing sounds fishy because most of the > network code already runs at softirq time. You do not gain anything > by another softirq context. > I didn't want to run the TX cleanup routine at hard irq time, because it can potentially take some time to run. I would rather run it with hard interrupts enabled. This DOES NOT do TX processing, it only frees skbs that have been transferred. I used the network stack to do as much as possible, of course. > > +/*----------------------------------------------------------------------------*/ > > +/* Status Register Helper Operations */ > > +/*----------------------------------------------------------------------------*/ > > + > > +static DEFINE_SPINLOCK(status_lock); > > + > > +static void wqtstatus_setbit(struct wqt_dev *priv, u32 bit) > > +{ > > + unsigned long flags; > > + > > + spin_lock_irqsave(&status_lock, flags); > > + IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) | bit); > > + spin_unlock_irqrestore(&status_lock, flags); > > +} > > + > > +static void wqtstatus_clrbit(struct wqt_dev *priv, u32 bit) > > +{ > > + unsigned long flags; > > + > > + spin_lock_irqsave(&status_lock, flags); > > + IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) & ~bit); > > + spin_unlock_irqrestore(&status_lock, flags); > > +} > > This looks misplaced, as mentioned before. > I used these so I could have some remote status. Remember, this has to work for Linux -> U-Boot, not just Linux <-> Linux. While the board is booting Linux, the RAM I used to hold buffer descriptors disappears! I need some shared status to make sure no stray accesses happen to random areas of memory. 
The case this potentially happens is: 1) Start Freescale board, do NOT install the driver 2) Insert the driver into the host system Now, if the host tries writing to the buffer descriptors, we've corrupted memory on the Freescale board. > > +static int wqtstatus_remote_testbit(struct wqt_dev *priv, u32 bit) > > +{ > > + return IMMR_R32(IMR1_OFFSET) & bit; > > +} > > Are you sure that you do not need a spinlock here? The lock also > prevents reordering the device access around the code using > the mmio data. > I don't think so. Note that this is a status bit on the remote side of the connection. My access ordering means nothing when the remote side is setting and clearing the bit. > > +/*----------------------------------------------------------------------------*/ > > +/* Message Sending and Processing Operations */ > > +/*----------------------------------------------------------------------------*/ > > + > > +static irqreturn_t wqt_interrupt(int irq, void *dev_id) > > +{ > > + struct wqt_dev *priv = dev_id; > > + u32 imisr, idr; > > + unsigned long flags; > > + > > + imisr = IMMR_R32(IMISR_OFFSET); > > + idr = IMMR_R32(IDR_OFFSET); > > + > > + if (!(imisr & 0x8)) > > + return IRQ_NONE; > > + > > + /* Clear all of the interrupt sources, we'll handle them next */ > > + IMMR_W32(IDR_OFFSET, idr); > > + > > + /* Lock over all of the handlers, so they cannot get called when > > + * the code doesn't expect them to be called */ > > + spin_lock_irqsave(&priv->irq_lock, flags); > > If this is in an interrupt handler, why disable the interrupts again? > The same comment applies to many of the other places where you > use spin_lock_irqsave rather than spin_lock or spin_lock_irq. > I tried to make the locking do only what was needed. I just couldn't get it correct unless I used spin_lock_irqsave(). I was able to get the system to deadlock otherwise. This is why I posted the driver for review, I could use some help here. It isn't critical anyway. 
You can always use spin_lock_irqsave(), it is just a little slower, but it will always work :) > > +/* NOTE: All handlers are called with priv->irq_lock held */ > > + > > +static void empty_handler(struct wqt_dev *priv) > > +{ > > + /* Intentionally left empty */ > > +} > > + > > +static void net_start_req_handler(struct wqt_dev *priv) > > +{ > > + schedule_work(&priv->net_start_work); > > +} > > + > > +static void net_start_ack_handler(struct wqt_dev *priv) > > +{ > > + complete(&priv->net_start_completion); > > +} > > + > > +static void net_stop_req_handler(struct wqt_dev *priv) > > +{ > > + schedule_work(&priv->net_stop_work); > > +} > > + > > +static void net_stop_ack_handler(struct wqt_dev *priv) > > +{ > > + complete(&priv->net_stop_completion); > > +} > > + > > +static void net_tx_complete_handler(struct wqt_dev *priv) > > +{ > > + tasklet_schedule(&priv->tx_complete_tasklet); > > +} > > + > > +static void net_rx_packet_handler(struct wqt_dev *priv) > > +{ > > + wqtstatus_setbit(priv, PCINET_NET_RXINT_OFF); > > + netif_rx_schedule(priv->ndev, &priv->napi); > > +} > > + > > +static void uart_rx_ready_handler(struct wqt_dev *priv) > > +{ > > + wqtuart_rx_char(&priv->port, IMMR_R32(IMR0_OFFSET) & 0xff); > > + IMMR_W32(ODR_OFFSET, UART_TX_EMPTY_DBELL); > > +} > > + > > +static void uart_tx_empty_handler(struct wqt_dev *priv) > > +{ > > + priv->uart_tx_ready = true; > > + wake_up(&priv->uart_tx_wait); > > +} > > Since these all seem to be trivial functions, why go through the > indirect function pointers at all? It seems that wqt_interrupt > could do this just as well. > I did this so they can be disabled independently. It is VERY tricky to get the board started without the chance of corrupting memory. Write the code, you'll see :) You have to deal with two systems that can read/write each other's memory, but DO NOT have any synchronization primitives. 
> > + > > +static int wqt_request_irq(struct wqt_dev *priv) > > +{ > > + int ret = 0; > > no point initializing ret. > > > + mutex_lock(&priv->irq_mutex); > > What data does the irq_mutex protect? > priv->interrupt_count and the indirect handlers. Note that I have two fundamentally separate devices here: the network and the uart. What if no one has brought up the network interface? You'd still expect the uart to work, right? In this case, I need to call request_irq() in TWO places, the network's open() and the uart's open(). There is only a single interrupt for this hardware. When only the uart is open, I only want the uart section of the irq handler enabled. This is where the indirect handlers come in. On uart open(), I call wqt_request_irq(), which will ONLY call request_irq() if it HAS NOT been called yet. Here are a few examples: Ex1: uart only 1) uart open() is called 2) wqt_request_irq() is called 3) the uart irq handling functions are enabled 4) request_irq() is called, hooking up the irq 5) the uart is started The network by itself works exactly the same. Ex2: uart first, then network while uart is open 1) uart open() is called 2) wqt_request_irq() is called 3) the uart irq handling functions are enabled 4) request_irq() is called, hooking up the irq handler 5) the uart is started 6) time passes 7) network open() is called 8) wqt_request_irq() is called 9) request_irq() is NOT CALLED THIS TIME, because it is already enabled 10) the network irq handling functions are enabled 11) the network is started Thanks so much for the review! I hope we can work together to get something that can be merged into mainline Linux. I'm willing to write code, I just need some direction from more experienced kernel developers. Ira
On Tuesday 04 November 2008, Ira Snyder wrote: > On Tue, Nov 04, 2008 at 01:09:25PM +0100, Arnd Bergmann wrote: > > > > Why 'depends on !PCI'? This means that you cannot build a kernel that > > is able to run both as host and endpoint for PCInet, right? > > > > Yes, that is correct. I did this because the Linux PCI code does some > relatively nasty things in agent mode. One thing that consistently > crashed my box was running through the quirks list and trying to > re-initialize an e100 that was in my box. > > Remember, this is a PCI agent. It shouldn't be re-initializing the other > hardware on the PCI bus. Yes, that makes sense. However, you should still be able to have the PCI code built into the kernel, as long as you prevent it from scanning the bus on the machine that is in agent/endpoint mode. This should be made clear in the device tree. On the QS22 machine, we remove the "pci" device from the device tree, and add a "pcie-ep" device. > I left it optional so I could turn it on and off easily. I have no > strong feelings on keeping it optional. > > Does the PCI bus reliably transfer data? I'm not sure. I left it there > so that we could at least turn on checksumming if there was a problem. Yes, PCI guarantees reliable transfers. > > > +struct circ_buf_desc { > > > + __le32 sc; > > > + __le32 len; > > > + __le32 addr; > > > +} __attribute__((__packed__)); > > > > It would be useful to always force aligning the desciptors to the whole > > 32 bit and avoid the packing here. Unaligned accesses are inefficient on > > many systems. > > > > I don't really know how to do that. I got a warning here from sparse > telling me something about expensive pointer subtraction. Adding a dummy > 32bit padding variable got rid of the warning, but I didn't change the > driver. Ok, I see. However, adding the packed attribute makes it more expensive to use. 
> > > +typedef struct circ_buf_desc cbd_t; > > > > Also, don't pass structures by value if they don't fit into one or > > two registers. > > > > These are only used for pointers to the buffer descriptors (in RAM on > the Freescale) that hold packet information. I never copy them directly. Ok, then you should not have a typedef. > > > +/* Buffer Descriptor Accessors */ > > > +#define CBDW_SC(_cbd, _sc) iowrite32((_sc), &(_cbd)->sc) > > > +#define CBDW_LEN(_cbd, _len) iowrite32((_len), &(_cbd)->len) > > > +#define CBDW_ADDR(_cbd, _addr) iowrite32((_addr), &(_cbd)->addr) > > > + > > > +#define CBDR_SC(_cbd) ioread32(&(_cbd)->sc) > > > +#define CBDR_LEN(_cbd) ioread32(&(_cbd)->len) > > > +#define CBDR_ADDR(_cbd) ioread32(&(_cbd)->addr) > > > > We have found that accessing remote descriptors using mmio read is > > rather slow, and changed the code to always do local reads and > > remote writes. > > > Interesting. I don't know how you would get network speed doing this. > X86 systems don't have a DMA conttroller. The entire purpose of making > the Freescale do all the copying was to use its DMA controller. > > Using the DMA controller to transfer all of the data took my transfer > speed from ~3MB/sec to ~45MB/sec. While that is a good increase, it > could be better. I should be able to hit close to 133MB/sec (the limit > of PCI) Then I think I misunderstood something about this driver. Are these descriptors accessed by the DMA engine, or by software? If it's the DMA engine accessing them, can you put the descriptors on both sides of the bus rather than just on one side? Which side allocates them anyway? Since you use ioread32/iowrite32 on the ppc side, it looks like they are on the PCI host, which does not seem to make much sense, because the ppc memory is much closer to the DMA engine? Obviously, you want the DMA engine to do the data transfers, but here, you use ioread32 for mmio transfers to the descriptors, which is slow. > Correct. 
This was done to make both sides as identical as possible. The > Freescale exports the entire 1MB block of IMMR registers at PCI BAR0. So > I have to use the offsets on the host side. > > From the client side, I could just map what I need, but that would make > the two drivers diverge. I was trying to keep them the same. Ah, I see. We had the same problem on Axon, and I'm still looking for a good solution. The best option is probably to abstract the immr access in some way and provide a driver that implements them on top of PCI. > > > > +static void wqtuart_rx_char(struct uart_port *port, const char ch); > > > +static void wqtuart_stop_tx(struct uart_port *port); > > > > You should try to avoid forward declarations for static functions. > > If you order the function implementation correctly, that will > > also give you the expected reading order in the driver. > > > > Yep, I tried to do this. I couldn't figure out a sane ordering that > would work. I tried to keep the network and uart as seperate as possible > in the code. I'd suggest splitting the uart code into a separate driver then. > > > +struct wqt_dev { > > > + /*--------------------------------------------------------------------*/ > > > + /* OpenFirmware Infrastructure */ > > > + /*--------------------------------------------------------------------*/ > > > + struct of_device *op; > > > + struct device *dev; > > > > Why the dev? You can always get that from the of_device, right? > > > > Yes. I stored it there to make it identical to the host driver. By doing > this, both drivers have code that says "dev_debug(priv->dev, ...)" > rather than: > > Host: > dev_debug(&priv->pdev->dev, ...) > > Freescale: > dev_debug(&priv->op->dev, ...) Ok. You can just store the dev pointer then, and leave out the op pointer. You can always do a container_of() to get back to it. > Yes, I agree. How do you make two Linux drivers that can be loaded for > the same hardware at the same time? :) AFAIK, you cannot. 
> > I NEED two functions accessible at the same time, network (to transfer > data) and uart (to control my bootloader). > > I use the uart to interact with the bootloader (U-Boot) and tell it > where to tftp a kernel. I use the network to transfer the kernel. > > So you see, I really do need them both at the same time. If you know a > better way to do this, please let me know! > > It was possible to write seperate U-Boot drivers, but only by being > careful to not conflict in my usage of the hardware. Ok, I see. I fear any nice solution would make the u-boot drivers much more complex. > > > + struct workqueue_struct *wq; > > > > A per-device pointer to a workqueue_struct is unusual. What are you doing > > this for? How about using the system work queue? > > > > Look through the code carefully, especially at uart_tx_work_fn(), which > calls do_send_message(). > > do_send_message() can sleep for a long time. If I use the shared > workqueue, I block ALL OTHER USERS of the queue for the time I am > asleep. They cannot run. ok. > The LDD3 book says not to use the shared workqueue if you are going to > sleep for long periods. I followed their advice. ok. > > > + /*--------------------------------------------------------------------*/ > > > + /* Ethernet Device Infrastructure */ > > > + /*--------------------------------------------------------------------*/ > > > + struct net_device *ndev; > > > > Why make this a separate structure? If you have one of these per net_device, > > you should embed the net_device into your own structure. > > > > This structure is embedded in struct net_device! Look at how > alloc_etherdev() works. You pass it the size of your private data > structure and it allocates the space for you. right, I remember now. Unfortunately, alloc_etherdev is a little bit different from many other kernel interfaces. 
> > > + struct tasklet_struct tx_complete_tasklet; > > > > Using a tasklet for tx processing sounds fishy because most of the > > network code already runs at softirq time. You do not gain anything > > by another softirq context. > > > > I didn't want to run the TX cleanup routine at hard irq time, because it > can potentially take some time to run. I would rather run it with hard > interrupts enabled. sure. > This DOES NOT do TX processing, it only frees skbs that have been > transferred. I used the network stack to do as much as possible, of > course. Most drivers now do that from the *rx* poll function, and call netif_rx_schedule when they get a tx interrupt. > > If this is in an interrupt handler, why disable the interrupts again? > > The same comment applies to many of the other places where you > > use spin_lock_irqsave rather than spin_lock or spin_lock_irq. > > > > I tried to make the locking do only what was needed. I just couldn't get > it correct unless I used spin_lock_irqsave(). I was able to get the > system to deadlock otherwise. This is why I posted the driver for > review, I could use some help here. > > It isn't critical anyway. You can always use spin_lock_irqsave(), it is > just a little slower, but it will always work :) I like the documenting character of the spinlock functions. E.g. if you use spin_lock_irq() in a function, it is obvious that interrupts are enabled, and if you use spin_lock() on a lock that requires disabling interrupts, you know that interrupts are already off. > Thanks so much for the review! I hope we can work together to get > something that can be merged into mainline Linux. I'm willing to write > code, I just need some direction from more experienced kernel > developers. Great, I can certainly help with that. Please CC me on anything related to this driver. Arnd <><
On Tue, Nov 04, 2008 at 09:23:03PM +0100, Arnd Bergmann wrote: > On Tuesday 04 November 2008, Ira Snyder wrote: > > On Tue, Nov 04, 2008 at 01:09:25PM +0100, Arnd Bergmann wrote: > > > > > > Why 'depends on !PCI'? This means that you cannot build a kernel that > > > is able to run both as host and endpoint for PCInet, right? > > > > > > > Yes, that is correct. I did this because the Linux PCI code does some > > relatively nasty things in agent mode. One thing that consistently > > crashed my box was running through the quirks list and trying to > > re-initialize an e100 that was in my box. > > > > Remember, this is a PCI agent. It shouldn't be re-initializing the other > > hardware on the PCI bus. > > Yes, that makes sense. However, you should still be able to have the > PCI code built into the kernel, as long as you prevent it from scanning > the bus on the machine that is in agent/endpoint mode. > > This should be made clear in the device tree. On the QS22 machine, we > remove the "pci" device from the device tree, and add a "pcie-ep" > device. > Ok, that makes perfect sense. I'll test it at some point and make sure that the kernel doesn't go through the quirks list, but it sounds reasonable to assume it doesn't. > > I left it optional so I could turn it on and off easily. I have no > > strong feelings on keeping it optional. > > > > Does the PCI bus reliably transfer data? I'm not sure. I left it there > > so that we could at least turn on checksumming if there was a problem. > > Yes, PCI guarantees reliable transfers. > Great, I didn't know that. I'll turn it off unconditionally. Disabling the checksumming gave me a few extra MB/sec. > > > > +struct circ_buf_desc { > > > > + __le32 sc; > > > > + __le32 len; > > > > + __le32 addr; > > > > +} __attribute__((__packed__)); > > > > > > It would be useful to always force aligning the desciptors to the whole > > > 32 bit and avoid the packing here. Unaligned accesses are inefficient on > > > many systems. 
> > > > > > > I don't really know how to do that. I got a warning here from sparse > > telling me something about expensive pointer subtraction. Adding a dummy > > 32bit padding variable got rid of the warning, but I didn't change the > > driver. > > Ok, I see. However, adding the packed attribute makes it more expensive > to use. > Ok. Is there any way to make sure that the structure compiles to the same representation on the host and agent system without using packed? > > > > +typedef struct circ_buf_desc cbd_t; > > > > > > Also, don't pass structures by value if they don't fit into one or > > > two registers. > > > > > > > These are only used for pointers to the buffer descriptors (in RAM on > > the Freescale) that hold packet information. I never copy them directly. > > Ok, then you should not have a typedef. > Ok, it is gone in my latest version. > > > > +/* Buffer Descriptor Accessors */ > > > > +#define CBDW_SC(_cbd, _sc) iowrite32((_sc), &(_cbd)->sc) > > > > +#define CBDW_LEN(_cbd, _len) iowrite32((_len), &(_cbd)->len) > > > > +#define CBDW_ADDR(_cbd, _addr) iowrite32((_addr), &(_cbd)->addr) > > > > + > > > > +#define CBDR_SC(_cbd) ioread32(&(_cbd)->sc) > > > > +#define CBDR_LEN(_cbd) ioread32(&(_cbd)->len) > > > > +#define CBDR_ADDR(_cbd) ioread32(&(_cbd)->addr) > > > > > > We have found that accessing remote descriptors using mmio read is > > > rather slow, and changed the code to always do local reads and > > > remote writes. > > > > > Interesting. I don't know how you would get network speed doing this. > > X86 systems don't have a DMA conttroller. The entire purpose of making > > the Freescale do all the copying was to use its DMA controller. > > > > Using the DMA controller to transfer all of the data took my transfer > > speed from ~3MB/sec to ~45MB/sec. While that is a good increase, it > > could be better. I should be able to hit close to 133MB/sec (the limit > > of PCI) > > Then I think I misunderstood something about this driver. 
Are these > descriptors accessed by the DMA engine, or by software? If it's the > DMA engine accessing them, can you put the descriptors on both sides > of the bus rather than just on one side? > I access the descriptors in software, and program the DMA controller to transfer the data. They are not directly used by the hardware. I used the DMAEngine API to interact with the DMA controller. I tried programming them manually, but the DMAEngine API was about 10 MB/sec faster than I could achieve by hand. See dma_async_copy_raw_to_buf() and dma_async_copy_buf_to_raw() in the PowerPC code. The basics of the network driver are as follows: 1) PowerPC allocates 4k of RAM for buffer descriptors, and exposes it over PCI in BAR 1 2) Host initializes all buffer descriptors to zero 3) Host allocates RING_SIZE 64K skb's, and puts them in the RX ring On PowerPC hard_start_xmit(): 1) Find the next free buffer in the RX ring, get the address stored inside it 2) DMA the packet given to us by the network stack to that address 3) Mark the buffer descriptor used 4) Interrupt the host On Host hard_start_xmit(): 1) Find the next free buffer descriptor in the TX ring 2) dma_map_single() and put the address into the buffer descriptor 3) Mark the buffer descriptor as used 4) Interrupt the PowerPC On PowerPC rx_napi(): (scheduled by interrupt) 1) Find the next dirty buffer in the TX ring, get the address and len 2) Allocate an skb of this len 3) DMA the data into the new skb 4) Pass the new skb up into the kernel 5) Mark the buffer as freeable 6) Loop until done On Host rx_napi(): 1) Find the next dirty buffer in the RX ring, get the pointer to it in the list of allocated skbs 2) Allocate a new 64K skb 3) Put the new skb into the buffer descriptors, mark it as clean 4) Push the skb (from the RX ring) into the kernel 5) Loop until done So, you'll notice that I only copy the data over the PCI bus once, directly into the skb it is supposed to be going into. 
The buffer descriptors are there so I know where to find the skb in host memory across the PCI bus. Hopefully that's a good description. :) It seems to me that both sides of the connection need to read the descriptors (to get packet length, clean up dirty packets, etc.) and write them (to set packet length, mark packets dirty, etc.) I just can't come up with something that is local-read / remote-write only. > Which side allocates them anyway? Since you use ioread32/iowrite32 > on the ppc side, it looks like they are on the PCI host, which does > not seem to make much sense, because the ppc memory is much closer > to the DMA engine? > The PowerPC allocates them. They are accessible via PCI BAR1. They live in regular RAM on the PowerPC. I can't remember why I used ioread32/iowrite32 anymore. I'll try again with in_le32()/out_le32() on the PowerPC system, and see what happens. > Obviously, you want the DMA engine to do the data transfers, but here, you > use ioread32 for mmio transfers to the descriptors, which is slow. > I didn't know it was slow :) Maybe this is why I had to make the MTU very large to get good speed. Using a standard 1500 byte MTU I get <10 MB/sec transfer speed. Using a 64K MTU, I get ~45MB/sec transfer speed. Do I need to do any sort of flushing to make sure that the read has actually gone out of cache and into memory? When the host accesses the buffer descriptors over PCI, it can only view memory. If a write is still in the PowerPC cache, the host will get stale data. > > Correct. This was done to make both sides as identical as possible. The > > Freescale exports the entire 1MB block of IMMR registers at PCI BAR0. So > > I have to use the offsets on the host side. > > > > From the client side, I could just map what I need, but that would make > > the two drivers diverge. I was trying to keep them the same. > > Ah, I see. We had the same problem on Axon, and I'm still looking for a > good solution. 
The best option is probably to abstract the immr access > in some way and provide a driver that implements them on top of PCI. > > > > > > +static void wqtuart_rx_char(struct uart_port *port, const char ch); > > > > +static void wqtuart_stop_tx(struct uart_port *port); > > > > > > You should try to avoid forward declarations for static functions. > > > If you order the function implementation correctly, that will > > > also give you the expected reading order in the driver. > > > > > > > Yep, I tried to do this. I couldn't figure out a sane ordering that > > would work. I tried to keep the network and uart as seperate as possible > > in the code. > > I'd suggest splitting the uart code into a separate driver then. > How? In Linux we can only have one driver for a certain set of hardware. I use the messaging unit to do both network (interrupts and status bits) and uart (interrupts and message transfer). Both the network and uart _must_ run at the same time. This way I can type into the bootloader prompt to start a network transfer, and watch it complete. Remember, I can't have a real serial console plugged into this board. I'll be using this with about 150 boards in 8 separate chassis, which makes cabling a nightmare. I'm trying to do as much as possible with the PCI backplane. > > > > +struct wqt_dev { > > > > + /*--------------------------------------------------------------------*/ > > > > + /* OpenFirmware Infrastructure */ > > > > + /*--------------------------------------------------------------------*/ > > > > + struct of_device *op; > > > > + struct device *dev; > > > > > > Why the dev? You can always get that from the of_device, right? > > > > > > > Yes. I stored it there to make it identical to the host driver. By doing > > this, both drivers have code that says "dev_debug(priv->dev, ...)" > > rather than: > > > > Host: > > dev_debug(&priv->pdev->dev, ...) > > > > Freescale: > > dev_debug(&priv->op->dev, ...) > > Ok. 
You can just store the dev pointer then, and leave out the op pointer. > You can always do a container_of() to get back to it. > True, I didn't think of that. I'll make that change. > > Yes, I agree. How do you make two Linux drivers that can be loaded for > > the same hardware at the same time? :) AFAIK, you cannot. > > > > I NEED two functions accessible at the same time, network (to transfer > > data) and uart (to control my bootloader). > > > > I use the uart to interact with the bootloader (U-Boot) and tell it > > where to tftp a kernel. I use the network to transfer the kernel. > > > > So you see, I really do need them both at the same time. If you know a > > better way to do this, please let me know! > > > > It was possible to write seperate U-Boot drivers, but only by being > > careful to not conflict in my usage of the hardware. > > Ok, I see. I fear any nice solution would make the u-boot drivers much > more complex. > Perhaps. I'm perfectly willing to port things to U-Boot. Especially if we can make something generic enough to be re-used by many different boards. Recently, another person on the U-Boot list has shown a need for this kind of solution. > > > > + /*--------------------------------------------------------------------*/ > > > > + /* Ethernet Device Infrastructure */ > > > > + /*--------------------------------------------------------------------*/ > > > > + struct net_device *ndev; > > > > > > Why make this a separate structure? If you have one of these per net_device, > > > you should embed the net_device into your own structure. > > > > > > > This structure is embedded in struct net_device! Look at how > > alloc_etherdev() works. You pass it the size of your private data > > structure and it allocates the space for you. > > right, I remember now. Unfortunately, alloc_etherdev is a little bit > different from many other kernel interfaces. > Yep. 
It sure is :) > > > > + struct tasklet_struct tx_complete_tasklet; > > > > > > Using a tasklet for tx processing sounds fishy because most of the > > > network code already runs at softirq time. You do not gain anything > > > by another softirq context. > > > > > > > I didn't want to run the TX cleanup routine at hard irq time, because it > > can potentially take some time to run. I would rather run it with hard > > interrupts enabled. > > sure. > > > This DOES NOT do TX processing, it only frees skbs that have been > > transferred. I used the network stack to do as much as possible, of > > course. > > Most drivers now do that from the *rx* poll function, and call > netif_rx_schedule when they get a tx interrupt. > That is an interesting concept. I'll look around the drivers/net tree and try to find one that works this way. It should be pretty easy to implement, though. I'll try it out. > > > If this is in an interrupt handler, why disable the interrupts again? > > > The same comment applies to many of the other places where you > > > use spin_lock_irqsave rather than spin_lock or spin_lock_irq. > > > > > > > I tried to make the locking do only what was needed. I just couldn't get > > it correct unless I used spin_lock_irqsave(). I was able to get the > > system to deadlock otherwise. This is why I posted the driver for > > review, I could use some help here. > > > > It isn't critical anyway. You can always use spin_lock_irqsave(), it is > > just a little slower, but it will always work :) > > I like the documenting character of the spinlock functions. E.g. if you > use spin_lock_irq() in a function, it is obvious that interrupts are enabled, > and if you use spin_lock() on a lock that requires disabling interrupts, > you know that interrupts are already off. > True. I just couldn't seem to get it right. I'll try again. Perhaps it was another bug in the driver that I hadn't found at the time. > > Thanks so much for the review! 
I hope we can work together to get > > something that can be merged into mainline Linux. I'm willing to write > > code, I just need some direction from more experienced kernel > > developers. > > Great, I can certainly help with that. Please CC me on anything related > to this driver. Will do. Please CC me on anything similar that you run across as well. :) Ira
On Tue, Nov 04, 2008 at 09:23:03PM +0100, Arnd Bergmann wrote: Big snip. > > > > I tried to make the locking do only what was needed. I just couldn't get > > it correct unless I used spin_lock_irqsave(). I was able to get the > > system to deadlock otherwise. This is why I posted the driver for > > review, I could use some help here. > > > > It isn't critical anyway. You can always use spin_lock_irqsave(), it is > > just a little slower, but it will always work :) > > I like the documenting character of the spinlock functions. E.g. if you > use spin_lock_irq() in a function, it is obvious that interrupts are enabled, > and if you use spin_lock() on a lock that requires disabling interrupts, > you know that interrupts are already off. > Ok, I've started addressing your comments. I'm having trouble with the locking again. Let me see if my understanding is correct: spin_lock_irqsave() disables interrupts and saves the interrupt state spin_unlock_irqrestore() MAY re-enable interrupts based on the saved flags spin_lock_irq() disables interrupts, and always turns them back on when spin_unlock_irq() is called spin_lock_bh() disables softirqs, spin_unlock_bh() re-enables them spin_lock() and spin_unlock() are just regular spinlocks So, since interrupts are disabled while my interrupt handler is running, I think I should be able to use spin_lock() and spin_unlock(), correct? But sparse gives me the following warning: wqt.c:185:9: warning: context imbalance in 'wqt_interrupt': wrong count at exit wqt.c:185:9: context 'lock': wanted 0, got 1 If I'm using spin_lock_irqsave() and spin_unlock_irqrestore() I do not get the same warnings. Therefore I must have some misunderstanding :) Thanks, Ira
On Tuesday 04 November 2008, Ira Snyder wrote: > So, since interrupts are disabled while my interrupt handler is running, > I think I should be able to use spin_lock() and spin_unlock(), correct? yes. > But sparse gives me the following warning: > wqt.c:185:9: warning: context imbalance in 'wqt_interrupt': wrong count at exit > wqt.c:185:9: context 'lock': wanted 0, got 1 > > If I'm using spin_lock_irqsave() and spin_unlock_irqrestore() I do not get > the same warnings. Therefore I must have some misunderstanding :) I've seen something like that before, I think it was a bug either in sparse or in the powerpc platform code. Try updating both the kernel and sparse and see if it still happens. Arnd <><
On Tuesday 04 November 2008, Ira Snyder wrote: > On Tue, Nov 04, 2008 at 09:23:03PM +0100, Arnd Bergmann wrote: > > On Tuesday 04 November 2008, Ira Snyder wrote: > > > I don't really know how to do that. I got a warning here from sparse > > > telling me something about expensive pointer subtraction. Adding a dummy > > > 32bit padding variable got rid of the warning, but I didn't change the > > > driver. > > > > Ok, I see. However, adding the packed attribute makes it more expensive > > to use. > > > > Ok. Is there any way to make sure that the structure compiles to the > same representation on the host and agent system without using packed? Only knowledge about the alignment on all the possible architectures ;-) As a simplified rule, always pad every struct member to the largest other member in the struct and always use explicitly sized types like __u8 or __le32. > Hopefully that's a good description. :) It seems to me that both sides > of the connection need to read the descriptors (to get packet length, > clean up dirty packets, etc.) and write them (to set packet length, mark > packets dirty, etc.) I just can't come up with something that is > local-read / remote-write only. If I understand your description correctly, the only remote read is when the host accesses the buffer descriptors to find free space. Avoiding this read access may improve the latency a bit. In our ring buffer concept, both host and endpoint allocate a memory buffer that gets ioremapped into the remote side. Since you always need to read the descriptors from powerpc, you should probably keep them in powerpc memory, but you can change the code so that for finding the next free entry, the host will look in its own memory for the number of the next entry, and the powerpc side will write that when it consumes a descriptor to mark it as free. > > Which side allocates them anyway? 
Since you use ioread32/iowrite32 > > on the ppc side, it looks like they are on the PCI host, which does > > not seem to make much sense, because the ppc memory is much closer > > to the DMA engine? > > > > The PowerPC allocates them. They are accessible via PCI BAR1. They live > in regular RAM on the PowerPC. I can't remember why I used > ioread32/iowrite32 anymore. I'll try again with in_le32()/out_le32() on > the PowerPC system, and see what happens. Actually, if they are in powerpc RAM, you must use neither in_le32 nor ioread32. Both are only well-defined on I/O devices (local bus or PCI, respectively). Instead, you should directly access the buffer using pointer dereferences, and use rmb()/wmb() to make sure anything you access is synchronized with the host. > > Obviously, you want the DMA engine to do the data transfers, but here, you > > use ioread32 for mmio transfers to the descriptors, which is slow. > > > > I didn't know it was slow :) Maybe this is why I had to make the MTU > very large to get good speed. Using a standard 1500 byte MTU I get > <10 MB/sec transfer speed. Using a 64K MTU, I get ~45MB/sec transfer > speed. > > Do I need to do any sort of flushing to make sure that the read has > actually gone out of cache and into memory? When the host accesses the > buffer descriptors over PCI, it can only view memory. If a write is > still in the PowerPC cache, the host will get stale data. The access over the bus is cache-coherent, unless you are on one of the more obscure powerpc implementations. This means you do not have a problem with data still being in cache. However, you need to make sure that data arrives in the right order. DMA read accesses over PCI may be reordered, and you need a wmb() between two memory stores if you want to be sure that the host sees them in the correct order. > > > Yep, I tried to do this. I couldn't figure out a sane ordering that > > > would work. 
I tried to keep the network and uart as separate as possible > > > in the code. > > > > I'd suggest splitting the uart code into a separate driver then. > > > > How? In Linux we can only have one driver for a certain set of hardware. > I use the messaging unit to do both network (interrupts and status bits) > and uart (interrupts and message transfer). > > Both the network and uart _must_ run at the same time. This way I can > type into the bootloader prompt to start a network transfer, and watch > it complete. > > Remember, I can't have a real serial console plugged into this board. > I'll be using this with about 150 boards in 8 separate chassis, which > makes cabling a nightmare. I'm trying to do as much as possible with the > PCI backplane. When splitting out the hardware specific parts, I would write a device driver for the messaging unit that knows about neither the uart nor the network (or any other high-level protocol). It's a bit more complicated to load the two high-level drivers in that case, but one clean way to do it would be to instantiate a new bus-type from the MU driver and have that driver register devices for itself. Then you can load the high-level driver through udev or have them built into the kernel. To get really fancy, you could find a way for the host to announce what protocols are supported through the MU. A use case for that, which I have been thinking about before, would be to allow the host to set up direct virtual point-to-point networks between two endpoints, not involving the host at all once the device is up. Arnd <><
On Wed, Nov 05, 2008 at 02:19:52PM +0100, Arnd Bergmann wrote: > On Tuesday 04 November 2008, Ira Snyder wrote: > > > So, since interrupts are disabled while my interrupt handler is running, > > I think I should be able to use spin_lock() and spin_unlock(), correct? > > yes. > > > But sparse gives me the following warning: > > wqt.c:185:9: warning: context imbalance in 'wqt_interrupt': wrong count at exit > > wqt.c:185:9: context 'lock': wanted 0, got 1 > > > > If I'm using spin_lock_irqsave() and spin_unlock_irqrestore() I do not get > > the same warnings. Therefore I must have some misunderstanding :) > > > I've seen something like that before, I think it was a bug either in > sparse or in the powerpc platform code. Try updating both the kernel > and sparse and see if it still happens. > Ok, I had the latest sparse, but I've been developing the host driver against Ubuntu 8.04's kernel, which is based on 2.6.24. Using Linus' git tree resolved the errors from sparse. I ended up finding that the dma_mapping_error() API has changed, and so I updated the driver for that. It should build against 2.6.28-rc now. I'll post it in a little while. Ira
On Wed, Nov 05, 2008 at 02:50:59PM +0100, Arnd Bergmann wrote: > On Tuesday 04 November 2008, Ira Snyder wrote: > > On Tue, Nov 04, 2008 at 09:23:03PM +0100, Arnd Bergmann wrote: > > > On Tuesday 04 November 2008, Ira Snyder wrote: > > > > I don't really know how to do that. I got a warning here from sparse > > > > telling me something about expensive pointer subtraction. Adding a dummy > > > > 32bit padding variable got rid of the warning, but I didn't change the > > > > driver. > > > > > > Ok, I see. However, adding the packed attribute makes it more expensive > > > to use. > > > > > > > Ok. Is there any way to make sure that the structure compiles to the > > same representation on the host and agent system without using packed? > > Only knowledge about the alignment on all the possible architectures ;-) > As a simplified rule, always pad every struct member to the largest > other member in the struct and always use explicitly sized types like > __u8 or __le32. > Ok, I tried using: struct circ_buf_desc { __le32 sc; __le32 len; __le32 addr; __le32 padding; }; The structure came out the same size on both x86 and powerpc, and the driver still works. I'll put this change in the driver. > > Hopefully that's a good description. :) It seems to me that both sides > > of the connection need to read the descriptors (to get packet length, > > clean up dirty packets, etc.) and write them (to set packet length, mark > > packets dirty, etc.) I just can't come up with something that is > > local-read / remote-write only. > > If I understand your description correctly, the only remote read is > when the host accesses the buffer descriptors to find free space. > Avoiding this read access may improve the latency a bit. In our ring > buffer concept, both host and endpoint allocate a memory buffer that > gets ioremapped into the remote side. 
Since you always need to read > the descriptors from powerpc, you should probably keep them in powerpc > memory, but you can change the code so that for finding the next > free entry, the host will look in its own memory for the number of the > next entry, and the powerpc side will write that when it consumes a > descriptor to mark it as free. > There are a few remote reads (from the host). 1) in hard_start_xmit() to make sure the queue is stopped 2) in wqt_tx_complete() to find the buffer descriptors that have been consumed 3) in wqt_rx_napi() to find the buffer descriptors that have been dirtied 4) in wqt_rx_napi() to get the actual packet length of a dirty buffer I /could/ come up with a scheme to use only writes, but it seems much too complicated for a little performance. I'll keep it in mind, and change it during performance tuning (if needed). > > > Which side allocates them anyway? Since you use ioread32/iowrite32 > > > on the ppc side, it looks like they are on the PCI host, which does > > > not seem to make much sense, because the ppc memory is much closer > > > to the DMA engine? > > > > > > > The PowerPC allocates them. They are accessible via PCI BAR1. They live > > in regular RAM on the PowerPC. I can't remember why I used > > ioread32/iowrite32 anymore. I'll try again with in_le32()/out_le32() on > > the PowerPC system, and see what happens. > > Actually, if they are in powerpc RAM, you must use neither in_le32 nor > ioread32. Both are only well-defined on I/O devices (local bus or PCI, > respectively). Instead, you should directly access the buffer using > pointer dereferences, and use rmb()/wmb() to make sure anything you > access is synchronized with the host. > I've changed it to just do pointer dereferences, and it works just fine. I added appropriate wmb() (I think...) > > > Obviously, you want the DMA engine to do the data transfers, but here, you > > > use ioread32 for mmio transfers to the descriptors, which is slow. 
> > > > > > > > I didn't know it was slow :) Maybe this is why I had to make the MTU > > very large to get good speed. Using a standard 1500 byte MTU I get > > <10 MB/sec transfer speed. Using a 64K MTU, I get ~45MB/sec transfer > > speed. > > > > Do I need to do any sort of flushing to make sure that the read has > > actually gone out of cache and into memory? When the host accesses the > > buffer descriptors over PCI, it can only view memory. If a write is > > still in the PowerPC cache, the host will get stale data. > > The access over the bus is cache-coherent, unless you are on one of the > more obscure powerpc implementations. This means you do not have a > problem with data still being in cache. However, you need to make > sure that data arrives in the right order. DMA read accesses over PCI > may be reordered, and you need a wmb() between two memory stores if you > want to be sure that the host sees them in the correct order. > Ok. > > > > Yep, I tried to do this. I couldn't figure out a sane ordering that > > > > would work. I tried to keep the network and uart as separate as possible > > > > in the code. > > > > > > I'd suggest splitting the uart code into a separate driver then. > > > > > > > How? In Linux we can only have one driver for a certain set of hardware. > > I use the messaging unit to do both network (interrupts and status bits) > > and uart (interrupts and message transfer). > > > > Both the network and uart _must_ run at the same time. This way I can > > type into the bootloader prompt to start a network transfer, and watch > > it complete. > > > > Remember, I can't have a real serial console plugged into this board. > > I'll be using this with about 150 boards in 8 separate chassis, which > > makes cabling a nightmare. I'm trying to do as much as possible with the > > PCI backplane. 
> > When splitting out the hardware specific parts, I would write a device > driver for the messaging unit that knows about neither the uart nor the > network (or any other high-level protocol). It's a bit more complicated to > load the two high-level drivers in that case, but one clean way to do > it would be to instantiate a new bus-type from the MU driver and have > that driver register devices for itself. Then you can load the high-level > driver through udev or have them built into the kernel. > > To get really fancy, you could find a way for the host to announce what > protocols are supported through the MU. A use case for that, which I > have been thinking about before, would be to allow the host to set up > direct virtual point-to-point networks between two endpoints, not involving > the host at all once the device is up. > This is getting pretty complicated, especially for someone just getting started writing drivers. :) It seems like you'd need a set of functions to access specific interrupts / message boxes, maybe something like the gpio interface (where each gpio pin has an integer number). Both sides have to agree on what each interrupt means, and which mailbox data transfer happens in. So each driver would still be pretty closely tied to the hardware. It seems like too much for someone like me to design. I'll leave it up to the pros. Ira
diff --git a/arch/powerpc/boot/dts/mpc834x_mds.dts b/arch/powerpc/boot/dts/mpc834x_mds.dts index c986c54..3bc8975 100644 --- a/arch/powerpc/boot/dts/mpc834x_mds.dts +++ b/arch/powerpc/boot/dts/mpc834x_mds.dts @@ -104,6 +104,13 @@ mode = "cpu"; }; + message-unit@8030 { + compatible = "fsl,mpc8349-mu"; + reg = <0x8030 0xd0>; + interrupts = <69 0x8>; + interrupt-parent = <&ipic>; + }; + dma@82a8 { #address-cells = <1>; #size-cells = <1>; diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index f749b40..eef7af7 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -2279,6 +2279,40 @@ config UGETH_TX_ON_DEMAND bool "Transmit on Demand support" depends on UCC_GETH +config PCINET_FSL + tristate "PCINet Virtual Ethernet over PCI support (Freescale)" + depends on MPC834x_MDS && !PCI + select DMA_ENGINE + select FSL_DMA + help + When running as a PCI Agent, this driver will create a virtual + ethernet link running over the PCI bus, allowing simplified + communication with the host system. The host system will need + to use the corresponding driver. + + If in doubt, say N. + +config PCINET_HOST + tristate "PCINet Virtual Ethernet over PCI support (Host)" + depends on PCI + help + This driver will let you communicate with a PCINet client device + using a virtual ethernet link running over the PCI bus. This + allows simplified communication with the client system. + + This is intended for use in a system that has a crate full of + computers running Linux, all connected by a PCI backplane. + + If in doubt, say N. + +config PCINET_DISABLE_CHECKSUM + bool "Disable packet checksumming" + depends on PCINET_FSL || PCINET_HOST + default n + help + Disable packet checksumming on packets received by the PCINet + driver. This gives a possible speed boost. 
+ config MV643XX_ETH tristate "Marvell Discovery (643XX) and Orion ethernet support" depends on MV64360 || MV64X60 || (PPC_MULTIPLATFORM && PPC32) || PLAT_ORION diff --git a/drivers/net/Makefile b/drivers/net/Makefile index f19acf8..c6fbafc 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -30,6 +30,9 @@ gianfar_driver-objs := gianfar.o \ obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o ucc_geth_driver-objs := ucc_geth.o ucc_geth_mii.o ucc_geth_ethtool.o +obj-$(CONFIG_PCINET_FSL) += pcinet_fsl.o +obj-$(CONFIG_PCINET_HOST) += pcinet_host.o + # # link order important here # diff --git a/drivers/net/pcinet.h b/drivers/net/pcinet.h new file mode 100644 index 0000000..66d2cba --- /dev/null +++ b/drivers/net/pcinet.h @@ -0,0 +1,75 @@ +/* + * Shared Definitions for the PCINet / PCISerial drivers + * + * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu> + * + * Heavily inspired by the drivers/net/fs_enet driver + * + * This file is licensed under the terms of the GNU General Public License + * version 2. This program is licensed "as is" without any warranty of any + * kind, whether express or implied. 
+ */ + +#ifndef PCINET_H +#define PCINET_H + +#include <linux/kernel.h> +#include <linux/if_ether.h> + +/* Ring and Frame size -- these must match between the drivers */ +#define PH_RING_SIZE (64) +#define PH_MAX_FRSIZE (64 * 1024) +#define PH_MAX_MTU (PH_MAX_FRSIZE - ETH_HLEN) + +struct circ_buf_desc { + __le32 sc; + __le32 len; + __le32 addr; +} __attribute__((__packed__)); +typedef struct circ_buf_desc cbd_t; + +/* Buffer Descriptor Accessors */ +#define CBDW_SC(_cbd, _sc) iowrite32((_sc), &(_cbd)->sc) +#define CBDW_LEN(_cbd, _len) iowrite32((_len), &(_cbd)->len) +#define CBDW_ADDR(_cbd, _addr) iowrite32((_addr), &(_cbd)->addr) + +#define CBDR_SC(_cbd) ioread32(&(_cbd)->sc) +#define CBDR_LEN(_cbd) ioread32(&(_cbd)->len) +#define CBDR_ADDR(_cbd) ioread32(&(_cbd)->addr) + +/* Buffer Descriptor Registers */ +#define PCINET_TXBD_BASE 0x400 +#define PCINET_RXBD_BASE 0x800 + +/* Buffer Descriptor Status */ +#define BD_MEM_READY 0x1 +#define BD_MEM_DIRTY 0x2 +#define BD_MEM_FREE 0x3 + +/* IMMR Accessor Helpers */ +#define IMMR_R32(_off) ioread32(priv->immr+(_off)) +#define IMMR_W32(_off, _val) iowrite32((_val), priv->immr+(_off)) +#define IMMR_R32BE(_off) ioread32be(priv->immr+(_off)) +#define IMMR_W32BE(_off, _val) iowrite32be((_val), priv->immr+(_off)) + +/* Status Register Bits */ +#define PCINET_UART_RX_ENABLED (1<<0) +#define PCINET_NET_STATUS_RUNNING (1<<1) +#define PCINET_NET_RXINT_OFF (1<<2) +#define PCINET_NET_REGISTERS_VALID (1<<3) + +/* Driver State Bits */ +#define NET_STATE_STOPPED 0 +#define NET_STATE_RUNNING 1 + +/* Doorbell Registers */ +#define UART_RX_READY_DBELL (1<<0) +#define UART_TX_EMPTY_DBELL (1<<1) +#define NET_RX_PACKET_DBELL (1<<2) +#define NET_TX_COMPLETE_DBELL (1<<3) +#define NET_START_REQ_DBELL (1<<4) +#define NET_START_ACK_DBELL (1<<5) +#define NET_STOP_REQ_DBELL (1<<6) +#define NET_STOP_ACK_DBELL (1<<7) + +#endif /* PCINET_H */ diff --git a/drivers/net/pcinet_fsl.c b/drivers/net/pcinet_fsl.c new file mode 100644 index 0000000..b03764c 
--- /dev/null +++ b/drivers/net/pcinet_fsl.c @@ -0,0 +1,1351 @@ +/* + * PCINet and PCISerial Driver for Freescale MPC8349EMDS + * + * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu> + * + * Heavily inspired by the drivers/net/fs_enet driver + * + * This file is licensed under the terms of the GNU General Public License + * version 2. This program is licensed "as is" without any warranty of any + * kind, whether express or implied. + */ + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/init.h> +#include <linux/of_platform.h> +#include <linux/sched.h> +#include <linux/wait.h> +#include <linux/interrupt.h> +#include <linux/irqreturn.h> +#include <linux/delay.h> +#include <linux/io.h> +#include <linux/serial.h> +#include <linux/serial_core.h> +#include <linux/etherdevice.h> +#include <linux/mutex.h> +#include <linux/dmaengine.h> + +#include "pcinet.h" +#include "pcinet_hw.h" + +static const char driver_name[] = "wqt"; + +static void wqtuart_rx_char(struct uart_port *port, const char ch); +static void wqtuart_stop_tx(struct uart_port *port); + +struct wqt_dev; +typedef void (*wqt_irqhandler_t)(struct wqt_dev *); + +struct wqt_irqhandlers { + wqt_irqhandler_t net_start_req_handler; + wqt_irqhandler_t net_start_ack_handler; + wqt_irqhandler_t net_stop_req_handler; + wqt_irqhandler_t net_stop_ack_handler; + wqt_irqhandler_t net_rx_packet_handler; + wqt_irqhandler_t net_tx_complete_handler; + wqt_irqhandler_t uart_rx_ready_handler; + wqt_irqhandler_t uart_tx_empty_handler; +}; + +struct wqt_dev { + /*--------------------------------------------------------------------*/ + /* OpenFirmware Infrastructure */ + /*--------------------------------------------------------------------*/ + struct of_device *op; + struct device *dev; + int irq; + void __iomem *immr; + + struct mutex irq_mutex; + int interrupt_count; + + spinlock_t irq_lock; + struct wqt_irqhandlers handlers; + + /*--------------------------------------------------------------------*/ + 
/* UART Device Infrastructure */ + /*--------------------------------------------------------------------*/ + struct uart_port port; + bool uart_rx_enabled; + bool uart_open; + + struct workqueue_struct *wq; + struct work_struct uart_tx_work; + wait_queue_head_t uart_tx_wait; /* sleep for uart_tx_ready */ + bool uart_tx_ready; /* transmitter state */ + + /*--------------------------------------------------------------------*/ + /* Ethernet Device Infrastructure */ + /*--------------------------------------------------------------------*/ + struct net_device *ndev; + void __iomem *netregs; + dma_addr_t netregs_addr; + + /* Outstanding SKB */ + struct sk_buff *tx_skbs[PH_RING_SIZE]; + + /* Circular Buffer Descriptor base */ + cbd_t __iomem *rx_base; + cbd_t __iomem *tx_base; + + /* Current SKB index */ + cbd_t __iomem *cur_rx; + cbd_t __iomem *cur_tx; + cbd_t __iomem *dirty_tx; + int tx_free; + + struct tasklet_struct tx_complete_tasklet; + spinlock_t net_lock; + + struct mutex net_mutex; + int net_state; + struct work_struct net_start_work; + struct work_struct net_stop_work; + struct completion net_start_completion; + struct completion net_stop_completion; + struct napi_struct napi; + + struct dma_client client; + struct dma_chan *chan; +}; + +/*----------------------------------------------------------------------------*/ +/* Status Register Helper Operations */ +/*----------------------------------------------------------------------------*/ + +static DEFINE_SPINLOCK(status_lock); + +static void wqtstatus_setbit(struct wqt_dev *priv, u32 bit) +{ + unsigned long flags; + + spin_lock_irqsave(&status_lock, flags); + IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) | bit); + spin_unlock_irqrestore(&status_lock, flags); +} + +static void wqtstatus_clrbit(struct wqt_dev *priv, u32 bit) +{ + unsigned long flags; + + spin_lock_irqsave(&status_lock, flags); + IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) & ~bit); + spin_unlock_irqrestore(&status_lock, flags); +} + +static int 
wqtstatus_remote_testbit(struct wqt_dev *priv, u32 bit) +{ + return IMMR_R32(IMR1_OFFSET) & bit; +} + +/*----------------------------------------------------------------------------*/ +/* Message Sending and Processing Operations */ +/*----------------------------------------------------------------------------*/ + +static irqreturn_t wqt_interrupt(int irq, void *dev_id) +{ + struct wqt_dev *priv = dev_id; + u32 imisr, idr; + unsigned long flags; + + imisr = IMMR_R32(IMISR_OFFSET); + idr = IMMR_R32(IDR_OFFSET); + + if (!(imisr & 0x8)) + return IRQ_NONE; + + /* Clear all of the interrupt sources, we'll handle them next */ + IMMR_W32(IDR_OFFSET, idr); + + /* Lock over all of the handlers, so they cannot get called when + * the code doesn't expect them to be called */ + spin_lock_irqsave(&priv->irq_lock, flags); + + if (idr & UART_RX_READY_DBELL) + priv->handlers.uart_rx_ready_handler(priv); + + if (idr & UART_TX_EMPTY_DBELL) + priv->handlers.uart_tx_empty_handler(priv); + + if (idr & NET_RX_PACKET_DBELL) + priv->handlers.net_rx_packet_handler(priv); + + if (idr & NET_TX_COMPLETE_DBELL) + priv->handlers.net_tx_complete_handler(priv); + + if (idr & NET_START_REQ_DBELL) + priv->handlers.net_start_req_handler(priv); + + if (idr & NET_START_ACK_DBELL) + priv->handlers.net_start_ack_handler(priv); + + if (idr & NET_STOP_REQ_DBELL) + priv->handlers.net_stop_req_handler(priv); + + if (idr & NET_STOP_ACK_DBELL) + priv->handlers.net_stop_ack_handler(priv); + + spin_unlock_irqrestore(&priv->irq_lock, flags); + + return IRQ_HANDLED; +} + +/* Send a character through the mbox when it becomes available + * Blocking, must not be called with any spinlocks held */ +static int do_send_message(struct wqt_dev *priv, const char ch) +{ + struct uart_port *port = &priv->port; + bool tmp; + unsigned long flags; + + spin_lock_irqsave(&priv->irq_lock, flags); + while (priv->uart_tx_ready != true) { + spin_unlock_irqrestore(&priv->irq_lock, flags); + wait_event_timeout(priv->uart_tx_wait, 
priv->uart_tx_ready, HZ); + + spin_lock_irqsave(&port->lock, flags); + tmp = priv->uart_open; + spin_unlock_irqrestore(&port->lock, flags); + + if (!tmp) + return -EIO; + + spin_lock_irqsave(&priv->irq_lock, flags); + } + + /* Now the transmitter is free, send the message */ + IMMR_W32(OMR0_OFFSET, ch); + IMMR_W32(ODR_OFFSET, UART_RX_READY_DBELL); + + /* Mark the transmitter busy */ + priv->uart_tx_ready = false; + spin_unlock_irqrestore(&priv->irq_lock, flags); + return 0; +} + +/* Grab a character out of the uart tx buffer and send it */ +static void uart_tx_work_fn(struct work_struct *work) +{ + struct wqt_dev *priv = container_of(work, struct wqt_dev, uart_tx_work); + struct uart_port *port = &priv->port; + struct circ_buf *xmit = &port->info->xmit; + char ch; + unsigned long flags; + + spin_lock_irqsave(&port->lock, flags); + while (true) { + + /* Check for XON/XOFF (high priority) */ + if (port->x_char) { + ch = port->x_char; + port->x_char = 0; + spin_unlock_irqrestore(&port->lock, flags); + + if (do_send_message(priv, ch)) + return; + + spin_lock_irqsave(&port->lock, flags); + continue; + } + + /* If we're out of chars or the port is stopped, we're done */ + if (uart_circ_empty(xmit) || uart_tx_stopped(port)) { + wqtuart_stop_tx(port); + break; + } + + /* Grab the next char out of the buffer and send it */ + ch = xmit->buf[xmit->tail]; + xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1); + spin_unlock_irqrestore(&port->lock, flags); + + if (do_send_message(priv, ch)) + return; + + spin_lock_irqsave(&port->lock, flags); + } + + if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) + uart_write_wakeup(port); + + if (uart_circ_empty(xmit)) + wqtuart_stop_tx(port); + + spin_unlock_irqrestore(&port->lock, flags); +} + +/*----------------------------------------------------------------------------*/ +/* Interrupt Handlers */ +/*----------------------------------------------------------------------------*/ + +/* NOTE: All handlers are called with priv->irq_lock 
held */
+
+static void empty_handler(struct wqt_dev *priv)
+{
+	/* Intentionally left empty */
+}
+
+static void net_start_req_handler(struct wqt_dev *priv)
+{
+	schedule_work(&priv->net_start_work);
+}
+
+static void net_start_ack_handler(struct wqt_dev *priv)
+{
+	complete(&priv->net_start_completion);
+}
+
+static void net_stop_req_handler(struct wqt_dev *priv)
+{
+	schedule_work(&priv->net_stop_work);
+}
+
+static void net_stop_ack_handler(struct wqt_dev *priv)
+{
+	complete(&priv->net_stop_completion);
+}
+
+static void net_tx_complete_handler(struct wqt_dev *priv)
+{
+	tasklet_schedule(&priv->tx_complete_tasklet);
+}
+
+static void net_rx_packet_handler(struct wqt_dev *priv)
+{
+	wqtstatus_setbit(priv, PCINET_NET_RXINT_OFF);
+	netif_rx_schedule(priv->ndev, &priv->napi);
+}
+
+static void uart_rx_ready_handler(struct wqt_dev *priv)
+{
+	wqtuart_rx_char(&priv->port, IMMR_R32(IMR0_OFFSET) & 0xff);
+	IMMR_W32(ODR_OFFSET, UART_TX_EMPTY_DBELL);
+}
+
+static void uart_tx_empty_handler(struct wqt_dev *priv)
+{
+	priv->uart_tx_ready = true;
+	wake_up(&priv->uart_tx_wait);
+}
+
+/*----------------------------------------------------------------------------*/
+/* Interrupt Request / Free Helpers */
+/*----------------------------------------------------------------------------*/
+
+static void do_enable_net_startstop_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_start_req_handler = net_start_req_handler;
+	priv->handlers.net_start_ack_handler = net_start_ack_handler;
+	priv->handlers.net_stop_req_handler = net_stop_req_handler;
+	priv->handlers.net_stop_ack_handler = net_stop_ack_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+
+	wqtstatus_setbit(priv, PCINET_NET_STATUS_RUNNING);
+}
+
+static void do_disable_net_startstop_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	wqtstatus_clrbit(priv, PCINET_NET_STATUS_RUNNING);
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_start_req_handler = empty_handler;
+	priv->handlers.net_start_ack_handler = empty_handler;
+	priv->handlers.net_stop_req_handler = empty_handler;
+	priv->handlers.net_stop_ack_handler = empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_enable_net_rxtx_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_rx_packet_handler = net_rx_packet_handler;
+	priv->handlers.net_tx_complete_handler = net_tx_complete_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_disable_net_rxtx_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_rx_packet_handler = empty_handler;
+	priv->handlers.net_tx_complete_handler = empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_enable_uart_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.uart_rx_ready_handler = uart_rx_ready_handler;
+	priv->handlers.uart_tx_empty_handler = uart_tx_empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_disable_uart_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.uart_rx_ready_handler = empty_handler;
+	priv->handlers.uart_tx_empty_handler = empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static int wqt_request_irq(struct wqt_dev *priv)
+{
+	int ret = 0;
+
+	mutex_lock(&priv->irq_mutex);
+
+	if (priv->interrupt_count > 0)
+		goto out_unlock;
+
+	/* Force all handlers to be disabled before attaching the handler */
+	do_disable_net_startstop_handlers(priv);
+	do_disable_net_rxtx_handlers(priv);
+	do_disable_uart_handlers(priv);
+
+	ret = request_irq(priv->irq,
+			  wqt_interrupt,
+			  IRQF_SHARED,
+			  priv->ndev->name,
+			  priv);
+	if (ret) {
+		/* don't count a failed request */
+		mutex_unlock(&priv->irq_mutex);
+		return ret;
+	}
+
+out_unlock:
+	priv->interrupt_count++;
+	mutex_unlock(&priv->irq_mutex);
+
+	return ret;
+}
+
+static void wqt_free_irq(struct wqt_dev *priv)
+{
+	mutex_lock(&priv->irq_mutex);
+	priv->interrupt_count--;
+
+	if (priv->interrupt_count > 0)
+		goto out_unlock;
+
+	free_irq(priv->irq, priv);
+
+out_unlock:
+	mutex_unlock(&priv->irq_mutex);
+}
+
+/*----------------------------------------------------------------------------*/
+/* Network Startup and Shutdown Helpers */
+/*----------------------------------------------------------------------------*/
+
+/* NOTE: All helper functions prefixed with "do" must be called only from
+ * process context, with priv->net_mutex held. They are expected to sleep */
+
+static void do_net_start_queues(struct wqt_dev *priv)
+{
+	if (priv->net_state == NET_STATE_RUNNING)
+		return;
+
+	dev_dbg(priv->dev, "resetting buffer positions\n");
+	priv->cur_rx = priv->rx_base;
+	priv->cur_tx = priv->tx_base;
+	priv->dirty_tx = priv->tx_base;
+	priv->tx_free = PH_RING_SIZE;
+
+	dev_dbg(priv->dev, "enabling NAPI queue\n");
+	napi_enable(&priv->napi);
+
+	dev_dbg(priv->dev, "enabling tx_complete() tasklet\n");
+	tasklet_enable(&priv->tx_complete_tasklet);
+
+	dev_dbg(priv->dev, "enabling TX queue\n");
+	netif_start_queue(priv->ndev);
+
+	dev_dbg(priv->dev, "carrier on!\n");
+	netif_carrier_on(priv->ndev);
+
+	/* Enable the RX_PACKET and TX_COMPLETE interrupt handlers */
+	do_enable_net_rxtx_handlers(priv);
+
+	priv->net_state = NET_STATE_RUNNING;
+}
+
+static void do_net_stop_queues(struct wqt_dev *priv)
+{
+	if (priv->net_state == NET_STATE_STOPPED)
+		return;
+
+	/* Disable the RX_PACKET and TX_COMPLETE interrupt handlers */
+	do_disable_net_rxtx_handlers(priv);
+
+	dev_dbg(priv->dev, "disabling NAPI queue\n");
+	napi_disable(&priv->napi);
+
+	dev_dbg(priv->dev, "disabling tx_complete() tasklet\n");
+	tasklet_disable(&priv->tx_complete_tasklet);
+
+	dev_dbg(priv->dev, "disabling TX queue\n");
+	netif_tx_disable(priv->ndev);
+
+	dev_dbg(priv->dev, "carrier off!\n");
+	netif_carrier_off(priv->ndev);
+
+	priv->net_state = NET_STATE_STOPPED;
+}
+
+/* Called when we get a request to start our queues and acknowledge */
+static void wqtnet_start_work_fn(struct work_struct *work)
+{
+	struct wqt_dev *priv = container_of(work, struct wqt_dev,
+					    net_start_work);
+
+	mutex_lock(&priv->net_mutex);
+
+	do_net_start_queues(priv);
+	IMMR_W32(ODR_OFFSET, NET_START_ACK_DBELL);
+
+	mutex_unlock(&priv->net_mutex);
+}
+
+/* Called when we get a request to stop our queues and acknowledge */
+static void wqtnet_stop_work_fn(struct work_struct *work)
+{
+	struct wqt_dev *priv = container_of(work, struct wqt_dev,
+					    net_stop_work);
+
+	mutex_lock(&priv->net_mutex);
+
+	do_net_stop_queues(priv);
+	IMMR_W32(ODR_OFFSET, NET_STOP_ACK_DBELL);
+
+	mutex_unlock(&priv->net_mutex);
+}
+
+/*----------------------------------------------------------------------------*/
+/* DMA Operation Helpers */
+/*----------------------------------------------------------------------------*/
+
+/* Setup a static 1GB window starting at PCI address 0x0
+ *
+ * This means that all DMA must be within the first 1GB of the other side's
+ * memory, which shouldn't be a problem
+ */
+static int wqtdma_setup_outbound_window(struct wqt_dev *priv)
+{
+	IMMR_W32BE(LAWAR0_OFFSET, LAWAR0_ENABLE | 0x1d);
+	IMMR_W32BE(POCMR0_OFFSET, POCMR0_ENABLE | 0xc0000);
+	IMMR_W32BE(POTAR0_OFFSET, 0x0);
+
+	return 0;
+}
+
+static enum dma_state_client dmatest_event(struct dma_client *client,
+					   struct dma_chan *chan,
+					   enum dma_state state)
+{
+	struct wqt_dev *priv = container_of(client, struct wqt_dev, client);
+	enum dma_state_client ack = DMA_NAK;
+
+	switch (state) {
+	case DMA_RESOURCE_AVAILABLE:
+		if (chan == priv->chan)
+			ack = DMA_DUP;
+		else if (priv->chan)
+			ack = DMA_NAK;
+		else {
+			priv->chan = chan;
+			ack = DMA_ACK;
+		}
+
+		break;
+
+	case DMA_RESOURCE_REMOVED:
+		priv->chan = NULL;
+		ack = DMA_ACK;
+		break;
+
+	default:
+		dev_dbg(priv->dev, "unhandled DMA event %u (%s)\n",
+			state, chan->dev.bus_id);
+		break;
+	}
+
+	return ack;
+}
+
+static dma_cookie_t dma_async_memcpy_raw_to_buf(struct dma_chan *chan,
+						void *dest,
+						dma_addr_t src,
+						size_t len)
+{
+	struct dma_device *dev = chan->device;
+	struct dma_async_tx_descriptor *tx;
+	dma_addr_t dma_dest, dma_src;
+	dma_cookie_t cookie;
+	int cpu;
+
+	dma_src = src;
+	dma_dest = dma_map_single(dev->dev, dest, len, DMA_FROM_DEVICE);
+	tx = dev->device_prep_dma_memcpy(chan, dma_dest, dma_src, len,
+					 DMA_CTRL_ACK);
+
+	if (!tx) {
+		dma_unmap_single(dev->dev, dma_dest, len, DMA_FROM_DEVICE);
+		return -ENOMEM;
+	}
+
+	tx->callback = NULL;
+	cookie = tx->tx_submit(tx);
+
+	cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return cookie;
+}
+
+static dma_cookie_t dma_async_memcpy_buf_to_raw(struct dma_chan *chan,
+						dma_addr_t dest,
+						void *src,
+						size_t len)
+{
+	struct dma_device *dev = chan->device;
+	struct dma_async_tx_descriptor *tx;
+	dma_addr_t dma_dest, dma_src;
+	dma_cookie_t cookie;
+	int cpu;
+
+	dma_src = dma_map_single(dev->dev, src, len, DMA_TO_DEVICE);
+	dma_dest = dest;
+	tx = dev->device_prep_dma_memcpy(chan, dma_dest, dma_src, len,
+					 DMA_CTRL_ACK);
+
+	if (!tx) {
+		dma_unmap_single(dev->dev, dma_src, len, DMA_TO_DEVICE);
+		return -ENOMEM;
+	}
+
+	tx->callback = NULL;
+	cookie = tx->tx_submit(tx);
+
+	cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return cookie;
+}
+
+/*----------------------------------------------------------------------------*/
+/* Network Device Operations */
+/*----------------------------------------------------------------------------*/
+
+static int wqt_open(struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+	int ret;
+
+	/* Pretend the cable is unplugged until we are up and running */
+	netif_carrier_off(dev);
+
+	mutex_lock(&priv->net_mutex);
+
+	ret = wqt_request_irq(priv);
+
+	if (ret)
+		goto out_unlock;
+
+	/* Enable only the network start/stop interrupts */
+	do_enable_net_startstop_handlers(priv);
+
+	/* Check if the other side is running, if not, it will start us.
+	 * Without the interrupt handler installed, there's no way it can
+	 * respond to us anyway */
+	if (!wqtstatus_remote_testbit(priv, PCINET_NET_STATUS_RUNNING)) {
+		ret = 0;
+		goto out_unlock;
+	}
+
+	IMMR_W32(ODR_OFFSET, NET_START_REQ_DBELL);
+	ret = wait_for_completion_timeout(&priv->net_start_completion, 5*HZ);
+
+	if (!ret) {
+		/* Our start request timed out, therefore, the other
+		 * side will start us when it comes back up */
+		dev_dbg(priv->dev, "start timed out\n");
+	} else {
+		do_net_start_queues(priv);
+		ret = 0;
+	}
+
+out_unlock:
+	mutex_unlock(&priv->net_mutex);
+	return ret;
+}
+
+static int wqt_stop(struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+	int ret;
+
+	mutex_lock(&priv->net_mutex);
+
+	do_net_stop_queues(priv);
+
+	IMMR_W32(ODR_OFFSET, NET_STOP_REQ_DBELL);
+	ret = wait_for_completion_timeout(&priv->net_stop_completion, 5*HZ);
+
+	if (!ret)
+		dev_warn(priv->dev, "other side did not stop on time!\n");
+	else
+		ret = 0;
+
+	do_disable_net_startstop_handlers(priv);
+	wqt_free_irq(priv);
+
+	mutex_unlock(&priv->net_mutex);
+	return 0;
+}
+
+static int wqt_change_mtu(struct net_device *dev, int new_mtu)
+{
+	if ((new_mtu < 68) || (new_mtu > PH_MAX_MTU))
+		return -EINVAL;
+
+	dev->mtu = new_mtu;
+	return 0;
+}
+
+static int wqt_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+	dma_cookie_t cookie;
+	enum dma_status status;
+	cbd_t __iomem *bdp;
+	int dirty_idx;
+
+	spin_lock_bh(&priv->net_lock);
+
+	bdp = priv->cur_tx;
+	dirty_idx = bdp - priv->tx_base;
+
+	/* This should not happen, the queue should be stopped */
+	if (priv->tx_free == 0 || CBDR_SC(bdp) != BD_MEM_READY) {
+		netif_stop_queue(dev);
+		spin_unlock_bh(&priv->net_lock);
+		return NETDEV_TX_BUSY;
+	}
+
+	cookie = dma_async_memcpy_buf_to_raw(priv->chan,
+			(dma_addr_t)(0x80000000 + CBDR_ADDR(bdp)),
+			skb->data,
+			skb->len);
+
+	if (dma_submit_error(cookie)) {
+		/* hard_start_xmit() must not return an errno: drop instead */
+		dev_warn(priv->dev, "DMA submit error\n");
+		dev->stats.tx_dropped++;
+		spin_unlock_bh(&priv->net_lock);
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+
+	status = dma_sync_wait(priv->chan, cookie);
+
+	if (status == DMA_ERROR) {
+		/* hard_start_xmit() must not return an errno: drop instead */
+		dev_warn(priv->dev, "DMA error\n");
+		dev->stats.tx_dropped++;
+		spin_unlock_bh(&priv->net_lock);
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+
+	CBDW_LEN(bdp, skb->len);
+	CBDW_SC(bdp, BD_MEM_DIRTY);
+
+	if (dirty_idx == PH_RING_SIZE - 1)
+		bdp = priv->tx_base;
+	else
+		bdp++;
+
+	priv->tx_skbs[dirty_idx] = skb;
+	priv->cur_tx = bdp;
+	priv->tx_free--;
+	dev->trans_start = jiffies;
+
+	if (priv->tx_free == 0)
+		netif_stop_queue(dev);
+
+	if (!wqtstatus_remote_testbit(priv, PCINET_NET_RXINT_OFF))
+		IMMR_W32(ODR_OFFSET, NET_RX_PACKET_DBELL);
+
+	spin_unlock_bh(&priv->net_lock);
+	return NETDEV_TX_OK;
+}
+
+static void wqt_tx_timeout(struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+
+	dev->stats.tx_errors++;
+	IMMR_W32(ODR_OFFSET, NET_RX_PACKET_DBELL);
+}
+
+static void wqt_tx_complete(unsigned long data)
+{
+	struct net_device *dev = (struct net_device *)data;
+	struct wqt_dev *priv = netdev_priv(dev);
+	struct sk_buff *skb;
+	cbd_t __iomem *bdp;
+	int do_wake, dirty_idx;
+
+	spin_lock_bh(&priv->net_lock);
+
+	bdp = priv->dirty_tx;
+	do_wake = 0;
+
+	while (CBDR_SC(bdp) == BD_MEM_FREE) {
+		dirty_idx = bdp - priv->tx_base;
+
+		skb = priv->tx_skbs[dirty_idx];
+
+		BUG_ON(skb == NULL);
+
+		dev->stats.tx_bytes += skb->len;
+		dev->stats.tx_packets++;
+
+		dev_kfree_skb_irq(skb);
+
+		priv->tx_skbs[dirty_idx] = NULL;
+
+		/* Mark the BDP as ready */
+		CBDW_SC(bdp, BD_MEM_READY);
+
+		/* Update the bdp */
+		if (dirty_idx == PH_RING_SIZE - 1)
+			bdp = priv->tx_base;
+		else
+			bdp++;
+
+		if (!priv->tx_free++)
+			do_wake = 1;
+	}
+
+	priv->dirty_tx = bdp;
+
+	spin_unlock_bh(&priv->net_lock);
+
+	if (do_wake)
+		netif_wake_queue(dev);
+}
+
+static int wqt_rx_napi(struct napi_struct *napi, int budget)
+{
+	struct wqt_dev *priv = container_of(napi, struct wqt_dev, napi);
+	struct net_device *dev = priv->ndev;
+	int received = 0;
+	struct sk_buff *skb;
+	dma_addr_t remote_addr;
+	dma_cookie_t cookie;
+	enum dma_status status;
+	int pkt_len, dirty_idx;
+	cbd_t __iomem *bdp;
+
+	bdp = priv->cur_rx;
+
+	while (CBDR_SC(bdp) == BD_MEM_DIRTY) {
+		dirty_idx = bdp - priv->rx_base;
+
+		pkt_len = CBDR_LEN(bdp);
+		remote_addr = CBDR_ADDR(bdp);
+
+		/* Allocate a packet for the data */
+		skb = dev_alloc_skb(pkt_len + NET_IP_ALIGN);
+
+		if (skb == NULL) {
+			dev->stats.rx_dropped++;
+			goto out_err;
+		}
+
+		skb_reserve(skb, NET_IP_ALIGN);
+
+		cookie = dma_async_memcpy_raw_to_buf(priv->chan,
+				skb->data,
+				(dma_addr_t)(0x80000000 + remote_addr),
+				pkt_len);
+
+		if (dma_submit_error(cookie)) {
+			dev_warn(priv->dev, "DMA submit error\n");
+			dev_kfree_skb_irq(skb);
+			dev->stats.rx_dropped++;
+			goto out_err;
+		}
+
+		status = dma_sync_wait(priv->chan, cookie);
+
+		if (status == DMA_ERROR) {
+			dev_warn(priv->dev, "DMA error\n");
+			dev_kfree_skb_irq(skb);
+			dev->stats.rx_dropped++;
+			goto out_err;
+		}
+
+		/* Push the packet into the network stack */
+		skb_put(skb, pkt_len);
+		skb->protocol = eth_type_trans(skb, dev);
+#ifdef CONFIG_PCINET_DISABLE_CHECKSUM
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+#else
+		skb->ip_summed = CHECKSUM_NONE;
+#endif
+		netif_receive_skb(skb);
+		received++;
+		dev->stats.rx_bytes += pkt_len;
+		dev->stats.rx_packets++;
+
+out_err:
+		CBDW_SC(bdp, BD_MEM_FREE);
+
+		if (dirty_idx == PH_RING_SIZE - 1)
+			bdp = priv->rx_base;
+		else
+			bdp++;
+
+		if (received >= budget)
+			break;
+	}
+
+	priv->cur_rx = bdp;
+
+	/* We have processed all packets that the adapter had, but it
+	 * was less than our budget, stop polling */
+	if (received < budget) {
+		netif_rx_complete(dev, napi);
+		wqtstatus_clrbit(priv, PCINET_NET_RXINT_OFF);
+	}
+
+	IMMR_W32(ODR_OFFSET, NET_TX_COMPLETE_DBELL);
+
+	return received;
+}
+
+/*----------------------------------------------------------------------------*/
+/* UART Device Operations */
+/*----------------------------------------------------------------------------*/
+
+static unsigned int wqtuart_tx_empty(struct uart_port *port)
+{
+	return TIOCSER_TEMT;
+}
+
+static void wqtuart_set_mctrl(struct uart_port *port, unsigned int mctrl)
+{
+}
+
+static unsigned int wqtuart_get_mctrl(struct uart_port *port)
+{
+	return TIOCM_CAR | TIOCM_DSR | TIOCM_CTS;
+}
+
+static void wqtuart_stop_tx(struct uart_port *port)
+{
+}
+
+static void wqtuart_start_tx(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+
+	queue_work(priv->wq, &priv->uart_tx_work);
+}
+
+static void wqtuart_stop_rx(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+
+	do_disable_uart_handlers(priv);
+	priv->uart_rx_enabled = false;
+	wqtstatus_clrbit(priv, PCINET_UART_RX_ENABLED);
+}
+
+static void wqtuart_enable_ms(struct uart_port *port)
+{
+}
+
+static void wqtuart_break_ctl(struct uart_port *port, int break_state)
+{
+}
+
+static int wqtuart_startup(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+	int ret;
+
+	ret = wqt_request_irq(priv);
+
+	if (ret)
+		return ret;
+
+	do_enable_uart_handlers(priv);
+
+	/* Mark the transmitter and receiver ready */
+	priv->uart_tx_ready = true;
+	priv->uart_rx_enabled = true;
+	wqtstatus_setbit(priv, PCINET_UART_RX_ENABLED);
+
+	/* Let the other side know that we are ready to receive chars now */
+	IMMR_W32(ODR_OFFSET, UART_TX_EMPTY_DBELL);
+	priv->uart_open = true;
+	return 0;
+}
+
+static void wqtuart_shutdown(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+
+	wqt_free_irq(priv);
+
+	/* Make sure the uart_tx_work_fn() exits cleanly */
+	priv->uart_open = false;
+	wake_up(&priv->uart_tx_wait);
+}
+
+static void wqtuart_set_termios(struct uart_port *port,
+				struct ktermios *termios,
+				struct ktermios *old)
+{
+}
+
+static const char *wqtuart_type(struct uart_port *port)
+{
+	return "WQTUART";
+}
+
+static int wqtuart_request_port(struct uart_port *port)
+{
+	return 0;
+}
+
+static void wqtuart_config_port(struct uart_port *port, int flags)
+{
+}
+
+static void wqtuart_release_port(struct uart_port *port)
+{
+}
+
+static int wqtuart_verify_port(struct uart_port *port,
+			       struct serial_struct *ser)
+{
+	return 0;
+}
+
+static void wqtuart_rx_char(struct uart_port *port, const char ch)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+	struct tty_struct *tty;
+	unsigned long flags;
+
+	spin_lock_irqsave(&port->lock, flags);
+
+	if (priv->uart_rx_enabled) {
+		tty = port->info->port.tty;
+		tty_insert_flip_char(tty, ch, TTY_NORMAL);
+		tty_flip_buffer_push(tty);
+	}
+
+	spin_unlock_irqrestore(&port->lock, flags);
+}
+
+static struct uart_ops wqtuart_ops = {
+	.tx_empty = wqtuart_tx_empty,
+	.set_mctrl = wqtuart_set_mctrl,
+	.get_mctrl = wqtuart_get_mctrl,
+	.stop_tx = wqtuart_stop_tx,
+	.start_tx = wqtuart_start_tx,
+	.stop_rx = wqtuart_stop_rx,
+	.enable_ms = wqtuart_enable_ms,
+	.break_ctl = wqtuart_break_ctl,
+	.startup = wqtuart_startup,
+	.shutdown = wqtuart_shutdown,
+	.set_termios = wqtuart_set_termios,
+	.type = wqtuart_type,
+	.release_port = wqtuart_release_port,
+	.request_port = wqtuart_request_port,
+	.config_port = wqtuart_config_port,
+	.verify_port = wqtuart_verify_port,
+};
+
+static struct uart_driver wqtuart_driver = {
+	.owner = THIS_MODULE,
+	.driver_name = driver_name,
+	.dev_name = "ttyPCI",
+	.major = 240,
+	.minor = 0,
+	.nr = 1,
+};
+
+/*----------------------------------------------------------------------------*/
+/* Network Registers */
+/*----------------------------------------------------------------------------*/
+
+static void wqt_free_netregs(struct wqt_dev *priv)
+{
+	BUG_ON(priv->netregs == NULL);
+	BUG_ON(priv->netregs_addr == 0x0);
+
+	dma_free_coherent(priv->dev,
+			  PAGE_SIZE,
+			  priv->netregs,
+			  priv->netregs_addr);
+
+	priv->netregs = NULL;
+	priv->netregs_addr = 0x0;
+}
+
+static int wqt_init_netregs(struct wqt_dev *priv)
+{
+	u32 val;
+
+	BUG_ON(priv->netregs != NULL);
+	BUG_ON(priv->netregs_addr != 0x0);
+
+	/* Check the PCI Inbound Window Attributes Register 0 for a 4k window
+	 * This is PCI BAR1, and will be used as network device registers */
+	val = IMMR_R32BE(PIWAR0_OFFSET);
+	val = val & (PIWAR0_ENABLED | PIWAR0_IWS_4K);
+
+	if (val != (PIWAR0_ENABLED | PIWAR0_IWS_4K)) {
+		dev_dbg(priv->dev, "PIWAR0 set up incorrectly\n");
+		return -ENODEV;
+	}
+
+	priv->netregs = dma_alloc_coherent(priv->dev,
+					   PAGE_SIZE,
+					   &priv->netregs_addr,
+					   GFP_KERNEL);
+
+	if (!priv->netregs) {
+		dev_dbg(priv->dev, "Unable to allocate netregs\n");
+		return -ENOMEM;
+	}
+
+	/* Write the page address into the address register */
+	IMMR_W32BE(PITAR0_OFFSET, priv->netregs_addr >> 12);
+	return 0;
+}
+
+/*----------------------------------------------------------------------------*/
+/* OpenFirmware Device Subsystem */
+/*----------------------------------------------------------------------------*/
+
+static int wqt_probe(struct of_device *op, const struct of_device_id *match)
+{
+	struct net_device *ndev;
+	struct wqt_dev *priv;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(*priv));
+
+	if (!ndev) {
+		ret = -ENOMEM;
+		goto out_alloc_ndev;
+	}
+
+	dev_set_drvdata(&op->dev, ndev);
+	priv = netdev_priv(ndev);
+	priv->op = op;
+	priv->dev = &op->dev;
+	priv->ndev = ndev;
+
+	spin_lock_init(&priv->irq_lock);
+	mutex_init(&priv->irq_mutex);
+
+	/* Hardware Initialization */
+	priv->irq = irq_of_parse_and_map(op->node, 0);
+	priv->immr = ioremap(0xe0000000, 0x100000);
+
+	if (!priv->immr) {
+		ret = -ENOMEM;
+		goto out_ioremap_immr;
+	}
+
+	ret = wqt_init_netregs(priv);
+
+	if (ret)
+		goto out_init_netregs;
+
+	/* NOTE: Yes, this is correct. Everything was written as if this
+	 * NOTE: side *is* a network card. So the place the card is
+	 * NOTE: receiving from is the other side's TX buffers */
+	priv->rx_base = priv->netregs + PCINET_TXBD_BASE;
+	priv->tx_base = priv->netregs + PCINET_RXBD_BASE;
+	wqtstatus_setbit(priv, PCINET_NET_REGISTERS_VALID);
+
+	/* DMA Client */
+	wqtdma_setup_outbound_window(priv);
+	priv->client.event_callback = dmatest_event;
+	dma_cap_set(DMA_MEMCPY, priv->client.cap_mask);
+	dma_async_client_register(&priv->client);
+	dma_async_client_chan_request(&priv->client);
+
+	/* Initialize private data */
+	priv->wq = create_singlethread_workqueue(driver_name);
+
+	if (!priv->wq) {
+		ret = -ENOMEM;
+		goto out_create_workqueue;
+	}
+
+	INIT_WORK(&priv->uart_tx_work, uart_tx_work_fn);
+	init_waitqueue_head(&priv->uart_tx_wait);
+	priv->uart_tx_ready = true;
+
+	tasklet_init(&priv->tx_complete_tasklet, wqt_tx_complete,
+		     (unsigned long)ndev);
+	tasklet_disable(&priv->tx_complete_tasklet);
+	spin_lock_init(&priv->net_lock);
+
+	mutex_init(&priv->net_mutex);
+	priv->net_state = NET_STATE_STOPPED;
+	INIT_WORK(&priv->net_start_work, wqtnet_start_work_fn);
+	INIT_WORK(&priv->net_stop_work, wqtnet_stop_work_fn);
+	init_completion(&priv->net_start_completion);
+	init_completion(&priv->net_stop_completion);
+
+	/* Mask all of the MBOX interrupts */
+	IMMR_W32(IMIMR_OFFSET, 0x1 | 0x2);
+
+	/* Network Device */
+	random_ether_addr(ndev->dev_addr);
+
+	ndev->open = wqt_open;
+	ndev->stop = wqt_stop;
+	ndev->change_mtu = wqt_change_mtu;
+	ndev->hard_start_xmit = wqt_hard_start_xmit;
+	ndev->tx_timeout = wqt_tx_timeout;
+	ndev->watchdog_timeo = HZ/4;
+	ndev->flags &= ~IFF_MULTICAST;	/* No multicast support */
+#ifdef CONFIG_PCINET_DISABLE_CHECKSUM
+	ndev->features |= NETIF_F_NO_CSUM;	/* No checksum needed */
+#endif
+	ndev->mtu = PH_MAX_MTU;
+	netif_napi_add(ndev, &priv->napi, wqt_rx_napi, PH_RING_SIZE);
+
+	ret = register_netdev(ndev);
+
+	if (ret)
+		goto out_register_netdev;
+
+	/* UART Device */
+	priv->port.ops = &wqtuart_ops;
+	priv->port.type = PORT_16550A;
+	priv->port.dev = &op->dev;
+	priv->port.line = 0;
+	spin_lock_init(&priv->port.lock);
+
+	ret = uart_add_one_port(&wqtuart_driver, &priv->port);
+
+	if (ret)
+		goto out_add_uart_port;
+
+	dev_info(priv->dev, "using ethernet device %s\n", ndev->name);
+	dev_info(priv->dev, "using serial device %s%d\n",
+		 wqtuart_driver.dev_name, priv->port.line);
+	return 0;
+
+out_add_uart_port:
+	unregister_netdev(ndev);
+out_register_netdev:
+	destroy_workqueue(priv->wq);
+out_create_workqueue:
+	wqt_free_netregs(priv);
+out_init_netregs:
+	iounmap(priv->immr);
+out_ioremap_immr:
+	free_netdev(ndev);
+out_alloc_ndev:
+	return ret;
+}
+
+static int wqt_remove(struct of_device *op)
+{
+	struct net_device *ndev = dev_get_drvdata(&op->dev);
+	struct wqt_dev *priv = netdev_priv(ndev);
+
+	uart_remove_one_port(&wqtuart_driver, &priv->port);
+	unregister_netdev(priv->ndev);
+
+	flush_workqueue(priv->wq);
+	destroy_workqueue(priv->wq);
+
+	wqtstatus_clrbit(priv, PCINET_NET_REGISTERS_VALID);
+	wqt_free_netregs(priv);
+
+	dma_async_client_unregister(&priv->client);
+
+	iounmap(priv->immr);
+
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static struct of_device_id wqt_match[] = {
+	{ .compatible = "fsl,mpc8349-mu", },
+	{},
+};
+
+static struct of_platform_driver wqt_of_driver = {
+	.owner = THIS_MODULE,
+	.name = driver_name,
+	.match_table = wqt_match,
+	.probe = wqt_probe,
+	.remove = wqt_remove,
+};
+
+/*----------------------------------------------------------------------------*/
+/* DMA Client Infrastructure */
+/*----------------------------------------------------------------------------*/
+
+/*----------------------------------------------------------------------------*/
+/* Module Init / Exit */
+/*----------------------------------------------------------------------------*/
+
+static int __init wqt_init(void)
+{
+	int ret;
+
+	ret = uart_register_driver(&wqtuart_driver);
+
+	if (ret)
+		goto out_uart_register_driver;
+
+	ret = of_register_platform_driver(&wqt_of_driver);
+
+	if (ret)
+		goto out_of_register_platform_driver;
+
+	return 0;
+
+out_of_register_platform_driver:
+	uart_unregister_driver(&wqtuart_driver);
+out_uart_register_driver:
+	return ret;
+}
+
+static void __exit wqt_exit(void)
+{
+	of_unregister_platform_driver(&wqt_of_driver);
+	uart_unregister_driver(&wqtuart_driver);
+}
+
+MODULE_AUTHOR("Ira W. Snyder <iws@ovro.caltech.edu>");
+MODULE_DESCRIPTION("PCINet/PCISerial Driver for MPC8349EMDS");
+MODULE_LICENSE("GPL");
+
+module_init(wqt_init);
+module_exit(wqt_exit);
diff --git a/drivers/net/pcinet_host.c b/drivers/net/pcinet_host.c
new file mode 100644
index 0000000..95af93c
--- /dev/null
+++ b/drivers/net/pcinet_host.c
@@ -0,0 +1,1383 @@
+/*
+ * PCINet and PCISerial Driver for Freescale MPC8349EMDS (Host side)
+ *
+ * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu>
+ *
+ * Heavily inspired by the drivers/net/fs_enet driver
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2. This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/interrupt.h>
+#include <linux/irqreturn.h>
+#include <linux/pci.h>
+#include <linux/serial.h>
+#include <linux/serial_core.h>
+#include <linux/etherdevice.h>
+#include <linux/mutex.h>
+
+#include "pcinet.h"
+#include "pcinet_hw.h"
+
+static const char driver_name[] = "wqt";
+
+static void wqtuart_rx_char(struct uart_port *port, const char ch);
+static void wqtuart_stop_tx(struct uart_port *port);
+
+struct wqt_dev;
+typedef void (*wqt_irqhandler_t)(struct wqt_dev *);
+
+struct wqt_irqhandlers {
+	wqt_irqhandler_t net_start_req_handler;
+	wqt_irqhandler_t net_start_ack_handler;
+	wqt_irqhandler_t net_stop_req_handler;
+	wqt_irqhandler_t net_stop_ack_handler;
+	wqt_irqhandler_t net_rx_packet_handler;
+	wqt_irqhandler_t net_tx_complete_handler;
+	wqt_irqhandler_t uart_rx_ready_handler;
+	wqt_irqhandler_t uart_tx_empty_handler;
+};
+
+struct wqt_dev {
+	/*--------------------------------------------------------------------*/
+	/* PCI Infrastructure */
+	/*--------------------------------------------------------------------*/
+	struct pci_dev *pdev;
+	struct device *dev;
+	void __iomem *immr;
+
+	struct mutex irq_mutex;
+	int interrupt_count;
+
+	spinlock_t irq_lock;
+	struct wqt_irqhandlers handlers;
+
+	/*--------------------------------------------------------------------*/
+	/* UART Device Infrastructure */
+	/*--------------------------------------------------------------------*/
+	struct uart_port port;
+	bool uart_rx_enabled;
+	bool uart_open;
+
+	struct workqueue_struct *wq;
+	struct work_struct uart_tx_work;
+	wait_queue_head_t uart_tx_wait;	/* sleep for uart_tx_ready */
+	bool uart_tx_ready;		/* transmitter state */
+
+	/*--------------------------------------------------------------------*/
+	/* Ethernet Device Infrastructure */
+	/*--------------------------------------------------------------------*/
+	struct net_device *ndev;
+	void __iomem *netregs;
+
+	/* Outstanding SKB */
+	struct sk_buff *rx_skbs[PH_RING_SIZE];
+	struct sk_buff *tx_skbs[PH_RING_SIZE];
+	dma_addr_t rx_skb_addrs[PH_RING_SIZE];
+	dma_addr_t tx_skb_addrs[PH_RING_SIZE];
+
+	/* Circular Buffer Descriptor base */
+	cbd_t __iomem *rx_base;
+	cbd_t __iomem *tx_base;
+
+	/* Current SKB index */
+	cbd_t __iomem *cur_rx;
+	cbd_t __iomem *cur_tx;
+	cbd_t __iomem *dirty_tx;
+	int tx_free;
+
+	struct tasklet_struct tx_complete_tasklet;
+	spinlock_t net_lock;
+
+	struct mutex net_mutex;
+	int net_state;
+	struct work_struct net_start_work;
+	struct work_struct net_stop_work;
+	struct completion net_start_completion;
+	struct completion net_stop_completion;
+	struct napi_struct napi;
+};
+
+/*----------------------------------------------------------------------------*/
+/* Status Register Helper Operations */
+/*----------------------------------------------------------------------------*/
+
+static DEFINE_SPINLOCK(status_lock);
+
+static void wqtstatus_setbit(struct wqt_dev *priv, u32 bit)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&status_lock, flags);
+	IMMR_W32(IMR1_OFFSET, IMMR_R32(IMR1_OFFSET) | bit);
+	spin_unlock_irqrestore(&status_lock, flags);
+}
+
+static void wqtstatus_clrbit(struct wqt_dev *priv, u32 bit)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&status_lock, flags);
+	IMMR_W32(IMR1_OFFSET, IMMR_R32(IMR1_OFFSET) & ~bit);
+	spin_unlock_irqrestore(&status_lock, flags);
+}
+
+static int wqtstatus_remote_testbit(struct wqt_dev *priv, u32 bit)
+{
+	return IMMR_R32(OMR1_OFFSET) & bit;
+}
+
+/*----------------------------------------------------------------------------*/
+/* Message Sending and Processing Operations */
+/*----------------------------------------------------------------------------*/
+
+static irqreturn_t wqt_interrupt(int irq, void *dev_id)
+{
+	struct wqt_dev *priv = dev_id;
+	u32 omisr, odr;
+	unsigned long flags;
+
+	omisr = IMMR_R32(OMISR_OFFSET);
+	odr = IMMR_R32(ODR_OFFSET);
+
+	if (!(omisr & 0x8))
+		return IRQ_NONE;
+
+	/* Clear all of the interrupt sources, we'll handle them next */
+	IMMR_W32(ODR_OFFSET, odr);
+
+	/* Lock over all of the handlers, so they cannot get called when
+	 * the code doesn't expect them to be called */
+	spin_lock_irqsave(&priv->irq_lock, flags);
+
+	if (odr & UART_RX_READY_DBELL)
+		priv->handlers.uart_rx_ready_handler(priv);
+
+	if (odr & UART_TX_EMPTY_DBELL)
+		priv->handlers.uart_tx_empty_handler(priv);
+
+	if (odr & NET_RX_PACKET_DBELL)
+		priv->handlers.net_rx_packet_handler(priv);
+
+	if (odr & NET_TX_COMPLETE_DBELL)
+		priv->handlers.net_tx_complete_handler(priv);
+
+	if (odr & NET_START_REQ_DBELL)
+		priv->handlers.net_start_req_handler(priv);
+
+	if (odr & NET_START_ACK_DBELL)
+		priv->handlers.net_start_ack_handler(priv);
+
+	if (odr & NET_STOP_REQ_DBELL)
+		priv->handlers.net_stop_req_handler(priv);
+
+	if (odr & NET_STOP_ACK_DBELL)
+		priv->handlers.net_stop_ack_handler(priv);
+
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+
+	return IRQ_HANDLED;
+}
+
+/* Send a character through the mbox when it becomes available
+ * Blocking, must not be called with any spinlocks held */
+static int do_send_message(struct wqt_dev *priv, const char ch)
+{
+	struct uart_port *port = &priv->port;
+	bool tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	while (priv->uart_tx_ready != true) {
+		spin_unlock_irqrestore(&priv->irq_lock, flags);
+		wait_event_timeout(priv->uart_tx_wait, priv->uart_tx_ready, HZ);
+
+		spin_lock_irqsave(&port->lock, flags);
+		tmp = priv->uart_open;
+		spin_unlock_irqrestore(&port->lock, flags);
+
+		if (!tmp)
+			return -EIO;
+
+		spin_lock_irqsave(&priv->irq_lock, flags);
+	}
+
+	/* Now the transmitter is free, send the message */
+	IMMR_W32(IMR0_OFFSET, ch);
+	IMMR_W32(IDR_OFFSET, UART_RX_READY_DBELL);
+
+	/* Mark the transmitter busy */
+	priv->uart_tx_ready = false;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+	return 0;
+}
+
+/* Grab a character out of the uart tx buffer and send it */
+static void uart_tx_work_fn(struct work_struct *work)
+{
+	struct wqt_dev *priv = container_of(work, struct wqt_dev, uart_tx_work);
+	struct uart_port *port = &priv->port;
+	struct circ_buf *xmit = &port->info->xmit;
+	char ch;
+	unsigned long flags;
+
+	spin_lock_irqsave(&port->lock, flags);
+	while (true) {
+
+		/* Check for XON/XOFF (high priority) */
+		if (port->x_char) {
+			ch = port->x_char;
+			port->x_char = 0;
+			spin_unlock_irqrestore(&port->lock, flags);
+
+			if (do_send_message(priv, ch))
+				return;
+
+			spin_lock_irqsave(&port->lock, flags);
+			continue;
+		}
+
+		/* If we're out of chars or the port is stopped, we're done */
+		if (uart_circ_empty(xmit) || uart_tx_stopped(port)) {
+			wqtuart_stop_tx(port);
+			break;
+		}
+
+		/* Grab the next char out of the buffer and send it */
+		ch = xmit->buf[xmit->tail];
+		xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
+		spin_unlock_irqrestore(&port->lock, flags);
+
+		if (do_send_message(priv, ch))
+			return;
+
+		spin_lock_irqsave(&port->lock, flags);
+	}
+
+	if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS)
+		uart_write_wakeup(port);
+
+	if (uart_circ_empty(xmit))
+		wqtuart_stop_tx(port);
+
+	spin_unlock_irqrestore(&port->lock, flags);
+}
+
+/*----------------------------------------------------------------------------*/
+/* Interrupt Handlers */
+/*----------------------------------------------------------------------------*/
+
+/* NOTE: All handlers are called with priv->irq_lock held */
+
+static void empty_handler(struct wqt_dev *priv)
+{
+	/* Intentionally left empty */
+}
+
+static void net_start_req_handler(struct wqt_dev *priv)
+{
+	schedule_work(&priv->net_start_work);
+}
+
+static void net_start_ack_handler(struct wqt_dev *priv)
+{
+	complete(&priv->net_start_completion);
+}
+
+static void net_stop_req_handler(struct wqt_dev *priv)
+{
+	schedule_work(&priv->net_stop_work);
+}
+
+static void net_stop_ack_handler(struct wqt_dev *priv)
+{
+	complete(&priv->net_stop_completion);
+}
+
+static void net_tx_complete_handler(struct wqt_dev *priv)
+{
+	tasklet_schedule(&priv->tx_complete_tasklet);
+}
+
+static void net_rx_packet_handler(struct wqt_dev *priv)
+{
+	wqtstatus_setbit(priv, PCINET_NET_RXINT_OFF);
+	netif_rx_schedule(priv->ndev, &priv->napi);
+}
+
+static void uart_rx_ready_handler(struct wqt_dev *priv)
+{
+	wqtuart_rx_char(&priv->port, IMMR_R32(OMR0_OFFSET) & 0xff);
+	IMMR_W32(IDR_OFFSET, UART_TX_EMPTY_DBELL);
+}
+
+static void uart_tx_empty_handler(struct wqt_dev *priv)
+{
+	priv->uart_tx_ready = true;
+	wake_up(&priv->uart_tx_wait);
+}
+
+/*----------------------------------------------------------------------------*/
+/* Interrupt Request / Free Helpers */
+/*----------------------------------------------------------------------------*/
+
+static void do_enable_net_startstop_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_start_req_handler = net_start_req_handler;
+	priv->handlers.net_start_ack_handler = net_start_ack_handler;
+	priv->handlers.net_stop_req_handler = net_stop_req_handler;
+	priv->handlers.net_stop_ack_handler = net_stop_ack_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+
+	wqtstatus_setbit(priv, PCINET_NET_STATUS_RUNNING);
+}
+
+static void do_disable_net_startstop_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	wqtstatus_clrbit(priv, PCINET_NET_STATUS_RUNNING);
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_start_req_handler = empty_handler;
+	priv->handlers.net_start_ack_handler = empty_handler;
+	priv->handlers.net_stop_req_handler = empty_handler;
+	priv->handlers.net_stop_ack_handler = empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_enable_net_rxtx_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_rx_packet_handler = net_rx_packet_handler;
+ priv->handlers.net_tx_complete_handler = net_tx_complete_handler; + spin_unlock_irqrestore(&priv->irq_lock, flags); +} + +static void do_disable_net_rxtx_handlers(struct wqt_dev *priv) +{ + unsigned long flags; + + spin_lock_irqsave(&priv->irq_lock, flags); + priv->handlers.net_rx_packet_handler = empty_handler; + priv->handlers.net_tx_complete_handler = empty_handler; + spin_unlock_irqrestore(&priv->irq_lock, flags); +} + +static void do_enable_uart_handlers(struct wqt_dev *priv) +{ + unsigned long flags; + + spin_lock_irqsave(&priv->irq_lock, flags); + priv->handlers.uart_rx_ready_handler = uart_rx_ready_handler; + priv->handlers.uart_tx_empty_handler = uart_tx_empty_handler; + spin_unlock_irqrestore(&priv->irq_lock, flags); +} + +static void do_disable_uart_handlers(struct wqt_dev *priv) +{ + unsigned long flags; + + spin_lock_irqsave(&priv->irq_lock, flags); + priv->handlers.uart_rx_ready_handler = empty_handler; + priv->handlers.uart_tx_empty_handler = empty_handler; + spin_unlock_irqrestore(&priv->irq_lock, flags); +} + +static int wqt_request_irq(struct wqt_dev *priv) +{ + int ret = 0; + + mutex_lock(&priv->irq_mutex); + + if (priv->interrupt_count > 0) + goto out_unlock; + + /* Force all handlers to be disabled before attaching the handler */ + do_disable_net_startstop_handlers(priv); + do_disable_net_rxtx_handlers(priv); + do_disable_uart_handlers(priv); + + ret = request_irq(priv->pdev->irq, + wqt_interrupt, + IRQF_SHARED, + priv->ndev->name, + priv); + +out_unlock: + priv->interrupt_count++; + mutex_unlock(&priv->irq_mutex); + + return ret; +} + +static void wqt_free_irq(struct wqt_dev *priv) +{ + mutex_lock(&priv->irq_mutex); + priv->interrupt_count--; + + if (priv->interrupt_count > 0) + goto out_unlock; + + free_irq(priv->pdev->irq, priv); + +out_unlock: + mutex_unlock(&priv->irq_mutex); +} + +/*----------------------------------------------------------------------------*/ +/* Network Startup and Shutdown Helpers */ 
+/*----------------------------------------------------------------------------*/ + +/* NOTE: All helper functions prefixed with "do" must be called only from + * process context, with priv->net_mutex held. They are expected to sleep */ + +/* NOTE: queues must be stopped before initializing and uninitializing */ + +static void do_net_initialize_board(struct wqt_dev *priv) +{ + int i; + cbd_t __iomem *bdp; + + BUG_ON(!wqtstatus_remote_testbit(priv, PCINET_NET_REGISTERS_VALID)); + + /* Fill in RX ring */ + for (i = 0, bdp = priv->rx_base; i < PH_RING_SIZE; bdp++, i++) { + CBDW_SC(bdp, BD_MEM_READY); + CBDW_LEN(bdp, PH_MAX_FRSIZE); + CBDW_ADDR(bdp, priv->rx_skb_addrs[i]); + } + + /* Fill in TX ring */ + for (i = 0, bdp = priv->tx_base; i < PH_RING_SIZE; bdp++, i++) { + CBDW_SC(bdp, BD_MEM_READY); + CBDW_LEN(bdp, 0); + CBDW_ADDR(bdp, 0x0); + } +} + +static void do_net_uninitialize_board(struct wqt_dev *priv) +{ + struct sk_buff *skb; + dma_addr_t skb_addr; + cbd_t __iomem *bdp; + int i; + + /* Reset TX ring */ + for (i = 0, bdp = priv->tx_base; i < PH_RING_SIZE; bdp++, i++) { + if (priv->tx_skbs[i]) { + skb = priv->tx_skbs[i]; + skb_addr = priv->tx_skb_addrs[i]; + + dma_unmap_single(priv->dev, + skb_addr, + skb->len, + DMA_TO_DEVICE); + + dev_kfree_skb(skb); + + priv->tx_skbs[i] = NULL; + priv->tx_skb_addrs[i] = 0x0; + } + + CBDW_SC(bdp, BD_MEM_READY); + CBDW_LEN(bdp, 0); + CBDW_ADDR(bdp, 0x0); + } +} + +static void do_net_start_queues(struct wqt_dev *priv) +{ + if (priv->net_state == NET_STATE_RUNNING) + return; + + dev_dbg(priv->dev, "resetting buffer positions\n"); + priv->cur_rx = priv->rx_base; + priv->cur_tx = priv->tx_base; + priv->dirty_tx = priv->tx_base; + priv->tx_free = PH_RING_SIZE; + + dev_dbg(priv->dev, "enabling NAPI queue\n"); + napi_enable(&priv->napi); + + dev_dbg(priv->dev, "enabling tx_complete() tasklet\n"); + tasklet_enable(&priv->tx_complete_tasklet); + + dev_dbg(priv->dev, "enabling TX queue\n"); + netif_start_queue(priv->ndev); + + 
dev_dbg(priv->dev, "carrier on!\n"); + netif_carrier_on(priv->ndev); + + /* Enable the RX_PACKET and TX_COMPLETE interrupt handlers */ + do_enable_net_rxtx_handlers(priv); + + priv->net_state = NET_STATE_RUNNING; +} + +static void do_net_stop_queues(struct wqt_dev *priv) +{ + if (priv->net_state == NET_STATE_STOPPED) + return; + + /* Disable the RX_PACKET and TX_COMPLETE interrupt handlers */ + do_disable_net_rxtx_handlers(priv); + + dev_dbg(priv->dev, "disabling NAPI queue\n"); + napi_disable(&priv->napi); + + dev_dbg(priv->dev, "disabling tx_complete() tasklet\n"); + tasklet_disable(&priv->tx_complete_tasklet); + + dev_dbg(priv->dev, "disabling TX queue\n"); + netif_tx_disable(priv->ndev); + + dev_dbg(priv->dev, "carrier off!\n"); + netif_carrier_off(priv->ndev); + + priv->net_state = NET_STATE_STOPPED; +} + +/* Called when we get a request to start our queues and acknowledge */ +static void wqtnet_start_work_fn(struct work_struct *work) +{ + struct wqt_dev *priv = container_of(work, struct wqt_dev, + net_start_work); + + mutex_lock(&priv->net_mutex); + + do_net_initialize_board(priv); + do_net_start_queues(priv); + IMMR_W32(IDR_OFFSET, NET_START_ACK_DBELL); + + mutex_unlock(&priv->net_mutex); +} + +/* Called when we get a request to stop our queues and acknowledge */ +static void wqtnet_stop_work_fn(struct work_struct *work) +{ + struct wqt_dev *priv = container_of(work, struct wqt_dev, + net_stop_work); + + mutex_lock(&priv->net_mutex); + + do_net_stop_queues(priv); + do_net_uninitialize_board(priv); + IMMR_W32(IDR_OFFSET, NET_STOP_ACK_DBELL); + + mutex_unlock(&priv->net_mutex); +} + +/*----------------------------------------------------------------------------*/ +/* SKB Allocation Helpers */ +/*----------------------------------------------------------------------------*/ + +static void wqt_cleanup_skbs(struct wqt_dev *priv) +{ + struct sk_buff *skb; + dma_addr_t skb_addr; + int i; + + /* TX ring */ + for (i = 0; i < PH_RING_SIZE; ++i) { + if 
(priv->tx_skbs[i]) { + skb = priv->tx_skbs[i]; + skb_addr = priv->tx_skb_addrs[i]; + + dma_unmap_single(priv->dev, + skb_addr, + skb->len, + DMA_TO_DEVICE); + + dev_kfree_skb(skb); + + priv->tx_skbs[i] = NULL; + priv->tx_skb_addrs[i] = 0x0; + } + } + + /* RX ring */ + for (i = 0; i < PH_RING_SIZE; ++i) { + if (priv->rx_skbs[i]) { + skb = priv->rx_skbs[i]; + skb_addr = priv->rx_skb_addrs[i]; + + dma_unmap_single(priv->dev, + skb_addr, + PH_MAX_FRSIZE, + DMA_FROM_DEVICE); + + dev_kfree_skb(skb); + + priv->rx_skbs[i] = NULL; + priv->rx_skb_addrs[i] = 0x0; + } + } +} + +static int wqt_alloc_skbs(struct wqt_dev *priv) +{ + struct sk_buff *skb; + dma_addr_t skb_addr; + int i; + + /* RX ring */ + for (i = 0; i < PH_RING_SIZE; ++i) { + /* Paranoia check */ + BUG_ON(priv->rx_skbs[i] != NULL); + BUG_ON(priv->rx_skb_addrs[i] != 0x0); + + /* Allocate the skb */ + skb = dev_alloc_skb(PH_MAX_FRSIZE + NET_IP_ALIGN); + + if (skb == NULL) + goto out_err; + + skb_reserve(skb, NET_IP_ALIGN); + + /* DMA map the skb */ + skb_addr = dma_map_single(priv->dev, + skb->data, + PH_MAX_FRSIZE, + DMA_FROM_DEVICE); + + if (dma_mapping_error(skb_addr)) { + dev_kfree_skb(skb); + goto out_err; + } + + priv->rx_skbs[i] = skb; + priv->rx_skb_addrs[i] = skb_addr; + } + + /* TX ring */ + for (i = 0; i < PH_RING_SIZE; ++i) { + /* Paranoia check */ + BUG_ON(priv->tx_skbs[i] != NULL); + BUG_ON(priv->tx_skb_addrs[i] != 0x0); + } + + /* NOTE: the actual initialization of the board happens + * NOTE: in ph_initialize_board(), once the board has + * NOTE: requested to be initialized */ + + return 0; + +out_err: + wqt_cleanup_skbs(priv); + return -ENOMEM; +} + +/*----------------------------------------------------------------------------*/ +/* Network Device Operations */ +/*----------------------------------------------------------------------------*/ + +static int wqt_open(struct net_device *dev) +{ + struct wqt_dev *priv = netdev_priv(dev); + int ret; + + /* Check that the other side has registers */ + if 
(!wqtstatus_remote_testbit(priv, PCINET_NET_REGISTERS_VALID)) { + dev_err(priv->dev, "no driver installed at the other end\n"); + dev_err(priv->dev, "cowardly refusing to open\n"); + return -ENOTCONN; /* Transport endpoint is not connected */ + } + + /* Pretend the cable is unplugged until we are up and running */ + netif_carrier_off(dev); + + mutex_lock(&priv->net_mutex); + + ret = wqt_alloc_skbs(priv); + + if (ret) + goto out_err; + + do_net_initialize_board(priv); + + ret = wqt_request_irq(priv); + + if (ret) + goto out_err; + + /* Enable only the network start/stop interrupts */ + do_enable_net_startstop_handlers(priv); + + /* Check if the other side is running, if not, it will start us. + * Without the interrupt handler installed, there's no way it can + * respond to us anyway */ + if (!wqtstatus_remote_testbit(priv, PCINET_NET_STATUS_RUNNING)) + goto out_unlock; + + do_net_initialize_board(priv); + + IMMR_W32(IDR_OFFSET, NET_START_REQ_DBELL); + ret = wait_for_completion_timeout(&priv->net_start_completion, 5*HZ); + + if (!ret) { + /* Our start request timed out, therefore, the other + * side will start us when it comes back up */ + dev_dbg(priv->dev, "start timed out\n"); + } else { + do_net_start_queues(priv); + ret = 0; + } + +out_unlock: + mutex_unlock(&priv->net_mutex); + return 0; + +out_err: + wqt_cleanup_skbs(priv); + mutex_unlock(&priv->net_mutex); + return ret; +} + +static int wqt_stop(struct net_device *dev) +{ + struct wqt_dev *priv = netdev_priv(dev); + int ret; + + mutex_lock(&priv->net_mutex); + + do_net_stop_queues(priv); + + IMMR_W32(IDR_OFFSET, NET_STOP_REQ_DBELL); + ret = wait_for_completion_timeout(&priv->net_stop_completion, 5*HZ); + + if (!ret) + dev_warn(priv->dev, "other side did not stop on time!\n"); + else + ret = 0; + + do_disable_net_startstop_handlers(priv); + wqt_free_irq(priv); + do_net_uninitialize_board(priv); + wqt_cleanup_skbs(priv); + + mutex_unlock(&priv->net_mutex); + return 0; +} + +static int wqt_change_mtu(struct 
net_device *dev, int new_mtu) +{ + if ((new_mtu < 68) || (new_mtu > PH_MAX_MTU)) + return -EINVAL; + + dev->mtu = new_mtu; + return 0; +} + +static int wqt_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct wqt_dev *priv = netdev_priv(dev); + dma_addr_t skb_addr; + cbd_t __iomem *bdp; + int dirty_idx; + + spin_lock_bh(&priv->net_lock); + + bdp = priv->cur_tx; + dirty_idx = bdp - priv->tx_base; + + /* This should not happen, the queue should be stopped */ + if (priv->tx_free == 0 || CBDR_SC(bdp) != BD_MEM_READY) { + netif_stop_queue(dev); + spin_unlock_bh(&priv->net_lock); + return NETDEV_TX_BUSY; + } + + skb_addr = dma_map_single(priv->dev, + skb->data, + skb->len, + DMA_TO_DEVICE); + + if (dma_mapping_error(skb_addr)) { + dev_warn(priv->dev, "DMA mapping error\n"); + spin_unlock_bh(&priv->net_lock); + return -ENOMEM; + } + + BUG_ON(priv->tx_skbs[dirty_idx] != NULL); + BUG_ON(priv->tx_skb_addrs[dirty_idx] != 0x0); + + priv->tx_skbs[dirty_idx] = skb; + priv->tx_skb_addrs[dirty_idx] = skb_addr; + + CBDW_LEN(bdp, skb->len); + CBDW_ADDR(bdp, skb_addr); + CBDW_SC(bdp, BD_MEM_DIRTY); + + if (dirty_idx == PH_RING_SIZE - 1) + bdp = priv->tx_base; + else + bdp++; + + priv->cur_tx = bdp; + priv->tx_free--; + dev->trans_start = jiffies; + + if (priv->tx_free == 0) + netif_stop_queue(dev); + + if (!wqtstatus_remote_testbit(priv, PCINET_NET_RXINT_OFF)) + IMMR_W32(IDR_OFFSET, NET_RX_PACKET_DBELL); + + spin_unlock_bh(&priv->net_lock); + return NETDEV_TX_OK; +} + +static void wqt_tx_timeout(struct net_device *dev) +{ + struct wqt_dev *priv = netdev_priv(dev); + + dev->stats.tx_errors++; + IMMR_W32(IDR_OFFSET, NET_RX_PACKET_DBELL); +} + +static void wqt_tx_complete(unsigned long data) +{ + struct net_device *dev = (struct net_device *)data; + struct wqt_dev *priv = netdev_priv(dev); + struct sk_buff *skb; + dma_addr_t skb_addr; + cbd_t __iomem *bdp; + int do_wake, dirty_idx; + + spin_lock_bh(&priv->net_lock); + + bdp = priv->dirty_tx; + do_wake = 0; + + while 
(CBDR_SC(bdp) == BD_MEM_FREE) { + dirty_idx = bdp - priv->tx_base; + + skb = priv->tx_skbs[dirty_idx]; + skb_addr = priv->tx_skb_addrs[dirty_idx]; + + BUG_ON(skb == NULL); + BUG_ON(skb_addr == 0x0); + + dev->stats.tx_bytes += skb->len; + dev->stats.tx_packets++; + + /* Unmap and free the transmitted skb */ + dma_unmap_single(priv->dev, + skb_addr, + skb->len, + DMA_TO_DEVICE); + dev_kfree_skb_irq(skb); + + priv->tx_skbs[dirty_idx] = NULL; + priv->tx_skb_addrs[dirty_idx] = 0x0; + + /* Invalidate the buffer descriptor */ + CBDW_LEN(bdp, 0); + CBDW_ADDR(bdp, 0x0); + CBDW_SC(bdp, BD_MEM_READY); + + /* Update the bdp */ + if (dirty_idx == PH_RING_SIZE - 1) + bdp = priv->tx_base; + else + bdp++; + + if (!priv->tx_free++) + do_wake = 1; + } + + priv->dirty_tx = bdp; + + spin_unlock_bh(&priv->net_lock); + + if (do_wake) + netif_wake_queue(dev); +} + +static int wqt_rx_napi(struct napi_struct *napi, int budget) +{ + struct wqt_dev *priv = container_of(napi, struct wqt_dev, napi); + struct net_device *dev = priv->ndev; + int received = 0; + struct sk_buff *skb, *skbn; + dma_addr_t skb_addr, skbn_addr; + int pkt_len, dirty_idx; + cbd_t __iomem *bdp; + + bdp = priv->cur_rx; + + while (CBDR_SC(bdp) == BD_MEM_DIRTY) { + dirty_idx = bdp - priv->rx_base; + + skb = priv->rx_skbs[dirty_idx]; + skb_addr = priv->rx_skb_addrs[dirty_idx]; + + BUG_ON(skb == NULL); + BUG_ON(skb_addr == 0x0); + + /* Allocate the next rx skb and DMA map it */ + skbn = dev_alloc_skb(PH_MAX_FRSIZE + NET_IP_ALIGN); + + if (skbn == NULL) { + skbn = skb; + skbn_addr = skb_addr; + dev->stats.rx_dropped++; + goto out_noalloc; + } + + skb_reserve(skbn, NET_IP_ALIGN); + + skbn_addr = dma_map_single(priv->dev, + skbn->data, + PH_MAX_FRSIZE, + DMA_FROM_DEVICE); + + if (dma_mapping_error(skbn_addr)) { + dev_kfree_skb_irq(skbn); + skbn = skb; + skbn_addr = skb_addr; + dev->stats.rx_dropped++; + goto out_noalloc; + } + + /* DMA unmap the old skb and pass it up */ + dma_unmap_single(priv->dev, + skb_addr, + PH_MAX_FRSIZE, 
+ DMA_FROM_DEVICE); + + pkt_len = CBDR_LEN(bdp); + skb_put(skb, pkt_len); + skb->protocol = eth_type_trans(skb, dev); +#ifdef CONFIG_PCINET_DISABLE_CHECKSUM + skb->ip_summed = CHECKSUM_UNNECESSARY; +#else + skb->ip_summed = CHECKSUM_NONE; +#endif + netif_receive_skb(skb); + received++; + dev->stats.rx_bytes += pkt_len; + dev->stats.rx_packets++; + +out_noalloc: + /* Write the new skb into the buffer descriptor */ + CBDW_LEN(bdp, PH_MAX_FRSIZE); + CBDW_ADDR(bdp, skbn_addr); + CBDW_SC(bdp, BD_MEM_FREE); + + priv->rx_skbs[dirty_idx] = skbn; + priv->rx_skb_addrs[dirty_idx] = skbn_addr; + + /* Update the bdp */ + if (dirty_idx == PH_RING_SIZE - 1) + bdp = priv->rx_base; + else + bdp++; + + if (received >= budget) + break; + } + + priv->cur_rx = bdp; + + /* We have processed all packets that the adapter had, but it + * was less than our budget, stop polling */ + if (received < budget) { + netif_rx_complete(dev, napi); + wqtstatus_clrbit(priv, PCINET_NET_RXINT_OFF); + } + + IMMR_W32(IDR_OFFSET, NET_TX_COMPLETE_DBELL); + + return received; +} + +/*----------------------------------------------------------------------------*/ +/* UART Device Operations */ +/*----------------------------------------------------------------------------*/ + +static unsigned int wqtuart_tx_empty(struct uart_port *port) +{ + return TIOCSER_TEMT; +} + +static void wqtuart_set_mctrl(struct uart_port *port, unsigned int mctrl) +{ +} + +static unsigned int wqtuart_get_mctrl(struct uart_port *port) +{ + return TIOCM_CAR | TIOCM_DSR | TIOCM_CTS; +} + +static void wqtuart_stop_tx(struct uart_port *port) +{ +} + +static void wqtuart_start_tx(struct uart_port *port) +{ + struct wqt_dev *priv = container_of(port, struct wqt_dev, port); + + queue_work(priv->wq, &priv->uart_tx_work); +} + +static void wqtuart_stop_rx(struct uart_port *port) +{ + struct wqt_dev *priv = container_of(port, struct wqt_dev, port); + + do_disable_uart_handlers(priv); + priv->uart_rx_enabled = false; + wqtstatus_clrbit(priv, 
PCINET_UART_RX_ENABLED); +} + +static void wqtuart_enable_ms(struct uart_port *port) +{ +} + +static void wqtuart_break_ctl(struct uart_port *port, int break_state) +{ +} + +static int wqtuart_startup(struct uart_port *port) +{ + struct wqt_dev *priv = container_of(port, struct wqt_dev, port); + int ret; + + ret = wqt_request_irq(priv); + + if (ret) + return ret; + + do_enable_uart_handlers(priv); + + /* Mark the transmitter and receiver ready */ + priv->uart_tx_ready = true; + priv->uart_rx_enabled = true; + wqtstatus_setbit(priv, PCINET_UART_RX_ENABLED); + + /* Let the other side know that we are ready to receive chars now */ + IMMR_W32(IDR_OFFSET, UART_TX_EMPTY_DBELL); + priv->uart_open = true; + return 0; +} + +static void wqtuart_shutdown(struct uart_port *port) +{ + struct wqt_dev *priv = container_of(port, struct wqt_dev, port); + + wqt_free_irq(priv); + + /* Make sure the uart_tx_work_fn() exits cleanly */ + priv->uart_open = false; + wake_up(&priv->uart_tx_wait); +} + +static void wqtuart_set_termios(struct uart_port *port, + struct ktermios *termios, + struct ktermios *old) +{ +} + +static const char *wqtuart_type(struct uart_port *port) +{ + return "WQTUART"; +} + +static int wqtuart_request_port(struct uart_port *port) +{ + return 0; +} + +static void wqtuart_config_port(struct uart_port *port, int flags) +{ +} + +static void wqtuart_release_port(struct uart_port *port) +{ +} + +static int wqtuart_verify_port(struct uart_port *port, + struct serial_struct *ser) +{ + return 0; +} + +static void wqtuart_rx_char(struct uart_port *port, const char ch) +{ + struct wqt_dev *priv = container_of(port, struct wqt_dev, port); + struct tty_struct *tty; + unsigned long flags; + + spin_lock_irqsave(&port->lock, flags); + + if (priv->uart_rx_enabled) { + tty = port->info->port.tty; + tty_insert_flip_char(tty, ch, TTY_NORMAL); + tty_flip_buffer_push(tty); + } + + spin_unlock_irqrestore(&port->lock, flags); +} + +static struct uart_ops wqtuart_ops = { + .tx_empty = 
wqtuart_tx_empty, + .set_mctrl = wqtuart_set_mctrl, + .get_mctrl = wqtuart_get_mctrl, + .stop_tx = wqtuart_stop_tx, + .start_tx = wqtuart_start_tx, + .stop_rx = wqtuart_stop_rx, + .enable_ms = wqtuart_enable_ms, + .break_ctl = wqtuart_break_ctl, + .startup = wqtuart_startup, + .shutdown = wqtuart_shutdown, + .set_termios = wqtuart_set_termios, + .type = wqtuart_type, + .release_port = wqtuart_release_port, + .request_port = wqtuart_request_port, + .config_port = wqtuart_config_port, + .verify_port = wqtuart_verify_port, +}; + +static struct uart_driver wqtuart_driver = { + .owner = THIS_MODULE, + .driver_name = driver_name, + .dev_name = "ttyPCI", + .major = 240, + .minor = 0, + .nr = 1, +}; + +/*----------------------------------------------------------------------------*/ +/* PCI Subsystem */ +/*----------------------------------------------------------------------------*/ + +static int wqt_probe(struct pci_dev *dev, const struct pci_device_id *id) +{ + struct net_device *ndev; + struct wqt_dev *priv; + int ret; + + ndev = alloc_etherdev(sizeof(*priv)); + + if (!ndev) { + ret = -ENOMEM; + goto out_alloc_ndev; + } + + pci_set_drvdata(dev, ndev); + priv = netdev_priv(ndev); + priv->pdev = dev; + priv->dev = &dev->dev; + priv->ndev = ndev; + + mutex_init(&priv->irq_mutex); + spin_lock_init(&priv->irq_lock); + + /* Hardware Initialization */ + ret = pci_enable_device(dev); + + if (ret) + goto out_pci_enable_dev; + + pci_set_master(dev); + + ret = pci_request_regions(dev, driver_name); + + if (ret) + goto out_pci_request_regions; + + priv->immr = pci_iomap(dev, 0, 0); + + if (!priv->immr) { + ret = -ENOMEM; + goto out_iomap_immr; + } + + priv->netregs = pci_iomap(dev, 1, 0); + + if (!priv->netregs) { + ret = -ENOMEM; + goto out_iomap_netregs; + } + + priv->rx_base = priv->netregs + PCINET_RXBD_BASE; + priv->tx_base = priv->netregs + PCINET_TXBD_BASE; + + ret = dma_set_mask(&dev->dev, 0xcfffffff); + + if (ret) { + dev_err(&dev->dev, "Unable to set DMA mask\n"); + ret = 
-ENODEV; + goto out_set_dma_mask; + } + + /* Initialize private data */ + priv->wq = create_singlethread_workqueue(driver_name); + + if (!priv->wq) { + ret = -ENOMEM; + goto out_create_workqueue; + } + + INIT_WORK(&priv->uart_tx_work, uart_tx_work_fn); + init_waitqueue_head(&priv->uart_tx_wait); + priv->uart_tx_ready = true; + + tasklet_init(&priv->tx_complete_tasklet, wqt_tx_complete, + (unsigned long)ndev); + tasklet_disable(&priv->tx_complete_tasklet); + spin_lock_init(&priv->net_lock); + + mutex_init(&priv->net_mutex); + priv->net_state = NET_STATE_STOPPED; + INIT_WORK(&priv->net_start_work, wqtnet_start_work_fn); + INIT_WORK(&priv->net_stop_work, wqtnet_stop_work_fn); + init_completion(&priv->net_start_completion); + init_completion(&priv->net_stop_completion); + + /* Mask all of the MBOX interrupts */ + IMMR_W32(OMIMR_OFFSET, 0x1 | 0x2); + + /* Network Device */ + random_ether_addr(ndev->dev_addr); + + ndev->open = wqt_open; + ndev->stop = wqt_stop; + ndev->change_mtu = wqt_change_mtu; + ndev->hard_start_xmit = wqt_hard_start_xmit; + ndev->tx_timeout = wqt_tx_timeout; + ndev->watchdog_timeo = HZ/4; + ndev->flags &= ~IFF_MULTICAST; /* No multicast support */ +#ifdef CONFIG_PCINET_DISABLE_CHECKSUM + ndev->features |= NETIF_F_NO_CSUM; /* No checksum needed */ +#endif + ndev->mtu = PH_MAX_MTU; + netif_napi_add(ndev, &priv->napi, wqt_rx_napi, PH_RING_SIZE); + + ret = register_netdev(ndev); + + if (ret) + goto out_register_netdev; + + /* UART Device */ + priv->port.ops = &wqtuart_ops; + priv->port.type = PORT_16550A; + priv->port.dev = &dev->dev; + priv->port.line = 0; + spin_lock_init(&priv->port.lock); + + ret = uart_add_one_port(&wqtuart_driver, &priv->port); + + if (ret) + goto out_add_uart_port; + + dev_info(priv->dev, "using ethernet device %s\n", ndev->name); + dev_info(priv->dev, "using serial device %s%d\n", + wqtuart_driver.dev_name, priv->port.line); + return 0; + +out_add_uart_port: + unregister_netdev(ndev); +out_register_netdev: + 
destroy_workqueue(priv->wq); +out_create_workqueue: +out_set_dma_mask: + pci_iounmap(dev, priv->netregs); +out_iomap_netregs: + pci_iounmap(dev, priv->immr); +out_iomap_immr: + pci_release_regions(dev); +out_pci_request_regions: + pci_disable_device(dev); +out_pci_enable_dev: + free_netdev(ndev); +out_alloc_ndev: + return ret; +} + +static void wqt_remove(struct pci_dev *dev) +{ + struct net_device *ndev = pci_get_drvdata(dev); + struct wqt_dev *priv = netdev_priv(ndev); + + uart_remove_one_port(&wqtuart_driver, &priv->port); + unregister_netdev(priv->ndev); + + flush_workqueue(priv->wq); + destroy_workqueue(priv->wq); + + pci_iounmap(dev, priv->netregs); + pci_iounmap(dev, priv->immr); + pci_release_regions(dev); + pci_disable_device(dev); + + free_netdev(ndev); +} + +#define PCI_DEVID_FSL_MPC8349EMDS 0x0080 + +/* The list of devices that this module will support */ +static struct pci_device_id wqt_ids[] = { + { PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, PCI_DEVID_FSL_MPC8349EMDS), }, + { 0, } +}; +MODULE_DEVICE_TABLE(pci, wqt_ids); + +static struct pci_driver wqt_pci_driver = { + .name = (char *)driver_name, + .id_table = wqt_ids, + .probe = wqt_probe, + .remove = wqt_remove, +}; + +/*----------------------------------------------------------------------------*/ +/* Module Init / Exit */ +/*----------------------------------------------------------------------------*/ + +static int __init wqt_init(void) +{ + int ret; + + ret = uart_register_driver(&wqtuart_driver); + + if (ret) + goto out_uart_register_driver; + + ret = pci_register_driver(&wqt_pci_driver); + + if (ret) + goto out_pci_register_driver; + + return 0; + +out_pci_register_driver: + uart_unregister_driver(&wqtuart_driver); +out_uart_register_driver: + return ret; +} + +static void __exit wqt_exit(void) +{ + pci_unregister_driver(&wqt_pci_driver); + uart_unregister_driver(&wqtuart_driver); +} + +MODULE_AUTHOR("Ira W. 
Snyder <iws@ovro.caltech.edu>"); +MODULE_DESCRIPTION("PCINet/PCISerial Driver for MPC8349EMDS (Host side)"); +MODULE_LICENSE("GPL"); + +module_init(wqt_init); +module_exit(wqt_exit); diff --git a/drivers/net/pcinet_hw.h b/drivers/net/pcinet_hw.h new file mode 100644 index 0000000..499ba61 --- /dev/null +++ b/drivers/net/pcinet_hw.h @@ -0,0 +1,77 @@ +/* + * Register offsets for the MPC8349EMDS Message Unit from the IMMR base address + * + * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu> + * + * This file is licensed under the terms of the GNU General Public License + * version 2. This program is licensed "as is" without any warranty of any + * kind, whether express or implied. + */ + +#ifndef PCINET_HW_H +#define PCINET_HW_H + +/* mpc8349emds message unit register offsets */ +#define OMISR_OFFSET 0x8030 +#define OMIMR_OFFSET 0x8034 +#define IMR0_OFFSET 0x8050 +#define IMR1_OFFSET 0x8054 +#define OMR0_OFFSET 0x8058 +#define OMR1_OFFSET 0x805C +#define ODR_OFFSET 0x8060 +#define IDR_OFFSET 0x8068 +#define IMISR_OFFSET 0x8080 +#define IMIMR_OFFSET 0x8084 + + +/* mpc8349emds pci and local access window register offsets */ +#define LAWAR0_OFFSET 0x0064 +#define LAWAR0_ENABLE (1<<31) + +#define POCMR0_OFFSET 0x8410 +#define POCMR0_ENABLE (1<<31) + +#define POTAR0_OFFSET 0x8400 + +#define LAWAR1_OFFSET 0x006c +#define LAWAR1_ENABLE (1<<31) + +#define POCMR1_OFFSET 0x8428 +#define POCMR1_ENABLE (1<<31) + +#define POTAR1_OFFSET 0x8418 + + +/* mpc8349emds dma controller register offsets */ +#define DMAMR0_OFFSET 0x8100 +#define DMASR0_OFFSET 0x8104 +#define DMASAR0_OFFSET 0x8110 +#define DMADAR0_OFFSET 0x8118 +#define DMABCR0_OFFSET 0x8120 + +#define DMA_CHANNEL_BUSY (1<<2) + +#define DMA_DIRECT_MODE_SNOOP (1<<20) +#define DMA_CHANNEL_MODE_DIRECT (1<<2) +#define DMA_CHANNEL_START (1<<0) + + +/* mpc8349emds pci and local access window register offsets */ +#define LAWAR0_OFFSET 0x0064 +#define LAWAR0_ENABLE (1<<31) + +#define POCMR0_OFFSET 0x8410 +#define POCMR0_ENABLE 
(1<<31) + +#define POTAR0_OFFSET 0x8400 + + +/* mpc8349emds pci and inbound window register offsets */ +#define PITAR0_OFFSET 0x8568 +#define PIWAR0_OFFSET 0x8578 + +#define PIWAR0_ENABLED (1<<31) +#define PIWAR0_PREFETCH (1<<29) +#define PIWAR0_IWS_4K 0xb + +#endif /* PCINET_HW_H */
This adds support to Linux for a virtual ethernet interface which uses the
PCI bus as its transport mechanism. It creates a simple, familiar, and fast
method of communication for two devices connected by a PCI interface.

I have implemented client support for the Freescale MPC8349EMDS board,
which is capable of running in PCI Agent mode (it acts like a PCI card, but
is a complete PowerPC computer, running Linux). It is almost certainly
trivially ported to any MPC83xx system. It should be a relatively small
effort to port it to any chip that can generate PCI interrupts and has at
least one PCI-accessible scratch register.

It was developed to work in a CompactPCI crate of computers, one of which
is a relatively standard x86 system (acting as the host) and many PowerPC
systems (acting as clients).

RFC v1 -> RFC v2:
* remove vim modelines
* use net_device->name in request_irq(), for irqbalance
* remove unnecessary wqt_get_stats(), use default get_stats() instead
* use dev_printk() and friends
* add message unit to MPC8349EMDS dts file

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
---

This is the third posting of this driver. I got some feedback and have
corrected the problems. Stephen, thanks for the review! I also got some
off-list feedback from a potential user, so I believe this is worth getting
into mainline.

I'll post up a revised version about once a week as long as the changes are
minor. If they are more substantial, I'll post them as needed.

GregKH: is this worth putting into the staging tree? (I left you out of the
CC list to keep your email traffic down)

The remaining issues I see in this driver:

1) ==== Naming ====

The name wqt originally stood for "workqueue-test" and somewhat evolved
over time into the current driver. I'm looking for suggestions for a
better name. It should be the same between the host and client drivers,
to make porting the code between them easier. The drivers are /very/
similar other than the setup code.
2) ==== IMMR Usage ====

In the Freescale client driver, I use the whole set of board control
registers (AKA IMMR registers). I only need a very small subset of them,
during startup, to set up the DMA window. I used the full set of
registers so that I could share the register offsets between the drivers
(in pcinet_hw.h).

3) ==== ioremap() of a physical address ====

In the Freescale client driver, I called ioremap() with the physical
address of the IMMR registers, 0xe0000000. I don't know a better way to
get them. They are somewhat exposed in the Flat Device Tree. Suggestions
on how to extract them are welcome.

4) ==== Hardcoded DMA Window Address ====

In the Freescale client driver, I just hardcoded the address of the
outbound PCI window into the DMA transfer code. It is 0x80000000.
Suggestions on how to get this value at runtime are welcome.

Rationale behind some decisions:

1) ==== Usage of the PCINET_NET_REGISTERS_VALID bit ====

I want to be able to use this driver from U-Boot to tftp a kernel over
the PCI backplane, and then boot up the board. This means that the
device descriptor memory, which lives in the client RAM, becomes invalid
during boot.

2) ==== Buffer Descriptors in client memory ====

I chose to put the buffer descriptors in client memory rather than host
memory. It seemed more logical to me at the time. In my application,
I'll have 19 boards + 1 host per cPCI chassis. The client -> host
direction will see most of the traffic, and so I thought I would cut
down on the number of PCI accesses needed. I'm willing to change this.

3) ==== Usage of client DMA controller for all data transfer ====

This was done purely for speed. I tried using the CPU to transfer all
data, and it is very slow: ~3MB/sec. Using the DMA controller gets me to
~40MB/sec (as tested with netperf).

4) ==== Static 1GB DMA window ====

Maintaining a window while DMA's in flight, and then changing it, seemed
too complicated.
Also, testing showed that using a static window gave me a ~10MB/sec
speedup compared to moving the window for each skb.

5) ==== The serial driver ====

Yes, there are two essentially separate drivers here. I needed a method
to communicate with the U-Boot bootloader on these boards without
plugging in a serial cable. With 19 clients + 1 host per chassis, the
cable clutter is worth avoiding. Since everything is connected via the
PCI bus anyway, I used that. A virtual serial port was simple to
implement using the messaging unit hardware that I used for the network
driver.

I'll post both U-Boot drivers to their mailing list once this driver is
finalized.

Thanks,
Ira

 arch/powerpc/boot/dts/mpc834x_mds.dts |    7 +
 drivers/net/Kconfig                   |   34 +
 drivers/net/Makefile                  |    3 +
 drivers/net/pcinet.h                  |   75 ++
 drivers/net/pcinet_fsl.c              | 1351 ++++++++++++++++++++++++++++++++
 drivers/net/pcinet_host.c             | 1383 +++++++++++++++++++++++++++++++++
 drivers/net/pcinet_hw.h               |   77 ++
 7 files changed, 2930 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/pcinet.h
 create mode 100644 drivers/net/pcinet_fsl.c
 create mode 100644 drivers/net/pcinet_host.c
 create mode 100644 drivers/net/pcinet_hw.h