Message ID | 20190128174547.twhwv64y2k5xx5x7@linutronix.de |
---|---|
State | RFC |
Delegated to: | David Miller |
Headers | show |
Series | [RFC] net: dp83640: expire old TX-skb | expand |
On Mon, Jan 28, 2019 at 06:45:47PM +0100, Sebastian Andrzej Siewior wrote: > During sendmsg() a cloned skb is saved via dp83640_txtstamp() in > ->tx_queue. After the NIC sends this packet, the PHY will reply with a > timestamp for that TX packet. If the cable is pulled at the right time I > don't see that packet. It might gets flushed as part of queue shutdown > on NIC's side. > Once the link is up again then after the next sendmsg() we enqueue > another skb in dp83640_txtstamp() and have two on the list. Then the PHY > will send a reply and decode_txts() attaches it to the first skb on the > list. > No crash occurs since refcounting works but we are one packet behind. > linuxptp/ptp4l usually closes the socket and opens a new one (in such a > timeout case) so those "stale" replies never get there. However it does > not resume normal operation anymore. Thanks for the detailed explanation. This sounds like a really rare bug, but maybe you guys were able to trigger it reliably? > Purge old skbs in decode_txts(). It is too bad that the Tx timestamp from the HW doesn't provide matching fields. Using the timeout is probably the best that we can do. > Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Order: signed-off goes before reviewed-by. > --- > drivers/net/phy/dp83640.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c > index 1dc043d0bc875..a50e0680a0322 100644 > --- a/drivers/net/phy/dp83640.c > +++ b/drivers/net/phy/dp83640.c > @@ -920,13 +920,13 @@ static void decode_txts(struct dp83640_private *dp83640, > { > struct skb_shared_hwtstamps shhwtstamps; > struct sk_buff *skb; > + struct dp83640_skb_info *skb_info; Reverse Christmas tree please, > u64 ns; > u8 overflow; and fix ^^^ while you are at it. > > /* We must already have the skb that triggered this. */ > - > +again: > skb = skb_dequeue(&dp83640->tx_queue); > - > if (!skb) { > pr_debug("have timestamp but tx_queue empty\n"); > return; > @@ -941,6 +941,11 @@ static void decode_txts(struct dp83640_private *dp83640, > } > return; > } > + skb_info = (struct dp83640_skb_info *)skb->cb; > + if (time_after(jiffies, skb_info->tmo)) { > + kfree_skb(skb); > + goto again; > + } > > ns = phy2txts(phy_txts); > memset(&shhwtstamps, 0, sizeof(shhwtstamps)); > @@ -1489,6 +1494,7 @@ static void dp83640_txtstamp(struct phy_device *phydev, > struct sk_buff *skb, int type) > { > struct dp83640_private *dp83640 = phydev->priv; > + struct dp83640_skb_info *skb_info = (struct dp83640_skb_info *)skb->cb; Reverse Christmas tree. Thanks, Richard > > switch (dp83640->hwts_tx_en) { > > @@ -1500,6 +1506,7 @@ static void dp83640_txtstamp(struct phy_device *phydev, > /* fall through */ > case HWTSTAMP_TX_ON: > skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS; > + skb_info->tmo = jiffies + SKB_TIMESTAMP_TIMEOUT; > skb_queue_tail(&dp83640->tx_queue, skb); > break; > > -- > 2.20.1 >
On 2019-01-30 20:26:06 [-0800], Richard Cochran wrote: > Thanks for the detailed explanation. This sounds like a really rare > bug, but maybe you guys were able to trigger it reliably? More or less reliably, yes. I had two switches (for the uplink of the PHY) and a third for my regular network during testing. I had one combination which triggered the issue after no longer than the fifth disconnect and another one which happily survived _a lot_ longer. > > Purge old skbs in decode_txts(). > > It is too bad that the Tx timestamp from the HW doesn't provide > matching fields. Using the timeout is probably the best that we can > do. yeah. I've been looking around and it seems there is nothing. And since that packet from the PHY is sent very seen usually shortly after the transmit I went for something similar that is already done the RX path. Sebastian
diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c index 1dc043d0bc875..a50e0680a0322 100644 --- a/drivers/net/phy/dp83640.c +++ b/drivers/net/phy/dp83640.c @@ -920,13 +920,13 @@ static void decode_txts(struct dp83640_private *dp83640, { struct skb_shared_hwtstamps shhwtstamps; struct sk_buff *skb; + struct dp83640_skb_info *skb_info; u64 ns; u8 overflow; /* We must already have the skb that triggered this. */ - +again: skb = skb_dequeue(&dp83640->tx_queue); - if (!skb) { pr_debug("have timestamp but tx_queue empty\n"); return; @@ -941,6 +941,11 @@ static void decode_txts(struct dp83640_private *dp83640, } return; } + skb_info = (struct dp83640_skb_info *)skb->cb; + if (time_after(jiffies, skb_info->tmo)) { + kfree_skb(skb); + goto again; + } ns = phy2txts(phy_txts); memset(&shhwtstamps, 0, sizeof(shhwtstamps)); @@ -1489,6 +1494,7 @@ static void dp83640_txtstamp(struct phy_device *phydev, struct sk_buff *skb, int type) { struct dp83640_private *dp83640 = phydev->priv; + struct dp83640_skb_info *skb_info = (struct dp83640_skb_info *)skb->cb; switch (dp83640->hwts_tx_en) { @@ -1500,6 +1506,7 @@ static void dp83640_txtstamp(struct phy_device *phydev, /* fall through */ case HWTSTAMP_TX_ON: skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS; + skb_info->tmo = jiffies + SKB_TIMESTAMP_TIMEOUT; skb_queue_tail(&dp83640->tx_queue, skb); break;