diff mbox series

can: don't count arbitration lose as an error

Message ID 20201127095941.21609-1-jhofstee@victronenergy.com
State Superseded
Headers show
Series can: don't count arbitration lose as an error | expand

Commit Message

Jeroen Hofstee Nov. 27, 2020, 9:59 a.m. UTC
Losing arbitration is normal in a CAN-bus network, it means that a
higher priority frame is being send and the pending message will be
retried later. Hence most driver only increment arbitration_lost, but
the sja1000 and sun4i driver also incremeant tx_error, causing errors
to be reported on a normal functioning CAN-bus. So stop counting them
as errors.

For completeness, the Kvaser USB hybra also increments the tx_error
on arbitration lose, but it does so in single shot. Since in that
case the message is not retried, that behaviour is kept.

Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com>
---
 drivers/net/can/sja1000/sja1000.c | 1 -
 drivers/net/can/sun4i_can.c       | 1 -
 2 files changed, 2 deletions(-)

Comments

Marc Kleine-Budde Nov. 27, 2020, 10:30 a.m. UTC | #1
On 11/27/20 10:59 AM, Jeroen Hofstee wrote:
> Losing arbitration is normal in a CAN-bus network, it means that a
> higher priority frame is being send and the pending message will be
> retried later. Hence most driver only increment arbitration_lost, but
> the sja1000 and sun4i driver also incremeant tx_error, causing errors
> to be reported on a normal functioning CAN-bus. So stop counting them
> as errors.

Sounds plausible.

> For completeness, the Kvaser USB hybra also increments the tx_error
> on arbitration lose, but it does so in single shot. Since in that
> case the message is not retried, that behaviour is kept.

You mean only in one shot mode? What about one shot mode on the sja1000 cores?

> Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com>

I've split this into two patches, and added Fixes: lines, and pushed this for
now to linux-can/sja1000.

https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git/log/?h=sja1000

regards,
Marc
Jeroen Hofstee Nov. 27, 2020, 11:09 a.m. UTC | #2
Hi,

On 11/27/20 11:30 AM, Marc Kleine-Budde wrote:
> On 11/27/20 10:59 AM, Jeroen Hofstee wrote:
>> Losing arbitration is normal in a CAN-bus network, it means that a
>> higher priority frame is being send and the pending message will be
>> retried later. Hence most driver only increment arbitration_lost, but
>> the sja1000 and sun4i driver also incremeant tx_error, causing errors
>> to be reported on a normal functioning CAN-bus. So stop counting them
>> as errors.
> Sounds plausible.
>
>> For completeness, the Kvaser USB hybra also increments the tx_error
>> on arbitration lose, but it does so in single shot. Since in that
>> case the message is not retried, that behaviour is kept.
> You mean only in one shot mode?

Yes, well at least the function is called kvaser_usb_hydra_one_shot_fail.


>   What about one shot mode on the sja1000 cores?


That is a good question. I guess it will be counted as error by:

         if (isrc & IRQ_TI) {
             /* transmission buffer released */
             if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT &&
                 !(status & SR_TCS)) {
                 stats->tx_errors++;
                 can_free_echo_skb(dev, 0);
             } else {
                 /* transmission complete */
                 stats->tx_bytes +=
                     priv->read_reg(priv, SJA1000_FI) & 0xf;
                 stats->tx_packets++;
                 can_get_echo_skb(dev, 0);
             }
             netif_wake_queue(dev);
             can_led_event(dev, CAN_LED_EVENT_TX);
         }

 From the datasheet, Transmit Interrupt:

"set; this bit is set whenever the transmit bufferstatus
changes from ‘0-to-1’ (released) and the TIE bit is set
within the interrupt enable register".

I cannot test it though, since I don't have a sja1000.

>
>> Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com>
> I've split this into two patches, and added Fixes: lines, and pushed this for
> now to linux-can/sja1000.
>
Thanks, regards,

Jeroen
Jeroen Hofstee Nov. 27, 2020, 4:02 p.m. UTC | #3
Hello Marc,

[...]

>
>
>>   What about one shot mode on the sja1000 cores?
>
>
> That is a good question. I guess it will be counted as error by:
>
>         if (isrc & IRQ_TI) {
>             /* transmission buffer released */
>             if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT &&
>                 !(status & SR_TCS)) {
>                 stats->tx_errors++;
>                 can_free_echo_skb(dev, 0);
>             } else {
>                 /* transmission complete */
>                 stats->tx_bytes +=
>                     priv->read_reg(priv, SJA1000_FI) & 0xf;
>                 stats->tx_packets++;
>                 can_get_echo_skb(dev, 0);
>             }
>             netif_wake_queue(dev);
>             can_led_event(dev, CAN_LED_EVENT_TX);
>         }
>
>

I just realized this is likely the case. If it wasn't called,
can_free_echo_skb won't be called, causing a memory
leak. I guess someone would have noticed that.

Regards,

Jeroen
Oliver Hartkopp Nov. 28, 2020, 5:23 p.m. UTC | #4
On 27.11.20 12:09, Jeroen Hofstee wrote:
> Hi,
> 
> On 11/27/20 11:30 AM, Marc Kleine-Budde wrote:
>> On 11/27/20 10:59 AM, Jeroen Hofstee wrote:
>>> Losing arbitration is normal in a CAN-bus network, it means that a
>>> higher priority frame is being send and the pending message will be
>>> retried later. Hence most driver only increment arbitration_lost, but
>>> the sja1000 and sun4i driver also incremeant tx_error, causing errors
>>> to be reported on a normal functioning CAN-bus. So stop counting them
>>> as errors.
>> Sounds plausible.
>>
>>> For completeness, the Kvaser USB hybra also increments the tx_error
>>> on arbitration lose, but it does so in single shot. Since in that
>>> case the message is not retried, that behaviour is kept.
>> You mean only in one shot mode?
> 
> Yes, well at least the function is called kvaser_usb_hydra_one_shot_fail.
> 
> 
>>   What about one shot mode on the sja1000 cores?
> 
> 
> That is a good question. I guess it will be counted as error by:
> 
>          if (isrc & IRQ_TI) {
>              /* transmission buffer released */
>              if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT &&
>                  !(status & SR_TCS)) {
>                  stats->tx_errors++;
>                  can_free_echo_skb(dev, 0);
>              } else {
>                  /* transmission complete */
>                  stats->tx_bytes +=
>                      priv->read_reg(priv, SJA1000_FI) & 0xf;
>                  stats->tx_packets++;
>                  can_get_echo_skb(dev, 0);
>              }
>              netif_wake_queue(dev);
>              can_led_event(dev, CAN_LED_EVENT_TX);
>          }
> 
>  From the datasheet, Transmit Interrupt:
> 
> "set; this bit is set whenever the transmit bufferstatus
> changes from ‘0-to-1’ (released) and the TIE bit is set
> within the interrupt enable register".
> 
> I cannot test it though, since I don't have a sja1000.

I have a PCAN-ExpressCard 34 here, which should make it in a test setup 
as it acts as a PCI attached SJA1000.

Will take a look at that arbitration lost behaviour on Monday. A really 
interesting detail!

Best,
Oliver

> 
>>
>>> Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com>
>> I've split this into two patches, and added Fixes: lines, and pushed 
>> this for
>> now to linux-can/sja1000.
>>
> Thanks, regards,
> 
> Jeroen
> 
>
Jeroen Hofstee Nov. 29, 2020, 3:52 p.m. UTC | #5
Hello Oliver,

On 11/28/20 6:23 PM, Oliver Hartkopp wrote:
>
>
> On 27.11.20 12:09, Jeroen Hofstee wrote:
>> Hi,
>>
>> On 11/27/20 11:30 AM, Marc Kleine-Budde wrote:
>>> On 11/27/20 10:59 AM, Jeroen Hofstee wrote:
>>>> Losing arbitration is normal in a CAN-bus network, it means that a
>>>> higher priority frame is being send and the pending message will be
>>>> retried later. Hence most driver only increment arbitration_lost, but
>>>> the sja1000 and sun4i driver also incremeant tx_error, causing errors
>>>> to be reported on a normal functioning CAN-bus. So stop counting them
>>>> as errors.
>>> Sounds plausible.
>>>
>>>> For completeness, the Kvaser USB hybra also increments the tx_error
>>>> on arbitration lose, but it does so in single shot. Since in that
>>>> case the message is not retried, that behaviour is kept.
>>> You mean only in one shot mode?
>>
>> Yes, well at least the function is called 
>> kvaser_usb_hydra_one_shot_fail.
>>
>>
>>>   What about one shot mode on the sja1000 cores?
>>
>>
>> That is a good question. I guess it will be counted as error by:
>>
>>          if (isrc & IRQ_TI) {
>>              /* transmission buffer released */
>>              if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT &&
>>                  !(status & SR_TCS)) {
>>                  stats->tx_errors++;
>>                  can_free_echo_skb(dev, 0);
>>              } else {
>>                  /* transmission complete */
>>                  stats->tx_bytes +=
>>                      priv->read_reg(priv, SJA1000_FI) & 0xf;
>>                  stats->tx_packets++;
>>                  can_get_echo_skb(dev, 0);
>>              }
>>              netif_wake_queue(dev);
>>              can_led_event(dev, CAN_LED_EVENT_TX);
>>          }
>>
>>  From the datasheet, Transmit Interrupt:
>>
>> "set; this bit is set whenever the transmit bufferstatus
>> changes from ‘0-to-1’ (released) and the TIE bit is set
>> within the interrupt enable register".
>>
>> I cannot test it though, since I don't have a sja1000.
>
> I have a PCAN-ExpressCard 34 here, which should make it in a test 
> setup as it acts as a PCI attached SJA1000.
>
> Will take a look at that arbitration lost behaviour on Monday. A 
> really interesting detail!
>
>

Thanks, appreciated. I would be surprised if this path is not taken though.

With kind regards,

Jeroen
Marc Kleine-Budde Nov. 30, 2020, 11:50 a.m. UTC | #6
On 11/27/20 11:30 AM, Marc Kleine-Budde wrote:
> On 11/27/20 10:59 AM, Jeroen Hofstee wrote:
>> Losing arbitration is normal in a CAN-bus network, it means that a
>> higher priority frame is being send and the pending message will be
>> retried later. Hence most driver only increment arbitration_lost, but
>> the sja1000 and sun4i driver also incremeant tx_error, causing errors
>> to be reported on a normal functioning CAN-bus. So stop counting them
>> as errors.
> 
> Sounds plausible.
> 
>> For completeness, the Kvaser USB hybra also increments the tx_error
>> on arbitration lose, but it does so in single shot. Since in that
>> case the message is not retried, that behaviour is kept.
> 
> You mean only in one shot mode? What about one shot mode on the sja1000 cores?
> 
>> Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com>
> 
> I've split this into two patches, and added Fixes: lines, and pushed this for
> now to linux-can/sja1000.

Added to linux-can/testing

Tnx,
Marc
Oliver Hartkopp Dec. 2, 2020, 2:35 p.m. UTC | #7
Hi Jeroen,

On 27.11.20 12:09, Jeroen Hofstee wrote:
> On 11/27/20 11:30 AM, Marc Kleine-Budde wrote:
>> On 11/27/20 10:59 AM, Jeroen Hofstee wrote:
>>> Losing arbitration is normal in a CAN-bus network, it means that a
>>> higher priority frame is being send and the pending message will be
>>> retried later. Hence most driver only increment arbitration_lost, but
>>> the sja1000 and sun4i driver also incremeant tx_error, causing errors
>>> to be reported on a normal functioning CAN-bus. So stop counting them
>>> as errors.
>> Sounds plausible.
>>
>>> For completeness, the Kvaser USB hybra also increments the tx_error
>>> on arbitration lose, but it does so in single shot. Since in that
>>> case the message is not retried, that behaviour is kept.
>> You mean only in one shot mode?
> 
> Yes, well at least the function is called kvaser_usb_hydra_one_shot_fail.
> 
> 
>>   What about one shot mode on the sja1000 cores?
> 
> 
> That is a good question. I guess it will be counted as error by:
> 
>          if (isrc & IRQ_TI) {
>              /* transmission buffer released */
>              if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT &&
>                  !(status & SR_TCS)) {
>                  stats->tx_errors++;

I can confirm with CAN_CTRLMODE_ONE_SHOT active and the patch ("can: 
sja1000: sja1000_err(): don't count arbitration lose as an error")

https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git/commit/?h=testing&id=bd0ccb92efb09c7da5b55162b283b42a93539ed7

I now get ONE(!) increment for tx_errors and ONE increment in the 
arbitration-lost counter.

Before the above patch I had TWO tx_errors for each arbitration lost case.

>                  can_free_echo_skb(dev, 0);
>              } else {
>                  /* transmission complete */
>                  stats->tx_bytes +=
>                      priv->read_reg(priv, SJA1000_FI) & 0xf;
>                  stats->tx_packets++;
>                  can_get_echo_skb(dev, 0);
>              }
>              netif_wake_queue(dev);
>              can_led_event(dev, CAN_LED_EVENT_TX);
>          }
> 
>  From the datasheet, Transmit Interrupt:
> 
> "set; this bit is set whenever the transmit bufferstatus
> changes from ‘0-to-1’ (released) and the TIE bit is set
> within the interrupt enable register".
> 
> I cannot test it though, since I don't have a sja1000.

Do we agree that in one-shot mode both the tx_errors and the 
arbitration_lost counters are increased in the arbitration-lost case?

At least this would fit to the Kvaser USB behaviour.

And btw. I wondered if we should remove the check for 
CAN_CTRLMODE_ONE_SHOT here, as we ALWAYS should count a tx_error and 
drop the echo_skb when we have a TX-interrupt and TX-complete flag is zero.

So replace:

if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT &&
                   !(status & SR_TCS)) {

with:

if (!(status & SR_TCS)) {

Any suggestions?

Best regards,
Oliver
Jeroen Hofstee Dec. 2, 2020, 3:37 p.m. UTC | #8
Hello Oliver,

On 12/2/20 3:35 PM, Oliver Hartkopp wrote:
>
>
> On 27.11.20 12:09, Jeroen Hofstee wrote:
>> On 11/27/20 11:30 AM, Marc Kleine-Budde wrote:
>>> On 11/27/20 10:59 AM, Jeroen Hofstee wrote:
>>>> Losing arbitration is normal in a CAN-bus network, it means that a
>>>> higher priority frame is being send and the pending message will be
>>>> retried later. Hence most driver only increment arbitration_lost, but
>>>> the sja1000 and sun4i driver also incremeant tx_error, causing errors
>>>> to be reported on a normal functioning CAN-bus. So stop counting them
>>>> as errors.
>>> Sounds plausible.
>>>
>>>> For completeness, the Kvaser USB hybra also increments the tx_error
>>>> on arbitration lose, but it does so in single shot. Since in that
>>>> case the message is not retried, that behaviour is kept.
>>> You mean only in one shot mode?
>>
>> Yes, well at least the function is called 
>> kvaser_usb_hydra_one_shot_fail.
>>
>>
>>>   What about one shot mode on the sja1000 cores?
>>
>>
>> That is a good question. I guess it will be counted as error by:
>>
>>          if (isrc & IRQ_TI) {
>>              /* transmission buffer released */
>>              if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT &&
>>                  !(status & SR_TCS)) {
>>                  stats->tx_errors++;
>
> I can confirm with CAN_CTRLMODE_ONE_SHOT active and the patch ("can: 
> sja1000: sja1000_err(): don't count arbitration lose as an error")
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git/commit/?h=testing&id=bd0ccb92efb09c7da5b55162b283b42a93539ed7 
>
>
> I now get ONE(!) increment for tx_errors and ONE increment in the 
> arbitration-lost counter.
>
> Before the above patch I had TWO tx_errors for each arbitration lost case.


Good, thanks for checking!

>
>>                  can_free_echo_skb(dev, 0);
>>              } else {
>>                  /* transmission complete */
>>                  stats->tx_bytes +=
>>                      priv->read_reg(priv, SJA1000_FI) & 0xf;
>>                  stats->tx_packets++;
>>                  can_get_echo_skb(dev, 0);
>>              }
>>              netif_wake_queue(dev);
>>              can_led_event(dev, CAN_LED_EVENT_TX);
>>          }
>>
>>  From the datasheet, Transmit Interrupt:
>>
>> "set; this bit is set whenever the transmit bufferstatus
>> changes from ‘0-to-1’ (released) and the TIE bit is set
>> within the interrupt enable register".
>>
>> I cannot test it though, since I don't have a sja1000.
>
> Do we agree that in one-shot mode both the tx_errors and the 
> arbitration_lost counters are increased in the arbitration-lost case?
>
> At least this would fit to the Kvaser USB behaviour.


I have no opinion about that. I just kept existing behavior.


>
> And btw. I wondered if we should remove the check for 
> CAN_CTRLMODE_ONE_SHOT here, as we ALWAYS should count a tx_error and 
> drop the echo_skb when we have a TX-interrupt and TX-complete flag is 
> zero.
>
> So replace:
>
> if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT &&
>                   !(status & SR_TCS)) {
>
> with:
>
> if (!(status & SR_TCS)) {
>
> Any suggestions?
>

In theory, yes. But I can't think of a reason you would end
up there without CAN_CTRLMODE_ONE_SHOT being set.
Aborting the current transmission in non single shot mode
will get you there and incorrectly report the message as
transmitted, but that is not implemented afaik.

Regards,

Jeroen
Oliver Hartkopp Dec. 2, 2020, 4:22 p.m. UTC | #9
Hello Jeroen,

On 02.12.20 16:37, Jeroen Hofstee wrote:
> On 12/2/20 3:35 PM, Oliver Hartkopp wrote:

>> Do we agree that in one-shot mode both the tx_errors and the 
>> arbitration_lost counters are increased in the arbitration-lost case?
>>
>> At least this would fit to the Kvaser USB behaviour.
> 
> 
> I have no opinion about that. I just kept existing behavior.

That's ok for me either.

>> And btw. I wondered if we should remove the check for 
>> CAN_CTRLMODE_ONE_SHOT here, as we ALWAYS should count a tx_error and 
>> drop the echo_skb when we have a TX-interrupt and TX-complete flag is 
>> zero.
>>
>> So replace:
>>
>> if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT &&
>>                   !(status & SR_TCS)) {
>>
>> with:
>>
>> if (!(status & SR_TCS)) {
>>
>> Any suggestions?
>>
> 
> In theory, yes. But I can't think of a reason you would end
> up there without CAN_CTRLMODE_ONE_SHOT being set.

Right. Me too. But for that reason I would remove that extra check to 
catch this error even if CAN_CTRLMODE_ONE_SHOT is not enabled.

> Aborting the current transmission in non single shot mode
> will get you there and incorrectly report the message as
> transmitted, but that is not implemented afaik.

Ahem, no. If you get there the echo_skb is deleted and the tx_errors 
counter is increased. Just as it should be.

Regards,
Oliver
Jeroen Hofstee Dec. 2, 2020, 4:31 p.m. UTC | #10
Hello Oliver,

On 12/2/20 5:22 PM, Oliver Hartkopp wrote:
> [...]
>> Aborting the current transmission in non single shot mode
>> will get you there and incorrectly report the message as
>> transmitted, but that is not implemented afaik.
>
> Ahem, no. If you get there the echo_skb is deleted and the tx_errors 
> counter is increased. Just as it should be.
>
>
>

Ahem, yes. I meant how it is, so it will take the else
without CAN_CTRLMODE_ONE_SHOT being set.
But since that is not implemented, it is not relevant.

Regards,

Jeroen
diff mbox series

Patch

diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
index 9f107798f904..25a4d7d0b349 100644
--- a/drivers/net/can/sja1000/sja1000.c
+++ b/drivers/net/can/sja1000/sja1000.c
@@ -474,7 +474,6 @@  static int sja1000_err(struct net_device *dev, uint8_t isrc, uint8_t status)
 		netdev_dbg(dev, "arbitration lost interrupt\n");
 		alc = priv->read_reg(priv, SJA1000_ALC);
 		priv->can.can_stats.arbitration_lost++;
-		stats->tx_errors++;
 		cf->can_id |= CAN_ERR_LOSTARB;
 		cf->data[0] = alc & 0x1f;
 	}
diff --git a/drivers/net/can/sun4i_can.c b/drivers/net/can/sun4i_can.c
index e2c6cf4b2228..b3f2f4fe5ee0 100644
--- a/drivers/net/can/sun4i_can.c
+++ b/drivers/net/can/sun4i_can.c
@@ -604,7 +604,6 @@  static int sun4i_can_err(struct net_device *dev, u8 isrc, u8 status)
 		netdev_dbg(dev, "arbitration lost interrupt\n");
 		alc = readl(priv->base + SUN4I_REG_STA_ADDR);
 		priv->can.can_stats.arbitration_lost++;
-		stats->tx_errors++;
 		if (likely(skb)) {
 			cf->can_id |= CAN_ERR_LOSTARB;
 			cf->data[0] = (alc >> 8) & 0x1f;