
[net-next,v2,2/4] can: cc770: add legacy ISA bus driver for the CC770 and AN82527

Message ID 4EE5C824.2050704@grandegger.com
State RFC, archived
Delegated to: David Miller

Commit Message

Wolfgang Grandegger Dec. 12, 2011, 9:23 a.m. UTC
Hi Wolfgang,

On 12/11/2011 07:33 PM, Wolfgang Zarre wrote:
> Hello Wolfgang,
>> On 12/07/2011 02:42 PM, Wolfgang Grandegger wrote:
>>> Hi Wolfgang,
>>>
>>> On 12/06/2011 10:08 PM, Wolfgang Zarre wrote:
...
>>>> Let me know if You need more or some other tests.
>>>
>>> You could provoke some state changes or bus-off conditions to see if the
>>> berr-counter shows reasonable results. I'm currently consolidating and
>>> unifying error state and bus-off handling. Would be nice if you could do
>>> some further tests when I have the patches ready...
>>
>> I just pushed the mentioned modifications to the "devel" branch of my
>> "wg-linux-can-next" [1] repository. You can get it as shown below:
>>
>>    $ git clone --reference=<some-recent-net-next-tree>  \
>>        git://gitorious.org/~wgrandegger/linux-can/wg-linux-can-next.git
>>    $ git checkout -b devel devel
>>
>> [1] https://gitorious.org/~wgrandegger/linux-can/wg-linux-can-next
>>
>> Wolfgang.
> 
> OK, I was trying so far and You will find below the results.
> Just FYI the states on the PLC side couldn't be verified because the
> function
> provided by the manufacturer is not working at all and CAN analyser was not
> available.
> 
> We are running CANopen and therefore the PLC will send automatically a
> heartbeat.
> 
> I produced the bus-off state through a short circuit between L/H which was
> working as expected.
> 
> A bit odd was that on the second try I had to reload the module
> because a ip down/up was not enough.

Oops, not good.

> Let me know if You would need further tests or different procedure.

The state changes are reported via error messages, which you can list
with "candump -td -e any,0:0,#FFFFFFFF" with the attached patch.
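In the candump argument "any,0:0,#FFFFFFFF", "any" selects all CAN interfaces, "0:0" is a <can_id>:<can_mask> filter and "#FFFFFFFF" is an error mask. The sketch below is a simplified userspace model of how a SocketCAN raw socket applies such settings. A data frame passes when (id & mask) == (filter_id & mask), so a zero mask lets every frame through. An error frame passes when its error-class bits intersect the error mask. The struct and function names are hypothetical. The two constants are copied in with the values from <linux/can.h>. The kernel's real matching handles more cases (EFF/RTR flag bits, inverted filters) than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror <linux/can.h>; redefined so the sketch builds anywhere. */
#define SKETCH_CAN_ERR_FLAG 0x20000000U /* frame is an error frame */
#define SKETCH_CAN_ERR_MASK 0x1FFFFFFFU /* error-class bits of an error frame */

/* Hypothetical model of the receive settings of one CAN_RAW socket. */
struct rx_filter_model {
	uint32_t can_id;   /* <can_id> part of "<can_id>:<can_mask>" */
	uint32_t can_mask; /* <can_mask> part; 0 matches every identifier */
	uint32_t err_mask; /* "#<error_mask>"; 0 means no error frames */
};

/* Returns true if a frame with identifier 'id' would be delivered. */
static bool filter_accepts(const struct rx_filter_model *f, uint32_t id)
{
	if (id & SKETCH_CAN_ERR_FLAG)
		/* Error frame: deliver if any of its error classes is wanted. */
		return (id & SKETCH_CAN_ERR_MASK & f->err_mask) != 0;

	/* Data frame: compare only the identifier bits selected by the mask. */
	return (id & f->can_mask) == (f->can_id & f->can_mask);
}
```

With the settings from the command above, both ordinary frames and every error class are delivered. With an error mask of 0, error frames would be filtered out even though all data frames still pass.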

> Producing L/H short circuit for 2 seconds
> dmesg:
> [  885.409058] cc770_isa cc770_isa.0: can0: status interrupt (0x5b)
> [  885.420475] cc770_isa cc770_isa.0: can0: status interrupt (0xc5)
> [  885.420496] cc770_isa cc770_isa.0: can0: bus-off
> 
> ip -d -s link show can0
> 4: can0: <NO-CARRIER,NOARP,UP,ECHO> mtu 16 qdisc pfifo_fast state DOWN
> qlen 10
>     link/can
>     can state BUS-OFF (berr-counter tx 92 rx 103) restart-ms 0
>     bitrate 500000 sample-point 0.875
>     tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>     cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>     clock 8000000
>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>     0          0          0          1          0          1
>     RX: bytes  packets  errors  dropped overrun mcast
>     544        382      0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     30         29       0       0       0       0
> 
> Sending and receiving stops.
> 
> Trying to recover on PC:
> ip link set can0 down;
> ip -d -s link show can0
> 4: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>     link/can
>     can state STOPPED (berr-counter tx 92 rx 103) restart-ms 0
>     bitrate 500000 sample-point 0.875
>     tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>     cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>     clock 8000000
>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>     0          0          0          1          0          1
>     RX: bytes  packets  errors  dropped overrun mcast
>     544        382      0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     30         29       0       1       0       0
> 
> ip link set can0 up type can bitrate 500000;
> dmesg:
> [ 1090.937778] cc770_isa cc770_isa.0: can0: setting BTR0=0x00 BTR1=0x1c
> [ 1090.937869] cc770_isa cc770_isa.0: can0: Message object 15 for RX
> data, RTR, SFF and EFF
> [ 1090.937885] cc770_isa cc770_isa.0: can0: Message object 11 for TX
> data, RTR, SFF and EFF
> [ 1090.938050] ADDRCONF(NETDEV_CHANGE): can0: link becomes ready
> [ 1090.940769] cc770_isa cc770_isa.0: can0: status interrupt (0x5)
> 
> ip -d -s link show can0
> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP qlen 10
>     link/can
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>     bitrate 500000 sample-point 0.875
>     tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>     cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>     clock 8000000
>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>     0          0          0          1          0          1
>     RX: bytes  packets  errors  dropped overrun mcast
>     552        383      0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     30         29       0       1       0       0
> 
> PLC in unknown state but not sending heartbeat,
> Rebooting PLC

Hm, does it work if you do the bus-off recovery manually, with:

  # ip link set can0 up type can restart

... or automatically, with:

  # ip link set can0 up type can restart-ms 5000

Anyway, rebooting/reloading should never be necessary. I will check on
my i82527.

> -----------------------------------------
> Disconnecting cable for around 4 seconds:
> 
> dmesg:
> [ 2339.660283] cc770_isa cc770_isa.0: can0: status interrupt (0x5b)
> 
> ip -d -s link show can0
> 6: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
> qlen 10
>     link/can
>     can state ERROR-WARNING (berr-counter tx 128 rx 128) restart-ms 0
>     bitrate 500000 sample-point 0.875
>     tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>     cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>     clock 8000000
>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>     0          0          0          1          0          0
>     RX: bytes  packets  errors  dropped overrun mcast
>     459        298      0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     193        192      0       0       0       0

TX and RX berr-counter are >= 128. I wonder why error passive was not
reached.
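The status values in the dmesg output can be decoded by hand. This assumes the i82527/CC770 status register layout, taken here to match the definitions in the cc770 driver: BOff bit 7, Warn bit 6, Wake bit 5, RXOK bit 4, TXOK bit 3, and the last error code in bits 2..0. Under that layout:

- 0x5b reads as warning + RXOK + TXOK with an acknowledgement error, which is plausible with the cable pulled.
- 0xc5 reads as bus-off + warning with a bit-0 error.
- 0x18 reads as RXOK + TXOK with the warning bit cleared.

This layout has a warning bit and a bus-off bit but no separate error-passive bit, which may be relevant to the question above. Check the bit assignments and error-code names in this sketch against the datasheet before relying on them.

```c
#include <stdio.h>
#include <string.h>

/* Assumed CC770 / i82527 status register layout. */
#define STAT_BOFF     0x80 /* bus-off */
#define STAT_WARN     0x40 /* an error counter reached the warning limit */
#define STAT_WAKE     0x20 /* wake-up */
#define STAT_RXOK     0x10 /* message received successfully */
#define STAT_TXOK     0x08 /* message transmitted successfully */
#define STAT_LEC_MASK 0x07 /* last error code */

/* Assumed names of the last-error-code values 0..7. */
static const char *const lec_names[] = {
	"no error", "stuff error", "form error", "ack error",
	"bit1 error", "bit0 error", "crc error", "unused",
};

/* Writes a readable description of a status byte into buf. */
static void decode_status(unsigned char st, char *buf, size_t len)
{
	snprintf(buf, len, "%s%s%s%s%slec=%s",
		 (st & STAT_BOFF) ? "bus-off " : "",
		 (st & STAT_WARN) ? "warn " : "",
		 (st & STAT_WAKE) ? "wake " : "",
		 (st & STAT_RXOK) ? "rxok " : "",
		 (st & STAT_TXOK) ? "txok " : "",
		 lec_names[st & STAT_LEC_MASK]);
}
```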

> Connecting again:
> ip -d -s link show can0
> 6: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
> qlen 10
>     link/can
>     can state ERROR-WARNING (berr-counter tx 120 rx 0) restart-ms 0
>     bitrate 500000 sample-point 0.875
>     tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>     cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>     clock 8000000
>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>     0          0          0          1          0          0
>     RX: bytes  packets  errors  dropped overrun mcast
>     473        311      0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     200        200      0       0       0       0
> 
> After some time (around 125 seconds):
> dmesg:
> [ 2387.172008] cc770_isa cc770_isa.0: can0: status interrupt (0x18)
> ip -d -s link show can0
> 6: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
> qlen 10
>     link/can
>     can state ERROR-ACTIVE (berr-counter tx 29 rx 0) restart-ms 0
>     bitrate 500000 sample-point 0.875
>     tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>     cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>     clock 8000000
>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>     0          0          0          1          0          0
>     RX: bytes  packets  errors  dropped overrun mcast
>     616        447      0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     291        291      0       0       0       0

OK, the state is back to error active (counter < 96).

Thanks for testing...

Wolfgang.

Comments

Wolfgang Zarre Dec. 12, 2011, 11:18 a.m. UTC | #1
Hello Wolfgang,
> Hi Wolfgang,
>
> On 12/11/2011 07:33 PM, Wolfgang Zarre wrote:
>> Hello Wolfgang,
>>> On 12/07/2011 02:42 PM, Wolfgang Grandegger wrote:
>>>> Hi Wolfgang,
>>>>
>>>> On 12/06/2011 10:08 PM, Wolfgang Zarre wrote:
> ...
>>>>> Let me know if You need more or some other tests.
>>>>
>>>> You could provoke some state changes or bus-off conditions to see if the
>>>> berr-counter shows reasonable results. I'm currently consolidating and
>>>> unifying error state and bus-off handling. Would be nice if you could do
>>>> some further tests when I have the patches ready...
>>>
>>> I just pushed the mentioned modifications to the "devel" branch of my
>>> "wg-linux-can-next" [1] repository. You can get it as shown below:
>>>
>>>     $ git clone --reference=<some-recent-net-next-tree>   \
>>>         git://gitorious.org/~wgrandegger/linux-can/wg-linux-can-next.git
>>>     $ git checkout -b devel devel
>>>
>>> [1] https://gitorious.org/~wgrandegger/linux-can/wg-linux-can-next
>>>
>>> Wolfgang.
>>
>> OK, I was trying so far and You will find below the results.
>> Just FYI the states on the PLC side couldn't be verified because the
>> function
>> provided by the manufacturer is not working at all and CAN analyser was not
>> available.
>>
>> We are running CANopen and therefore the PLC will send automatically a
>> heartbeat.
>>
>> I produced the bus-off state through a short circuit between L/H which was
>> working as expected.
>>
>> A bit odd was that on the second try I had to reload the module
>> because a ip down/up was not enough.
>
> Oops, not good.
>

But it might be connected to the strange behaviour of the PLC.

>> Let me know if You would need further tests or different procedure.
>
> The state changes are reported via error messages, which you can list
> with "candump -td -e any,0:0,#FFFFFFFF" with the attached patch.
>

Thanks, I'll try this with the next series of tests.

>> Producing L/H short circuit for 2 seconds
>> dmesg:
>> [  885.409058] cc770_isa cc770_isa.0: can0: status interrupt (0x5b)
>> [  885.420475] cc770_isa cc770_isa.0: can0: status interrupt (0xc5)
>> [  885.420496] cc770_isa cc770_isa.0: can0: bus-off
>>
>> ip -d -s link show can0
>> 4: can0: <NO-CARRIER,NOARP,UP,ECHO> mtu 16 qdisc pfifo_fast state DOWN
>> qlen 10
>>      link/can
>>      can state BUS-OFF (berr-counter tx 92 rx 103) restart-ms 0
>>      bitrate 500000 sample-point 0.875
>>      tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>>      cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>>      clock 8000000
>>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>>      0          0          0          1          0          1
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      544        382      0       0       0       0
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      30         29       0       0       0       0
>>
>> Sending and receiving stops.
>>
>> Trying to recover on PC:
>> ip link set can0 down;
>> ip -d -s link show can0
>> 4: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>      link/can
>>      can state STOPPED (berr-counter tx 92 rx 103) restart-ms 0
>>      bitrate 500000 sample-point 0.875
>>      tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>>      cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>>      clock 8000000
>>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>>      0          0          0          1          0          1
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      544        382      0       0       0       0
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      30         29       0       1       0       0
>>
>> ip link set can0 up type can bitrate 500000;
>> dmesg:
>> [ 1090.937778] cc770_isa cc770_isa.0: can0: setting BTR0=0x00 BTR1=0x1c
>> [ 1090.937869] cc770_isa cc770_isa.0: can0: Message object 15 for RX
>> data, RTR, SFF and EFF
>> [ 1090.937885] cc770_isa cc770_isa.0: can0: Message object 11 for TX
>> data, RTR, SFF and EFF
>> [ 1090.938050] ADDRCONF(NETDEV_CHANGE): can0: link becomes ready
>> [ 1090.940769] cc770_isa cc770_isa.0: can0: status interrupt (0x5)
>>
>> ip -d -s link show can0
>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP qlen 10
>>      link/can
>>      can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>>      bitrate 500000 sample-point 0.875
>>      tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>>      cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>>      clock 8000000
>>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>>      0          0          0          1          0          1
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      552        383      0       0       0       0
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      30         29       0       1       0       0
>>
>> PLC in unknown state but not sending heartbeat,
>> Rebooting PLC
>
> Hm, does it work if you do the bus-off recovery manually with?
>
>    # ip link set can0 up type can restart
>
> ... or automatically with?
>
>    # ip link set can0 up type can restart-ms 5000

Ah, OK, good point; I will try that as well with the next series of tests.

>
> Anyway, rebooting/reloading should never be necessary. I will check on
> my i82527.
>
>> -----------------------------------------
>> Disconnecting cable for around 4 seconds:
>>
>> dmesg:
>> [ 2339.660283] cc770_isa cc770_isa.0: can0: status interrupt (0x5b)
>>
>> ip -d -s link show can0
>> 6: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
>> qlen 10
>>      link/can
>>      can state ERROR-WARNING (berr-counter tx 128 rx 128) restart-ms 0
>>      bitrate 500000 sample-point 0.875
>>      tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>>      cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>>      clock 8000000
>>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>>      0          0          0          1          0          0
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      459        298      0       0       0       0
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      193        192      0       0       0       0
>
> TX and RX berr-counter are >= 128. I wonder why error passive was not
> reached.

Hmmm, that is a good question, and You are right: a counter > 127 should mean
error-passive. Anyway, I just realised: what does 'error-warning' mean then?
I only know error-active, error-passive and bus-off.
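On 'error-warning': the CAN specification defines only three fault-confinement states, namely error active, error passive and bus-off. Error warning is an additional level that many controllers, and SocketCAN, report once an error counter reaches a warning limit, typically 96. At that point the node is still error active in the specification's sense. The sketch below maps a pair of error counters to the states printed by "ip -d link show". It assumes the commonly used limits 96, 128 and 256. The enum and function names are hypothetical and not the kernel's.

```c
/* Hypothetical mirror of the state names printed by "ip -d link show". */
enum sketch_can_state {
	SKETCH_ERROR_ACTIVE,
	SKETCH_ERROR_WARNING,
	SKETCH_ERROR_PASSIVE,
	SKETCH_BUS_OFF,
};

/* Assumed limits, as commonly used by CAN controllers and SocketCAN. */
#define WARNING_LIMIT 96U  /* error-warning from 96 on */
#define PASSIVE_LIMIT 128U /* error-passive once a counter exceeds 127 */
#define BUS_OFF_LIMIT 256U /* bus-off once the TX counter exceeds 255 */

static enum sketch_can_state state_from_counters(unsigned int txerr,
						   unsigned int rxerr)
{
	unsigned int worst = txerr > rxerr ? txerr : rxerr;

	if (txerr >= BUS_OFF_LIMIT) /* only the TX counter leads to bus-off */
		return SKETCH_BUS_OFF;
	if (worst >= PASSIVE_LIMIT)
		return SKETCH_ERROR_PASSIVE;
	if (worst >= WARNING_LIMIT)
		return SKETCH_ERROR_WARNING;
	return SKETCH_ERROR_ACTIVE;
}
```

For the counters reported in this thread, this mapping gives the following. For tx 128 / rx 128 it gives error passive, where the driver showed ERROR-WARNING. For tx 120 / rx 0 it gives error warning. For tx 29 / rx 0 it gives error active.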

>
>> Connecting again:
>> ip -d -s link show can0
>> 6: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
>> qlen 10
>>      link/can
>>      can state ERROR-WARNING (berr-counter tx 120 rx 0) restart-ms 0
>>      bitrate 500000 sample-point 0.875
>>      tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>>      cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>>      clock 8000000
>>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>>      0          0          0          1          0          0
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      473        311      0       0       0       0
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      200        200      0       0       0       0
>>
>> After some time (around 125 seconds):
>> dmesg:
>> [ 2387.172008] cc770_isa cc770_isa.0: can0: status interrupt (0x18)
>> ip -d -s link show can0
>> 6: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
>> qlen 10
>>      link/can
>>      can state ERROR-ACTIVE (berr-counter tx 29 rx 0) restart-ms 0
>>      bitrate 500000 sample-point 0.875
>>      tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>>      cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>>      clock 8000000
>>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>>      0          0          0          1          0          0
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      616        447      0       0       0       0
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      291        291      0       0       0       0
>
> OK, the state is back to error active (counter < 96).
>
> Thanks for testing...

You are welcome; however, I have to thank You for the work You have done.

So, I'll run another series of tests as soon as I can, and maybe You can
let me know if You have patches I should include as well.

>
> Wolfgang.
>
>
>

Thanks

Wolfgang
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wolfgang Grandegger Dec. 12, 2011, 11:55 a.m. UTC | #2
On 12/12/2011 12:18 PM, Wolfgang Zarre wrote:
> Hello Wolfgang,
>> Hi Wolfgang,
>>
>> On 12/11/2011 07:33 PM, Wolfgang Zarre wrote:
>>> Hello Wolfgang,
>>>> On 12/07/2011 02:42 PM, Wolfgang Grandegger wrote:
>>>>> Hi Wolfgang,
>>>>>
>>>>> On 12/06/2011 10:08 PM, Wolfgang Zarre wrote:
>> ...
>>>>>> Let me know if You need more or some other tests.
>>>>>
>>>>> You could provoke some state changes or bus-off conditions to see
>>>>> if the
>>>>> berr-counter shows reasonable results. I'm currently consolidating and
>>>>> unifying error state and bus-off handling. Would be nice if you
>>>>> could do
>>>>> some further tests when I have the patches ready...
>>>>
>>>> I just pushed the mentioned modifications to the "devel" branch of my
>>>> "wg-linux-can-next" [1] repository. You can get it as shown below:
>>>>
>>>>     $ git clone --reference=<some-recent-net-next-tree>   \
>>>>         git://gitorious.org/~wgrandegger/linux-can/wg-linux-can-next.git
>>>>     $ git checkout -b devel devel
>>>>
>>>> [1] https://gitorious.org/~wgrandegger/linux-can/wg-linux-can-next
>>>>
>>>> Wolfgang.
>>>
>>> OK, I was trying so far and You will find below the results.
>>> Just FYI the states on the PLC side couldn't be verified because the
>>> function
>>> provided by the manufacturer is not working at all and CAN analyser
>>> was not
>>> available.
>>>
>>> We are running CANopen and therefore the PLC will send automatically a
>>> heartbeat.
>>>
>>> I produced the bus-off state through a short circuit between L/H
>>> which was
>>> working as expected.
>>>
>>> A bit odd was that on the second try I had to reload the module
>>> because a ip down/up was not enough.
>>
>> Oops, not good.
>>
> 
> But might be in connection with the strange behaviour of the PLC.

It's a bug! netif_start_queue is missing at the end of the open
function. It got lost somehow. I have just updated (rebased!) my
wg-linux-can-next repository.
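This explanation fits the earlier symptom that an ip down/up was not enough on the second try. The sketch makes two assumptions: the driver's stop path calls netif_stop_queue(), as CAN drivers commonly do, and a freshly allocated device starts with its transmit queue running. Under those assumptions, the first open works. After a down/up cycle without netif_start_queue() in the open path, however, the queue stays stopped and the stack never hands the driver another frame. The userspace model below uses hypothetical toy_* names in place of ndo_open/ndo_stop and the queue helpers. It only illustrates the sequence and is not driver code.

```c
#include <stdbool.h>

/* Toy stand-in for a net_device's transmit-queue state. */
struct toy_netdev {
	bool queue_stopped; /* models the driver-controlled XOFF state */
	bool up;
};

static void toy_stop_queue(struct toy_netdev *d)  { d->queue_stopped = true; }
static void toy_start_queue(struct toy_netdev *d) { d->queue_stopped = false; }

/* Models ndo_open; 'fixed' selects whether the queue is restarted at the end. */
static void toy_open(struct toy_netdev *d, bool fixed)
{
	/* ... chip setup would happen here ... */
	d->up = true;
	if (fixed)
		toy_start_queue(d); /* the call that was missing */
}

/* Models ndo_stop: the queue is stopped when the interface goes down. */
static void toy_close(struct toy_netdev *d)
{
	toy_stop_queue(d);
	d->up = false;
}

/* The stack only hands frames to the driver while the queue runs. */
static bool toy_can_transmit(const struct toy_netdev *d)
{
	return d->up && !d->queue_stopped;
}
```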

Wolfgang.
Wolfgang Zarre Dec. 21, 2011, 6:32 p.m. UTC | #3
Hello Wolfgang,

> On 12/12/2011 12:18 PM, Wolfgang Zarre wrote:
>> Hello Wolfgang,
>>> Hi Wolfgang,
>>>
>>> On 12/11/2011 07:33 PM, Wolfgang Zarre wrote:
>>>> Hello Wolfgang,
>>>>> On 12/07/2011 02:42 PM, Wolfgang Grandegger wrote:
>>>>>> Hi Wolfgang,
>>>>>>
>>>>>> On 12/06/2011 10:08 PM, Wolfgang Zarre wrote:
>>> ...
>>>>>>> Let me know if You need more or some other tests.
>>>>>>
>>>>>> You could provoke some state changes or bus-off conditions to see
>>>>>> if the
>>>>>> berr-counter shows reasonable results. I'm currently consolidating and
>>>>>> unifying error state and bus-off handling. Would be nice if you
>>>>>> could do
>>>>>> some further tests when I have the patches ready...
>>>>>
>>>>> I just pushed the mentioned modifications to the "devel" branch of my
>>>>> "wg-linux-can-next" [1] repository. You can get it as shown below:
>>>>>
>>>>>      $ git clone --reference=<some-recent-net-next-tree>    \
>>>>>          git://gitorious.org/~wgrandegger/linux-can/wg-linux-can-next.git
>>>>>      $ git checkout -b devel devel
>>>>>
>>>>> [1] https://gitorious.org/~wgrandegger/linux-can/wg-linux-can-next
>>>>>
>>>>> Wolfgang.
>>>>
>>>> OK, I was trying so far and You will find below the results.
>>>> Just FYI the states on the PLC side couldn't be verified because the
>>>> function
>>>> provided by the manufacturer is not working at all and CAN analyser
>>>> was not
>>>> available.
>>>>
>>>> We are running CANopen and therefore the PLC will send automatically a
>>>> heartbeat.
>>>>
>>>> I produced the bus-off state through a short circuit between L/H
>>>> which was
>>>> working as expected.
>>>>
>>>> A bit odd was that on the second try I had to reload the module
>>>> because a ip down/up was not enough.
>>>
>>> Oops, not good.
>>>
>>
>> But might be in connection with the strange behaviour of the PLC.
>
> It's a bug! netif_start_queue is missing at the end of the open
> function. Got lost some how. I have just updated (rebased!) my
> wg-linux-can-next repository.

OK, I checked it out last week and since then I have been running one test
series after the other.

I found several odd issues and I'm trying to track them down alongside some
other work.

Even with a presumably correct configuration, the same one I was using with
the lincan driver, I'm losing telegrams, around 1 to 2 in 500000. This might
be due to a different sample-point on the PLC, which is opaque because of the
predefined setting. For the next test I'll set the BTRs directly.
Furthermore, I sometimes find one counted as dropped, but mostly not.

But what is more odd is that after an indeterminate time the transmission
gets stuck, followed by a buffer overrun, while reception still works.
There are no error messages and no changes in "ip -d -s link show can0".

Additionally, it seems that neither the automatic restart nor the manual one
works.

"ip link set can0 up type can restart" gives me 'RTNETLINK answers: Invalid
argument', and "ip link set can0 up type can bitrate 500000 restart" gives
'RTNETLINK answers: Device or resource busy', although nothing is connected
to can0.

So I have to run, for example, "ip link set can0 down; ip link set can0 up
type can bitrate 500000 restart-ms 2000 sample-point 0.75", but this empties
the buffer, so those telegrams are lost as well.

I compared with my lincan driver, which has been running fine so far, also
to confirm that the PLC works properly.

First I assumed that the set_reset_mode procedure might be responsible for
that misbehaviour, because according to the cc770 manual we should wait for
bit 7 (RstST) of the CPU interface register to become zero. However, when the
transmission got stuck, there was no call to set_reset_mode.
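The RstST wait described here is usually written as a bounded polling loop, so that a chip which never leaves reset cannot hang the driver. This sketch assumes, as stated above, that RstST is bit 7 (0x80) of the CPU interface register. The register read is passed in as a function pointer so the loop can be exercised with a mock. All names and the poll limit are hypothetical.

```c
#include <stdint.h>

#define CPUIF_RSTST 0x80 /* bit 7: reset status, as described above */

/* Hypothetical register accessor: returns the CPU interface register. */
typedef uint8_t (*read_cpuif_fn)(void *ctx);

/*
 * Polls until RstST reads as zero.
 * Returns the number of reads needed, or -1 if it never cleared.
 */
static int wait_rstst_clear(read_cpuif_fn read_cpuif, void *ctx, int max_polls)
{
	for (int i = 1; i <= max_polls; i++) {
		if (!(read_cpuif(ctx) & CPUIF_RSTST))
			return i;
		/* a real driver would udelay() or cpu_relax() between reads */
	}
	return -1;
}

/* Mock register: RstST stays set for the first 'busy_reads' reads. */
struct mock_cpuif {
	int busy_reads;
};

static uint8_t mock_read(void *ctx)
{
	struct mock_cpuif *m = ctx;

	if (m->busy_reads > 0) {
		m->busy_reads--;
		return CPUIF_RSTST;
	}
	return 0x00;
}
```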

Maybe it somehow ends up recessive.

Anyway, I might compare the registers of both drivers to figure out what's
going on, but maybe You have an idea as well.

The problem is just that it always runs for quite some time until the issues
appear; otherwise it would be easier.



>
> Wolfgang.


Wolfgang

Wolfgang Grandegger Dec. 22, 2011, 9:37 a.m. UTC | #4
Hi Wolfgang,

On 12/21/2011 07:32 PM, Wolfgang Zarre wrote:
> Hello Wolfgang,
...

>> It's a bug! netif_start_queue is missing at the end of the open
>> function. Got lost some how. I have just updated (rebased!) my
>> wg-linux-can-next repository.
> 
> Ok, I was checking out last week and since I'm running one test series
> after the other.
> 
> There are several odd issues I could found and I'm trying to trace them
> down beside some other work.
> 
> Even with an assumed correct configuration like I was using with the lincan
> driver I'm loosing telegrams so around 1 till 2 in 500000 but might be a
> different sample-point at the PLC which is opaque due the predefined
> setting.

In principle, messages can be lost because the cc770 buffers only up
to two messages in hardware. If they are not read out quickly enough,
message loss will happen. The CAN statistics should list such overruns,
though.
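Such limited buffering can be modelled as a two-slot FIFO. If a third frame arrives before the CPU has read out the first, one frame is lost. A driver that detects this should count it as an overrun; SocketCAN drivers typically increment rx_over_errors, shown in the "overrun" column of "ip -s link". The model below drops the newest frame on overflow. Which frame the real hardware loses, and all names used, are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define HW_SLOTS 2 /* assumed: the chip holds at most two received frames */

struct toy_rx_fifo {
	uint32_t id[HW_SLOTS];
	int head;                  /* index of the oldest waiting frame */
	int count;                 /* frames waiting for the CPU */
	unsigned long over_errors; /* frames lost to overflow */
};

/* Called when a frame arrives on the bus. Returns false if it was lost. */
static bool toy_rx_arrive(struct toy_rx_fifo *f, uint32_t id)
{
	if (f->count == HW_SLOTS) {
		f->over_errors++; /* model: the newest frame is dropped */
		return false;
	}
	f->id[(f->head + f->count) % HW_SLOTS] = id;
	f->count++;
	return true;
}

/* Called when the CPU (interrupt handler) reads one frame out. */
static bool toy_rx_read(struct toy_rx_fifo *f, uint32_t *id)
{
	if (f->count == 0)
		return false;
	*id = f->id[f->head];
	f->head = (f->head + 1) % HW_SLOTS;
	f->count--;
	return true;
}
```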

> For the next test I'll set the BTR's directly.

OK, if you do not see bus errors, everything should be fine.

> Further sometimes I can find one in dropped but mostly not.
> 
> But more odd is that after an undefined time the transmission gets
> stuck followed by a buffer overrun but can receive.

I recently found a bug. Please try this fix:

http://marc.info/?l=linux-can&m=132370253713701&w=4

Did you notice any related error messages in the dmesg output?

> No error messages nor changes in ip -d -s link show can0.
> 
> Additional it seems that neither the automatic restart nor
> the manual one works.

What version are you using? I think this problem has been fixed by
adding the missing netif_start_queue() at the end of the open
function, as mentioned above. Do you have that in your driver?

> ip link set can0 up type can restart gives me 'RTNETLINK answers: Invalid
> argument' and ip link set can0 up type can bitrate 500000 restart a
> RTNETLINK answers: Device or resource busy but nothing connected to can0.

The error message is shown because you try to set bitrate when the
device is up. For the restart after bus-off just type:

  # ip link set can0 type can restart

Anyway, if you run into a bus-off, then it's likely that you have
electrical problems on the CAN bus (e.g. termination) or mismatching
bit-timing parameters.
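The two RTNETLINK errors quoted above appear to come from two separate checks in the CAN netlink code.

- Changing the bit-timing of a running interface is refused with EBUSY ("Device or resource busy"), as explained above.
- A manual restart appears to be accepted only when the interface is running, automatic restart is disabled (restart-ms 0), and the device is in bus-off. With restart-ms 2000 configured, as in the later test, EINVAL ("Invalid argument") would be expected.

The model below encodes those assumed rules with hypothetical names. Confirm the exact checks in can_changelink() and can_restart_now() in drivers/net/can/dev.c of the kernel in use.

```c
#include <errno.h>
#include <stdbool.h>

enum toy_state { TOY_ERROR_ACTIVE, TOY_ERROR_WARNING, TOY_ERROR_PASSIVE, TOY_BUS_OFF };

struct toy_can_dev {
	bool if_up;
	unsigned int restart_ms; /* 0 = automatic restart disabled */
	enum toy_state state;
};

/* Models "ip link set canX type can restart" (assumed rules). */
static int toy_restart_now(const struct toy_can_dev *d)
{
	if (!d->if_up)
		return -EINVAL; /* restart only on a running interface */
	if (d->restart_ms)
		return -EINVAL; /* automatic restart is configured */
	if (d->state != TOY_BUS_OFF)
		return -EBUSY;  /* nothing to recover from */
	return 0;
}

/* Models "ip link set canX type can bitrate N" (assumed rule). */
static int toy_set_bitrate(const struct toy_can_dev *d)
{
	return d->if_up ? -EBUSY : 0;
}
```

With the interface up, restart-ms 2000 and the controller error active, the model reproduces both reported errors.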

> So I have to perform per example  ip link set can0 down;ip link set can0 up
> type can bitrate 500000 restart-ms 2000 sample-point 0.75
> but this is emptying the buffer and these telegrams are lost then as well.
> 
> I was comparing with my lincan driver which was running so far ok also
> to confirm a proper working PLC.
> 
> First I assumed that maybe the set_reset_mode procedure is responsible for
> that misbehaviour because according to the cc770 manual we should wait for
> a zero of bit 7 RstST of the CPU interface register but when the
> transmission
> gets stuck there was no call for set_reset_mode.
> 
> Maybe it's ending up somehow recessive.
> 
> Anyway, I might compare the registers of both drivers just to figure out
> what's going on but maybe You have an idea as well.
> 
> Problem is just it runs always quite some time until the issues happen
> otherwise it would be more easy.

Again, please check if you have netif_start_queue() at the end of the
open function.

Wolfgang.
Wolfgang Zarre Dec. 22, 2011, 1:20 p.m. UTC | #5
Hello Wolfgang,
> Hi Wolfgang,
>
> On 12/21/2011 07:32 PM, Wolfgang Zarre wrote:
>> Hello Wolfgang,
> ...
>
>>> It's a bug! netif_start_queue is missing at the end of the open
>>> function. Got lost some how. I have just updated (rebased!) my
>>> wg-linux-can-next repository.
>>
>> Ok, I was checking out last week and since I'm running one test series
>> after the other.
>>
>> There are several odd issues I could found and I'm trying to trace them
>> down beside some other work.
>>
>> Even with an assumed correct configuration like I was using with the lincan
>> driver I'm loosing telegrams so around 1 till 2 in 500000 but might be a
>> different sample-point at the PLC which is opaque due the predefined
>> setting.
>
> In principle, messages can be lost because the cc770 does buffer only up
> to two messages in hardware. If they are not read out quickly enough,
> message loss will happen. The CAN statistics should list such overruns,
> though.
>
Actually I lose them on transmission, not reception. But, as mentioned, we
once traced with a second PC, and there the telegrams were not lost, which
means they really do go over the bus physically.
So maybe it is just a timing issue, but that is secondary for now.

In addition, the telegrams are sent every 5 ms, in parallel with the heartbeat.

>> For the next test I'll set the BTR's directly.
>
> OK, if you do not see bus errors, everything should be fine.
>

The test with the BTRs set directly did not work out, because the software
for programming the PLC doesn't allow it. I'm loving it.

>> Further sometimes I can find one in dropped but mostly not.
>>
>> But more odd is that after an undefined time the transmission gets
>> stuck followed by a buffer overrun but can receive.
>
> I recently found a bug. Please try this fix:
>
> http://marc.info/?l=linux-can&m=132370253713701&w=4

The fix is already included in what I checked out.

>
> Did you realize related error messages in the dmesg output?

Nothing at all, as mentioned.

>
>> No error messages nor changes in ip -d -s link show can0.
>>
>> Additional it seems that neither the automatic restart nor
>> the manual one works.
>
> What version are you using. I think this problem has been fixed by
> adding the missing netif_start_queue() at the end of the open
> function, as mentioned above. Do you have that in your driver?
>

Yes, that is already included as well; I'm using commit
eec921ac28fde243456078a557768808d93d94a3


>> ip link set can0 up type can restart gives me 'RTNETLINK answers: Invalid
>> argument' and ip link set can0 up type can bitrate 500000 restart a
>> RTNETLINK answers: Device or resource busy but nothing connected to can0.
>
> The error message is shown because you try to set bitrate when the
> device is up. For the restart after bus-off just type:
>
>    # ip link set can0 type can restart

Actually, I tried it when it got stuck, but it is a hint anyway that
the device is still up.

>
> Anyway, if you run into a bus-off, then it's likely that you have
> electrical problems on the CAN bus, e.g. termination, mismatching
> bit-timing parameters.

As I said, I have no indication of any kind of problem:
5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
     link/can
     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 2000
     bitrate 500000 sample-point 0.750
     tq 125 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
     cc770: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
     clock 16000000
     re-started bus-errors arbit-lost error-warn error-pass bus-off
     0          0          0          0          0          0
     RX: bytes  packets  errors  dropped overrun mcast
     76506      74616    0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     2450703    616355   0       0       0       0
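The bit-timing lines in these ip outputs can be cross-checked by hand. One bit is a 1 tq synchronisation segment plus prop-seg, phase-seg1 and phase-seg2, and the sample point lies at the end of phase-seg1.

- The first configuration (clock 8000000, tq 125, 6 + 7 + 2) gives 16 tq = 2 µs per bit, i.e. 500 kbit/s, sampled at 14/16 = 0.875.
- The one above (clock 16000000, tq 125, 5 + 6 + 4) also gives 16 tq per bit, sampled at 12/16 = 0.750.

The sketch below does this arithmetic and packs the values into BTR0/BTR1, assuming an SJA1000-style layout: BTR0 = (SJW-1)<<6 | (BRP-1) and BTR1 = SAM<<7 | (TSEG2-1)<<4 | (TSEG1-1), with TSEG1 = prop-seg + phase-seg1. Under that assumption it reproduces the "BTR0=0x00 BTR1=0x1c" logged for the first configuration. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Bit-timing parameters as printed by "ip -d link show" (hypothetical struct). */
struct toy_bittiming {
	uint32_t clock_hz;   /* "clock" */
	uint32_t brp;        /* prescaler: tq = brp / clock_hz */
	uint32_t prop_seg;
	uint32_t phase_seg1;
	uint32_t phase_seg2;
	uint32_t sjw;
	int triple_sampling; /* SAM bit; 0 in both configurations above */
};

/* Time quanta per bit: sync segment (1 tq) + the three programmable segments. */
static uint32_t tq_per_bit(const struct toy_bittiming *bt)
{
	return 1 + bt->prop_seg + bt->phase_seg1 + bt->phase_seg2;
}

static uint32_t bitrate_bps(const struct toy_bittiming *bt)
{
	return bt->clock_hz / (bt->brp * tq_per_bit(bt));
}

/* Sample point in per mille; it lies at the end of phase_seg1. */
static uint32_t sample_point_permille(const struct toy_bittiming *bt)
{
	return 1000 * (1 + bt->prop_seg + bt->phase_seg1) / tq_per_bit(bt);
}

/* Assumed SJA1000-style register packing; TSEG1 = prop_seg + phase_seg1. */
static void pack_btr(const struct toy_bittiming *bt, uint8_t *btr0, uint8_t *btr1)
{
	uint32_t tseg1 = bt->prop_seg + bt->phase_seg1;

	*btr0 = (uint8_t)(((bt->sjw - 1) << 6) | (bt->brp - 1));
	*btr1 = (uint8_t)((bt->triple_sampling ? 0x80 : 0x00) |
			  ((bt->phase_seg2 - 1) << 4) | (tseg1 - 1));
}
```

Under the same assumption, the 16 MHz configuration would give BTR0=0x01 BTR1=0x3a.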

>
>> So I have to perform per example  ip link set can0 down;ip link set can0 up
>> type can bitrate 500000 restart-ms 2000 sample-point 0.75
>> but this is emptying the buffer and these telegrams are lost then as well.
>>
>> I was comparing with my lincan driver which was running so far ok also
>> to confirm a proper working PLC.
>>
>> First I assumed that maybe the set_reset_mode procedure is responsible for
>> that misbehaviour because according to the cc770 manual we should wait for
>> a zero of bit 7 RstST of the CPU interface register but when the
>> transmission
>> gets stuck there was no call for set_reset_mode.
>>
>> Maybe it's ending up somehow recessive.
>>
>> Anyway, I might compare the registers of both drivers just to figure out
>> what's going on but maybe You have an idea as well.
>>
>> Problem is just it runs always quite some time until the issues happen
>> otherwise it would be more easy.
>
> Again, please check if you have netif_start_queue() at the end of the
> open function.
>

As I said, I'm using eec921ac28fde243456078a557768808d93d94a3.

However, I'll continue to investigate that issue: since it runs with my
lincan driver without problems, it should be possible to find the cause.

> Wolfgang.

Wolfgang

Patch

From e7b36500c9491ab026bd3c16dfca2ca4338524ac Mon Sep 17 00:00:00 2001
From: Wolfgang Grandegger <wg@grandegger.com>
Date: Mon, 12 Dec 2011 10:09:22 +0100
Subject: [PATCH] candump: add support for error states going backward

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
---
 lib.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/lib.c b/lib.c
index a8ed2fe..7f810b9 100644
--- a/lib.c
+++ b/lib.c
@@ -318,6 +318,7 @@  static const char *error_classes[] = {
 	"bus-off",
 	"bus-error",
 	"restarted-after-bus-off",
+	"state-change",
 };
 
 static const char *controller_problems[] = {
@@ -327,6 +328,7 @@  static const char *controller_problems[] = {
 	"tx-error-warning",
 	"rx-error-passive",
 	"tx-error-passive",
+	"back-to-error-active",
 };
 
 static const char *protocol_violation_types[] = {
@@ -471,6 +473,8 @@  void snprintf_can_error_frame(char *buf, size_t len, struct can_frame *cf,
 			if (mask == CAN_ERR_LOSTARB)
 				n += snprintf_error_lostarb(buf + n, len - n,
 							   cf);
+			if (mask == CAN_ERR_STATE_CHANGE)
+				n += snprintf_error_ctrl(buf + n, len - n, cf);
 			if (mask == CAN_ERR_CRTL)
 				n += snprintf_error_ctrl(buf + n, len - n, cf);
 			if (mask == CAN_ERR_PROT)
-- 
1.7.4.1
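The patch relies on candump decoding error frames by bit position. Bit i of the error class in can_id selects error_classes[i], and bit i of data[1] selects controller_problems[i]. Appending an entry therefore assigns it the next free bit, so "state-change" would correspond to 0x200 and "back-to-error-active" to 0x40. The sketch below mirrors that table-driven decoding with the patched tables. It makes several assumptions:

- Table entries not visible in the diff follow the kernel's CAN_ERR_* bit order.
- The two new bit values are inferred from table position rather than taken from a header.
- The function names and output format are the sketch's own, not lib.c's.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Names indexed by bit number; the last entry of each table is new in the patch. */
static const char *const error_classes[] = {
	"tx-timeout", "lost-arbitration", "controller-problem",
	"protocol-violation", "transceiver-status", "no-acknowledgement-on-tx",
	"bus-off", "bus-error", "restarted-after-bus-off",
	"state-change",         /* bit 9, presumably 0x200 */
};

static const char *const controller_problems[] = {
	"rx-overflow", "tx-overflow", "rx-error-warning", "tx-error-warning",
	"rx-error-passive", "tx-error-passive",
	"back-to-error-active", /* bit 6, presumably 0x40 */
};

#define N_ELEMS(a) (sizeof(a) / sizeof((a)[0]))
#define BIT_CONTROLLER_PROBLEM 2U /* index of "controller-problem" */
#define BIT_STATE_CHANGE       9U /* index of "state-change" */

/* Appends s at offset n without writing past len (len > 0); returns new n. */
static size_t append(char *buf, size_t len, size_t n, const char *s)
{
	if (n + 1 < len) {
		int r = snprintf(buf + n, len - n, "%s", s);

		if (r > 0)
			n += (size_t)r;
	}
	return n < len ? n : len - 1; /* clamp after truncation */
}

/* Appends the names of all set bits in 'bits', comma separated. */
static size_t append_bits(char *buf, size_t len, size_t n, uint32_t bits,
			  const char *const *names, size_t count)
{
	int first = 1;

	for (size_t i = 0; i < count; i++) {
		if (!(bits & (1U << i)))
			continue;
		if (!first)
			n = append(buf, len, n, ",");
		n = append(buf, len, n, names[i]);
		first = 0;
	}
	return n;
}

/*
 * Hypothetical decoder: error classes from can_id, followed by the
 * controller problems of data[1] in braces after "controller-problem"
 * and, as the patch does, after "state-change".
 */
static void decode_error_frame(uint32_t err_class, uint8_t data1,
			       char *buf, size_t len)
{
	size_t n = 0;
	int first = 1;

	buf[0] = '\0';
	for (size_t i = 0; i < N_ELEMS(error_classes); i++) {
		if (!(err_class & (1U << i)))
			continue;
		if (!first)
			n = append(buf, len, n, ",");
		n = append(buf, len, n, error_classes[i]);
		first = 0;
		if (i == BIT_CONTROLLER_PROBLEM || i == BIT_STATE_CHANGE) {
			n = append(buf, len, n, "{");
			n = append_bits(buf, len, n, data1, controller_problems,
					N_ELEMS(controller_problems));
			n = append(buf, len, n, "}");
		}
	}
}
```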