Message ID | 20190617045612.3509-1-tuong.t.lien@dektech.com.au |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
Series | [net] tipc: fix issues with early FAILOVER_MSG from peer | expand |
From: Tuong Lien <tuong.t.lien@dektech.com.au> Date: Mon, 17 Jun 2019 11:56:12 +0700 > It appears that a FAILOVER_MSG can come from peer even when the failure > link is resetting (i.e. just after the 'node_write_unlock()'...). This > means the failover procedure on the node has not been started yet. > The situation is as follows: ... > Once this happens, the link failover procedure will be triggered > wrongly on the receiving node since the node isn't in FAILINGOVER state > but then another link failover will be carried out. > The consequences are: > > 1) A peer might get stuck in FAILINGOVER state because the 'sync_point' > was set, reset and set incorrectly, the criteria to end the failover > would not be met, it could keep waiting for a message that has already > received. > > 2) The early FAILOVER_MSG(s) could be queued in the link failover > deferdq but would be purged or not pulled out because the 'drop_point' > was not set correctly. > > 3) The early FAILOVER_MSG(s) could be dropped too. > > 4) The dummy FAILOVER_MSG could make the peer leaving FAILINGOVER state > shortly, but later on it would be restarted. > > The same situation can also happen when the link is in PEER_RESET state > and a FAILOVER_MSG arrives. > > The commit resolves the issues by forcing the link down immediately, so > the failover procedure will be started normally (which is the same as > when receiving a FAILOVER_MSG and the link is in up state). > > Also, the function "tipc_node_link_failover()" is toughen to avoid such > a situation from happening. > > Acked-by: Jon Maloy <jon.maloy@ericsson.se> > Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Applied, thank you.
diff --git a/net/tipc/link.c b/net/tipc/link.c index d5ed509e0660..bcfb0a4ab485 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -1762,7 +1762,6 @@ void tipc_link_failover_prepare(struct tipc_link *l, struct tipc_link *tnl, * node has entered SELF_DOWN_PEER_LEAVING and both peer nodes * would have to start over from scratch instead. */ - WARN_ON(l && tipc_link_is_up(l)); tnl->drop_point = 1; tnl->failover_reasm_skb = NULL; diff --git a/net/tipc/node.c b/net/tipc/node.c index e4dba865105e..65644642c091 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -777,9 +777,9 @@ static void tipc_node_link_up(struct tipc_node *n, int bearer_id, * disturbance, wrong session, etc.) * 3. Link <1B-2B> up * 4. Link endpoint 2A down (e.g. due to link tolerance timeout) - * 5. Node B starts failover onto link <1B-2B> + * 5. Node 2 starts failover onto link <1B-2B> * - * ==> Node A does never start link/node failover! + * ==> Node 1 does never start link/node failover! * * @n: tipc node structure * @l: link peer endpoint failingover (- can be NULL) @@ -794,6 +794,10 @@ static void tipc_node_link_failover(struct tipc_node *n, struct tipc_link *l, if (!tipc_link_is_up(tnl)) return; + /* Don't rush, failure link may be in the process of resetting */ + if (l && !tipc_link_is_reset(l)) + return; + tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT); tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT); @@ -1719,7 +1723,7 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb, /* Initiate or update failover mode if applicable */ if ((usr == TUNNEL_PROTOCOL) && (mtyp == FAILOVER_MSG)) { syncpt = oseqno + exp_pkts - 1; - if (pl && tipc_link_is_up(pl)) { + if (pl && !tipc_link_is_reset(pl)) { __tipc_node_link_down(n, &pb_id, xmitq, &maddr); trace_tipc_node_link_down(n, true, "node link down <- failover!");