diff mbox

[ovs-dev] ovn-northd: Recalculate db mappings if the txn returns TXN_ERROR

Message ID 566D7DD4.5070708@redhat.com
State Changes Requested
Headers show

Commit Message

Numan Siddique Dec. 13, 2015, 2:16 p.m. UTC
Fixes the issue
Reported-at: http://openvswitch.org/pipermail/discuss/2015-November/019443.html

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
---
 lib/ovsdb-idl.c         |  1 +
 lib/ovsdb-idl.h         |  1 +
 ovn/northd/ovn-northd.c | 21 ++++++++++++++++++++-
 3 files changed, 22 insertions(+), 1 deletion(-)

Comments

Ben Pfaff Feb. 3, 2016, 11:35 p.m. UTC | #1
On Sun, Dec 13, 2015 at 07:46:52PM +0530, Numan Siddique wrote:
> Fixes the issue
> Reported-at: http://openvswitch.org/pipermail/discuss/2015-November/019443.html
> 
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>

This patch has been sitting too long without review.  This is my fault.
I apologize.

I'm concerned about this patch for a couple of reasons.  First, the goal
of the ovsdb_idl_loop abstraction is to make it easy for clients to get
all of this stuff right.  This patch adds more code outside the
ovsdb_idl_loop implementation in its client, somewhat reducing the value
of the abstraction.  It would be better if we could instead integrate
the logic into ovsdb_idl_loop itself.

My second concern is larger.  TXN_ERROR is supposed to indicate a hard
error, one for which retrying immediately is not going to help.  What
kind of error did testing actually encounter?  Perhaps the real bug is
that an error is being misclassified as TXN_ERROR.
Numan Siddique Feb. 4, 2016, 7:07 a.m. UTC | #2
On 02/04/2016 05:05 AM, Ben Pfaff wrote:
> I'm concerned about this patch for a couple of reasons.  First, the goal
> of the ovsdb_idl_loop abstraction is to make it easy for clients to get
> all of this stuff right.  This patch adds more code outside the
> ovsdb_idl_loop implementation in its client, somewhat reducing the value
> of the abstraction.  It would be better if we could instead integrate
> the logic into ovsdb_idl_loop itself.

Thanks for the review and comments Ben. I agree.
> My second concern is larger.  TXN_ERROR is supposed to indicate a hard
> error, one for which retrying immediately is not going to help.  What
> kind of error did testing actually encounter?  Perhaps the real bug is
> that an error is being misclassified as TXN_ERROR.

I will investigate it further to see why TXN_ERROR is returned.

Thanks
Numan
Numan Siddique Feb. 4, 2016, 1:43 p.m. UTC | #3
On 02/04/2016 12:37 PM, Numan Siddique wrote:
> Thanks for the review and comments Ben. I agree.
>> > My second concern is larger.  TXN_ERROR is supposed to indicate a hard
>> > error, one for which retrying immediately is not going to help.  What
>> > kind of error did testing actually encounter?  Perhaps the real bug is
>> > that an error is being misclassified as TXN_ERROR.
> I will investigate it further to see why TXN_ERROR is returned.

I did some testing and I am not able to see the TXN_ERROR anymore, but still I am able to reproduce
the issue mentioned here - http://openvswitch.org/pipermail/discuss/2015-November/019443.html

Just to brief about the issue
 - Suppose Southbound db table Logical_Flows has 8 flows in it
 - And I modify the ovn_northd.c to add a flow (lets say in build_lswitch_flows) and
   restart ovn-northd, I don't see the Logical_Flows table updated to 9 flows.
 - Later if i update Northbound db (by running ovn-nbctl), the Logical_Flows gets synced up.
 - I am able to reproduce this issue most of the times.

The reason for the issue is the below code

**********************
        if (ovnnb_seqno != ovsdb_idl_get_seqno(ctx.ovnnb_idl)) {
            ovnnb_seqno = ovsdb_idl_get_seqno(ctx.ovnnb_idl);
            ovnnb_db_run(&ctx);
        }
        if (ovnsb_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
            ovnsb_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
            ovnsb_db_run(&ctx);
        }
********************

When ovnnb_db_run is called, and if ctx->ovnsb_txn is NULL, ovnnb_db_run returns immediately without
generating the logical flows.

ovnnb_db_run is not called again until the Northbound db seqno changes.

I think its better to revert this commit - f20396e051ea4fd623bfc022cc78d9bd52a850e5


Thanks
Numan
diff mbox

Patch

diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
index 0d02ae8..2e536c9 100644
--- a/lib/ovsdb-idl.c
+++ b/lib/ovsdb-idl.c
@@ -3209,6 +3209,7 @@  ovsdb_idl_loop_commit_and_wait(struct ovsdb_idl_loop *loop)
             ovsdb_idl_txn_destroy(txn);
             loop->committing_txn = NULL;
         }
+        loop->last_commit_status = status;
     }
 
     ovsdb_idl_wait(loop->idl);
diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h
index 4c66ae0..5f7f2fc 100644
--- a/lib/ovsdb-idl.h
+++ b/lib/ovsdb-idl.h
@@ -266,6 +266,7 @@  struct ovsdb_idl_loop {
     unsigned int precommit_seqno;
 
     struct ovsdb_idl_txn *open_txn;
+    enum ovsdb_idl_txn_status last_commit_status;
 };
 
 #define OVSDB_IDL_LOOP_INITIALIZER(IDL) { .idl = (IDL) }
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 270b116..be4d6a8 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -1912,6 +1912,8 @@  main(int argc, char *argv[])
 
     /* Main loop. */
     exiting = false;
+    bool ovnnb_changes_pending = false;
+    bool ovnsb_changes_pending = false;
     while (!exiting) {
         struct northd_context ctx = {
             .ovnnb_idl = ovnnb_idl_loop.idl,
@@ -1922,11 +1924,28 @@  main(int argc, char *argv[])
 
         if (ovnnb_seqno != ovsdb_idl_get_seqno(ctx.ovnnb_idl)) {
             ovnnb_seqno = ovsdb_idl_get_seqno(ctx.ovnnb_idl);
-            ovnnb_db_run(&ctx);
+            ovnnb_changes_pending = true;
         }
         if (ovnsb_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
             ovnsb_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
+            ovnsb_changes_pending = true;
+        }
+
+        /*
+         *  We need to recalculate the mappings to the OVN-sb db
+         *   - If the OVN-nb db has changed or
+         *   - If the previous update to the OVN-sb is not yet completed.
+         */
+        if (ovnnb_changes_pending ||
+            ovnsb_idl_loop.last_commit_status == TXN_ERROR) {
+            ovnnb_db_run(&ctx);
+            ovnnb_changes_pending = false;
+        }
+
+        if (ovnsb_changes_pending ||
+            ovnnb_idl_loop.last_commit_status == TXN_ERROR) {
             ovnsb_db_run(&ctx);
+            ovnsb_changes_pending = false;
         }
 
         unixctl_server_run(unixctl);