
[ovs-dev,v2,5/5] northd, controller: Network Function Health monitoring.

Message ID 20250313082650.19553-6-sragdha.chaudhu@nutanix.com
State Superseded
Series *** Network Function Insertion. ***

Checks

Context                                Check    Description
ovsrobot/apply-robot                   success  apply and check: success
ovsrobot/github-robot-_ovn-kubernetes  success  github build: passed
ovsrobot/github-robot-_Build_and_Test  fail     github build: failed

Commit Message

Sragdhara Datta Chaudhuri March 13, 2025, 8:26 a.m. UTC
The LB health monitoring functionality has been extended to support NFs.
Network_Function_Group has a list of Network_Functions, each of which has a
reference to a Network_Function_Health_Check that holds the monitoring config.
There is a corresponding SB Service_Monitor row maintaining the online/offline
status. When the status changes, northd picks one of the “online” NFs and sets
it in the network_function_active field of the NFG. The redirection rule in the
LS uses the ports from this NF.
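
The selection step described above can be sketched as follows. This is only
an illustration: struct nf_entry and pick_active_nf() are hypothetical names
that do not appear in the patch, and "first online NF wins" is an assumed
policy, since the description only says northd picks one of the online NFs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one Network_Function together with the
 * status reported by its SB Service_Monitor row. */
struct nf_entry {
    const char *name;
    const char *status;   /* "online", "offline", or NULL if unknown. */
};

/* Returns the index of the NF to record as network_function_active, or -1
 * if no NF is online.  Assumption: the first online NF is chosen. */
int
pick_active_nf(const struct nf_entry *nfs, size_t n_nfs)
{
    assert(nfs || !n_nfs);
    for (size_t i = 0; i < n_nfs; i++) {
        if (nfs[i].status && !strcmp(nfs[i].status, "online")) {
            return (int) i;
        }
    }
    return -1;
}
```

A caller would rerun this whenever a monitor status changes and update the
NFG only if the returned index differs from the currently active NF.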

ovn-controller performs the health monitoring by sending an ICMP echo request
with source IP and MAC from NB global options “svc_monitor_ip4” and
“svc_monitor_mac”, and destination IP and MAC from new NB global options
“svc_monitor_ip4_dst” and “svc_monitor_mac_dst”. The sequence number and id
are randomly generated and stored in the svc_monitor entry. The NF VM forwards
the same packet out of its other port. When it comes out, ovn-controller
matches the sequence number and id against the stored values and marks the NF
online if they match.
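
The probe/match cycle described above can be sketched in standalone C. The
names below (echo_hdr, probe_state, probe_build, probe_match, inet_csum) are
hypothetical and are not the pinctrl.c functions; the caller supplies the id
and sequence number, which the patch generates randomly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal ICMP echo header.  id and seq are treated as opaque 16-bit
 * values, so byte order does not matter for the comparison. */
struct echo_hdr {
    uint8_t type;       /* 8 = echo request. */
    uint8_t code;       /* 0 for echo. */
    uint16_t csum;
    uint16_t id;
    uint16_t seq;
};

/* State remembered between sending a probe and seeing it come back. */
struct probe_state {
    uint16_t id;
    uint16_t seq;
    bool online;
};

/* RFC 1071 Internet checksum over 'len' bytes ('len' must be even). */
uint16_t
inet_csum(const void *data, size_t len)
{
    assert(len % 2 == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i += 2) {
        uint16_t word;
        memcpy(&word, (const uint8_t *) data + i, sizeof word);
        sum += word;
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Fills in an echo request and records its id/seq for later matching. */
void
probe_build(struct probe_state *st, struct echo_hdr *h,
            uint16_t id, uint16_t seq)
{
    st->id = id;
    st->seq = seq;
    h->type = 8;
    h->code = 0;
    h->csum = 0;
    h->id = id;
    h->seq = seq;
    h->csum = inet_csum(h, sizeof *h);
}

/* Marks the monitor online only if the packet that came back out of the NF
 * is the echo request that was sent. */
bool
probe_match(struct probe_state *st, const struct echo_hdr *h)
{
    if (h->type != 8 || h->code != 0) {
        return false;
    }
    if (h->id != st->id || h->seq != st->seq) {
        return false;
    }
    st->online = true;
    return true;
}
```

Because the NF forwards the request unchanged instead of answering it, the
match is made on an echo request rather than on an echo reply.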

In the SB Service_Monitor table, three new columns have been added:
type: indicates “load-balancer” or “network-function”
mac: the destination MAC address for the monitor packets
logical_input_port: the LSP to which the probe packet is sent
                    (taken from inport of Network_Function)
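
How a consumer might interpret the new type column together with the
protocol column can be sketched as below. The enum and function names are
hypothetical. The defaults (a missing type means load balancer, a missing
protocol means TCP) and the allowed combinations mirror what the pinctrl.c
changes in this patch appear to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum mon_type { MON_TYPE_LB, MON_TYPE_NF };
enum mon_proto { MON_PROTO_TCP, MON_PROTO_UDP, MON_PROTO_ICMP };

/* Anything other than "network-function", including NULL, is treated as a
 * load-balancer monitor. */
enum mon_type
mon_type_from_string(const char *type)
{
    return type && !strcmp(type, "network-function")
           ? MON_TYPE_NF : MON_TYPE_LB;
}

/* Anything other than "udp" or "icmp", including NULL, is treated as TCP. */
enum mon_proto
mon_proto_from_string(const char *proto)
{
    if (proto && !strcmp(proto, "udp")) {
        return MON_PROTO_UDP;
    } else if (proto && !strcmp(proto, "icmp")) {
        return MON_PROTO_ICMP;
    }
    return MON_PROTO_TCP;
}

/* NF monitors are probed with ICMP only; LB monitors with TCP or UDP. */
bool
mon_combination_supported(enum mon_type type, enum mon_proto proto)
{
    switch (type) {
    case MON_TYPE_NF:
        return proto == MON_PROTO_ICMP;
    case MON_TYPE_LB:
        return proto == MON_PROTO_TCP || proto == MON_PROTO_UDP;
    }
    assert(!"unknown monitor type");
    return false;
}
```

Rows with an unsupported combination would simply be skipped by the
controller rather than reported as offline.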

Co-authored-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
Co-authored-by: Karthik Chandrashekar <karthik.c@nutanix.com>
Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
Signed-off-by: Karthik Chandrashekar <karthik.c@nutanix.com>
Signed-off-by: Sragdhara Datta Chaudhuri <sragdha.chaudhu@nutanix.com>
---
 controller/pinctrl.c      | 252 +++++++++++++++++++++++++++++-------
 northd/en-global-config.c |  75 +++++++++++
 northd/en-global-config.h |  12 +-
 northd/en-northd.c        |   4 +
 northd/en-sync-sb.c       |  16 ++-
 northd/northd.c           | 266 +++++++++++++++++++++++++++++++++-----
 northd/northd.h           |   6 +-
 ovn-sb.ovsschema          |  12 +-
 ovn-sb.xml                |  22 +++-
 9 files changed, 576 insertions(+), 89 deletions(-)

Comments

Sragdhara Datta Chaudhuri March 14, 2025, 8:22 p.m. UTC | #1
Recheck-request: github-robot-_Build_and_Test


From: Sragdhara Datta Chaudhuri <sragdha.chaudhu@nutanix.com>
Date: Thursday, March 13, 2025 at 1:27 AM
To: ovs-dev@openvswitch.org <ovs-dev@openvswitch.org>
Cc: Sragdhara Datta Chaudhuri <sragdha.chaudhu@nutanix.com>, Naveen Yerramneni <naveen.yerramneni@nutanix.com>, Karthik Chandrashekar <karthik.c@nutanix.com>
Subject: [PATCH OVN v2 5/5] northd, controller: Network Function Health monitoring.

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 47c4bf78b..1d7289b85 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -7480,8 +7480,17 @@ enum svc_monitor_status {
 enum svc_monitor_protocol {
     SVC_MON_PROTO_TCP,
     SVC_MON_PROTO_UDP,
+    SVC_MON_PROTO_ICMP,
 };

+enum svc_monitor_type {
+    /* load balancer */
+    SVC_MON_TYPE_LB,
+    /* network function */
+    SVC_MON_TYPE_NF,
+};
+
+
 /* Service monitor health checks. */
 struct svc_monitor {
     struct hmap_node hmap_node;
@@ -7494,6 +7503,7 @@ struct svc_monitor {
     /* key */
     struct in6_addr ip;
     uint32_t dp_key;
+    uint32_t input_port_key;
     uint32_t port_key;
     uint32_t proto_port; /* tcp/udp port */

@@ -7526,6 +7536,7 @@ struct svc_monitor {
     int n_failures;

     enum svc_monitor_protocol protocol;
+    enum svc_monitor_type type;
     enum svc_monitor_state state;
     enum svc_monitor_status status;
     struct dp_packet pkt;
@@ -7533,6 +7544,9 @@ struct svc_monitor {
     uint32_t seq_no;
     ovs_be16 tp_src;

+    ovs_be16 icmp_id;
+    ovs_be16 icmp_seq_no;
+
     bool delete;
 };

@@ -7598,9 +7612,28 @@ sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,

     const struct sbrec_service_monitor *sb_svc_mon;
     SBREC_SERVICE_MONITOR_TABLE_FOR_EACH (sb_svc_mon, svc_mon_table) {
+        enum svc_monitor_type mon_type;
+        if (sb_svc_mon->type
+            && !strcmp(sb_svc_mon->type, "network-function")) {
+            mon_type = SVC_MON_TYPE_NF;
+        } else {
+            mon_type = SVC_MON_TYPE_LB;
+        }
+
+        /* A missing protocol defaults to TCP, as before this change. */
+        const char *proto_str = sb_svc_mon->protocol;
+        enum svc_monitor_protocol protocol = SVC_MON_PROTO_TCP;
+        if (proto_str && !strcmp(proto_str, "udp")) {
+            protocol = SVC_MON_PROTO_UDP;
+        } else if (proto_str && !strcmp(proto_str, "icmp")) {
+            protocol = SVC_MON_PROTO_ICMP;
+        }
+
         const struct sbrec_port_binding *pb
             = lport_lookup_by_name(sbrec_port_binding_by_name,
                                    sb_svc_mon->logical_port);
+        const struct sbrec_port_binding *input_pb = NULL;
+
         if (!pb) {
             continue;
         }
@@ -7620,39 +7653,65 @@ sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,

         struct eth_addr ea;
         bool mac_found = false;
-        for (size_t i = 0; i < pb->n_mac && !mac_found; i++) {
-            struct lport_addresses laddrs;

-            if (!extract_lsp_addresses(pb->mac[i], &laddrs)) {
+        if (mon_type == SVC_MON_TYPE_NF) {
+            if (protocol != SVC_MON_PROTO_ICMP) {
+                continue;
+            }
+            input_pb = lport_lookup_by_name(sbrec_port_binding_by_name,
+                                            sb_svc_mon->logical_input_port);
+            if (!input_pb) {
+                continue;
+            }
+            if (input_pb->chassis != our_chassis) {
+                continue;
+            }
+            if (strcmp(sb_svc_mon->mac, "")) {
+                if (eth_addr_from_string(sb_svc_mon->mac, &ea)) {
+                    mac_found = true;
+                }
+            }
+        } else {
+            if (protocol != SVC_MON_PROTO_TCP &&
+                protocol != SVC_MON_PROTO_UDP) {
                 continue;
             }

-            if (is_ipv4) {
-                for (size_t j = 0; j < laddrs.n_ipv4_addrs; j++) {
-                    if (ip4 == laddrs.ipv4_addrs[j].addr) {
-                        ea = laddrs.ea;
-                        mac_found = true;
-                        break;
-                    }
+            for (size_t i = 0; i < pb->n_mac && !mac_found; i++) {
+                struct lport_addresses laddrs;
+
+                if (!extract_lsp_addresses(pb->mac[i], &laddrs)) {
+                    continue;
                 }
-            } else {
-                for (size_t j = 0; j < laddrs.n_ipv6_addrs; j++) {
-                    if (IN6_ARE_ADDR_EQUAL(&ip_addr,
-                                           &laddrs.ipv6_addrs[j].addr)) {
-                        ea = laddrs.ea;
-                        mac_found = true;
-                        break;
+
+                if (is_ipv4) {
+                    for (size_t j = 0; j < laddrs.n_ipv4_addrs; j++) {
+                        if (ip4 == laddrs.ipv4_addrs[j].addr) {
+                            ea = laddrs.ea;
+                            mac_found = true;
+                            break;
+                        }
+                    }
+                } else {
+                    for (size_t j = 0; j < laddrs.n_ipv6_addrs; j++) {
+                        if (IN6_ARE_ADDR_EQUAL(&ip_addr,
+                                               &laddrs.ipv6_addrs[j].addr)) {
+                            ea = laddrs.ea;
+                            mac_found = true;
+                            break;
+                        }
                     }
                 }
-            }

-            if (!mac_found && !laddrs.n_ipv4_addrs && !laddrs.n_ipv6_addrs) {
-                /* IP address(es) are not configured. Use the first mac. */
-                ea = laddrs.ea;
-                mac_found = true;
-            }
+                if (!mac_found && !laddrs.n_ipv4_addrs &&
+                    !laddrs.n_ipv6_addrs) {
+                    /* IP address(es) are not configured. Use the first mac. */
+                    ea = laddrs.ea;
+                    mac_found = true;
+                }

-            destroy_lport_addresses(&laddrs);
+                destroy_lport_addresses(&laddrs);
+            }
         }

         if (!mac_found) {
@@ -7661,23 +7720,18 @@ sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,

         uint32_t dp_key = pb->datapath->tunnel_key;
         uint32_t port_key = pb->tunnel_key;
+        uint32_t input_port_key = input_pb ? input_pb->tunnel_key : UINT32_MAX;
         uint32_t hash =
             hash_bytes(&ip_addr, sizeof ip_addr,
                        hash_3words(dp_key, port_key, sb_svc_mon->port));

-        enum svc_monitor_protocol protocol;
-        if (!sb_svc_mon->protocol || strcmp(sb_svc_mon->protocol, "udp")) {
-            protocol = SVC_MON_PROTO_TCP;
-        } else {
-            protocol = SVC_MON_PROTO_UDP;
-        }
-
         svc_mon = pinctrl_find_svc_monitor(dp_key, port_key, &ip_addr,
                                            sb_svc_mon->port, protocol, hash);

         if (!svc_mon) {
             svc_mon = xmalloc(sizeof *svc_mon);
             svc_mon->dp_key = dp_key;
+            svc_mon->input_port_key = input_port_key;
             svc_mon->port_key = port_key;
             svc_mon->proto_port = sb_svc_mon->port;
             svc_mon->ip = ip_addr;
@@ -7685,6 +7739,7 @@ sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,
             svc_mon->state = SVC_MON_S_INIT;
             svc_mon->status = SVC_MON_ST_UNKNOWN;
             svc_mon->protocol = protocol;
+            svc_mon->type = mon_type;

             smap_init(&svc_mon->options);
             svc_mon->interval =
@@ -8578,11 +8633,67 @@ svc_monitor_send_udp_health_check(struct rconn *swconn,
     ofpbuf_uninit(&ofpacts);
 }

+
+static void
+svc_monitor_send_icmp_health_check__(struct rconn *swconn,
+                                     struct svc_monitor *svc_mon)
+{
+    uint64_t packet_stub[128 / 8];
+    struct dp_packet packet;
+    dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub);
+
+    struct eth_addr eth_src;
+    eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, &eth_src);
+
+    ovs_be32 ip4_src;
+    ip_parse(svc_mon->sb_svc_mon->src_ip, &ip4_src);
+    pinctrl_compose_ipv4(&packet, eth_src, svc_mon->ea, ip4_src,
+                         in6_addr_get_mapped_ipv4(&svc_mon->ip),
+                         IPPROTO_ICMP, 255, ICMP_HEADER_LEN);
+
+    struct icmp_header *ih = dp_packet_l4(&packet);
+    ih->icmp_fields.echo.id = svc_mon->icmp_id;
+    ih->icmp_fields.echo.seq = svc_mon->icmp_seq_no;
+
+    uint8_t icmp_code = 0;
+    packet_set_icmp(&packet, ICMP4_ECHO_REQUEST, icmp_code);
+
+    ih->icmp_csum = 0;
+    ih->icmp_csum = csum(ih, sizeof *ih);
+
+    uint64_t ofpacts_stub[4096 / 8];
+    struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub);
+    enum ofp_version version = rconn_get_version(swconn);
+    put_load(svc_mon->dp_key, MFF_LOG_DATAPATH, 0, 64, &ofpacts);
+    put_load(svc_mon->input_port_key, MFF_LOG_OUTPORT, 0, 32, &ofpacts);
+    put_load(1, MFF_LOG_FLAGS, MLF_LOCAL_ONLY, 1, &ofpacts);
+    struct ofpact_resubmit *resubmit = ofpact_put_RESUBMIT(&ofpacts);
+    resubmit->in_port = OFPP_CONTROLLER;
+    resubmit->table_id = OFTABLE_LOCAL_OUTPUT;
+
+    struct ofputil_packet_out po = {
+        .packet = dp_packet_data(&packet),
+        .packet_len = dp_packet_size(&packet),
+        .buffer_id = UINT32_MAX,
+        .ofpacts = ofpacts.data,
+        .ofpacts_len = ofpacts.size,
+    };
+    match_set_in_port(&po.flow_metadata, OFPP_CONTROLLER);
+    enum ofputil_protocol proto = ofputil_protocol_from_ofp_version(version);
+    queue_msg(swconn, ofputil_encode_packet_out(&po, proto));
+    dp_packet_uninit(&packet);
+    ofpbuf_uninit(&ofpacts);
+}
+
 static void
 svc_monitor_send_health_check(struct rconn *swconn,
                               struct svc_monitor *svc_mon)
 {
-    if (svc_mon->protocol == SVC_MON_PROTO_TCP) {
+    if (svc_mon->protocol == SVC_MON_PROTO_ICMP) {
+        svc_mon->icmp_id = (OVS_FORCE ovs_be16) random_uint16();
+        svc_mon->icmp_seq_no = (OVS_FORCE ovs_be16) random_uint16();
+        svc_monitor_send_icmp_health_check__(swconn, svc_mon);
+    } else if (svc_mon->protocol == SVC_MON_PROTO_TCP) {
         svc_mon->seq_no = random_uint32();
         svc_mon->tp_src = htons(get_random_src_port());
         svc_monitor_send_tcp_health_check__(swconn, svc_mon,
@@ -8623,13 +8734,14 @@ svc_monitors_run(struct rconn *swconn,

         case SVC_MON_S_WAITING:
             if (current_time > svc_mon->wait_time) {
-                if (svc_mon->protocol ==  SVC_MON_PROTO_TCP) {
-                    svc_mon->n_failures++;
-                    svc_mon->state = SVC_MON_S_OFFLINE;
-                } else {
+                if (svc_mon->protocol ==  SVC_MON_PROTO_UDP) {
                     svc_mon->n_success++;
                     svc_mon->state = SVC_MON_S_ONLINE;
+                } else {
+                    svc_mon->n_failures++;
+                    svc_mon->state = SVC_MON_S_OFFLINE;
                 }
+
                 svc_mon->next_send_time = current_time + svc_mon->interval;
                 next_run_time = svc_mon->next_send_time;
             } else {
@@ -8690,6 +8802,27 @@ svc_monitors_wait(long long int svc_monitors_next_run_time)
     }
 }

+
+static void
+pinctrl_handle_icmp_svc_check(struct dp_packet *pkt_in,
+                              struct svc_monitor *svc_mon)
+{
+    struct icmp_header *ih = dp_packet_l4(pkt_in);
+
+    if (!ih) {
+        return;
+    }
+
+    if ((ih->icmp_fields.echo.id != svc_mon->icmp_id) ||
+        (ih->icmp_fields.echo.seq != svc_mon->icmp_seq_no)) {
+        return;
+    }
+
+    svc_mon->n_success++;
+    svc_mon->state = SVC_MON_S_ONLINE;
+    svc_mon->next_send_time = time_msec() + svc_mon->interval;
+}
+
 static bool
 pinctrl_handle_tcp_svc_check(struct rconn *swconn,
                              struct dp_packet *pkt_in,
@@ -8746,6 +8879,7 @@ pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow,
     uint32_t dp_key = ntohll(md->flow.metadata);
     uint32_t port_key = md->flow.regs[MFF_LOG_INPORT - MFF_REG0];
     struct in6_addr ip_addr;
+    struct in6_addr dst_ip_addr;
     struct eth_header *in_eth = dp_packet_data(pkt_in);
     uint8_t ip_proto;

@@ -8761,10 +8895,12 @@ pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow,
         }

         ip_addr = in6_addr_mapped_ipv4(ip_flow->nw_src);
+        dst_ip_addr = in6_addr_mapped_ipv4(ip_flow->nw_dst);
         ip_proto = in_ip->ip_proto;
     } else {
         struct ovs_16aligned_ip6_hdr *in_ip = dp_packet_l3(pkt_in);
         ip_addr = ip_flow->ipv6_src;
+        dst_ip_addr = ip_flow->ipv6_dst;
         ip_proto = in_ip->ip6_nxt;
     }

@@ -8777,7 +8913,6 @@ pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow,
         return;
     }

-
     if (ip_proto == IPPROTO_TCP) {
         uint32_t hash =
             hash_bytes(&ip_addr, sizeof ip_addr,
@@ -8806,17 +8941,36 @@ pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow,
             return;
         }

-        const void *in_ip = dp_packet_get_icmp_payload(pkt_in);
-        if (!in_ip) {
-            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
-            VLOG_WARN_RL(&rl, "Original IP datagram not present in "
-                         "ICMP packet");
-            return;
-        }
-
         if (in_eth->eth_type == htons(ETH_TYPE_IP)) {
             struct icmp_header *ih = l4h;
             /* It's ICMP packet. */
+            if (ih->icmp_type == ICMP4_ECHO_REQUEST && ih->icmp_code == 0) {
+                uint32_t hash = hash_bytes(&dst_ip_addr, sizeof dst_ip_addr,
+                                           hash_3words(dp_key, port_key, 0));
+                struct svc_monitor *svc_mon =
+                    pinctrl_find_svc_monitor(dp_key, port_key, &dst_ip_addr, 0,
+                                             SVC_MON_PROTO_ICMP, hash);
+                if (!svc_mon) {
+                    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(
+                        1, 5);
+                    VLOG_WARN_RL(&rl, "handle service check: Service monitor "
+                                 "not found for ICMP request");
+                    return;
+                }
+                if (svc_mon->type == SVC_MON_TYPE_NF) {
+                    pinctrl_handle_icmp_svc_check(pkt_in, svc_mon);
+                }
+                return;
+            }
+
+            const void *in_ip = dp_packet_get_icmp_payload(pkt_in);
+            if (!in_ip) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+                VLOG_WARN_RL(&rl, "Original IP datagram not present in "
+                             "ICMP packet");
+                return;
+            }
+
             if (ih->icmp_type != ICMP4_DST_UNREACH || ih->icmp_code != 3) {
                 return;
             }
@@ -8838,6 +8992,14 @@ pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow,
                 return;
             }
         } else {
+            const void *in_ip = dp_packet_get_icmp_payload(pkt_in);
+            if (!in_ip) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+                VLOG_WARN_RL(&rl, "Original IP datagram not present in "
+                             "ICMP packet");
+                return;
+            }
+
             struct icmp6_header *ih6 = l4h;
             if (ih6->icmp6_type != 1 || ih6->icmp6_code != 4) {
                 return;
diff --git a/northd/en-global-config.c b/northd/en-global-config.c
index c103b137f..2a89f2c4d 100644
--- a/northd/en-global-config.c
+++ b/northd/en-global-config.c
@@ -20,6 +20,7 @@

 /* OVS includes */
 #include "openvswitch/vlog.h"
+#include "socket-util.h"

 /* OVN includes */
 #include "debug.h"
@@ -61,6 +62,35 @@ en_global_config_init(struct engine_node *node OVS_UNUSED,
     return data;
 }

+static void
+update_svc_monitor_addr(const char *new_ip4, const char **old_ip4_pptr)
+{
+    if (new_ip4) {
+        struct sockaddr_storage svc_mon_addr;
+        if (inet_parse_address(new_ip4, &svc_mon_addr)) {
+            struct ds ip_s = DS_EMPTY_INITIALIZER;
+            ss_format_address_nobracks(&svc_mon_addr, &ip_s);
+            /* Compare without stealing so that 'ip_s' keeps the string. */
+            if ((*old_ip4_pptr == NULL)
+                || strcmp(*old_ip4_pptr, ds_cstr(&ip_s))) {
+                free(CONST_CAST(void *, *old_ip4_pptr));
+                *old_ip4_pptr = ds_steal_cstr(&ip_s);
+            }
+            ds_destroy(&ip_s);
+        } else {
+            if (*old_ip4_pptr) {
+                free(CONST_CAST(void *, *old_ip4_pptr));
+                *old_ip4_pptr = NULL;
+            }
+        }
+    } else {
+        if (*old_ip4_pptr) {
+            free(CONST_CAST(void *, *old_ip4_pptr));
+            *old_ip4_pptr = NULL;
+        }
+    }
+}
+
 void
 en_global_config_run(struct engine_node *node , void *data)
 {
@@ -108,6 +138,27 @@ en_global_config_run(struct engine_node *node , void *data)
         }
     }

+    const char *dst_monitor_mac = smap_get(&nb->options,
+                                           "svc_monitor_mac_dst");
+    if (dst_monitor_mac) {
+        if (eth_addr_from_string(dst_monitor_mac,
+                                 &config_data->svc_monitor_mac_ea_dst)) {
+            snprintf(config_data->svc_monitor_mac_dst,
+                     sizeof config_data->svc_monitor_mac_dst,
+                     ETH_ADDR_FMT,
+                     ETH_ADDR_ARGS(config_data->svc_monitor_mac_ea_dst));
+        } else {
+            dst_monitor_mac = NULL;
+        }
+    }
+
+    const char *monitor_ip4 = smap_get(&nb->options, "svc_monitor_ip4");
+    update_svc_monitor_addr(monitor_ip4, &config_data->svc_monitor_ip4);
+    const char *monitor_ip4_dst = smap_get(&nb->options,
+                                           "svc_monitor_ip4_dst");
+    update_svc_monitor_addr(monitor_ip4_dst,
+                            &config_data->svc_monitor_ip4_dst);
+
     struct smap *options = &config_data->nb_options;
     smap_destroy(options);
     smap_clone(options, &nb->options);
@@ -123,6 +174,15 @@ en_global_config_run(struct engine_node *node , void *data)
                      config_data->svc_monitor_mac);
     }

+    if (!dst_monitor_mac) {
+        eth_addr_random(&config_data->svc_monitor_mac_ea_dst);
+        snprintf(config_data->svc_monitor_mac_dst,
+                 sizeof config_data->svc_monitor_mac_dst, ETH_ADDR_FMT,
+                 ETH_ADDR_ARGS(config_data->svc_monitor_mac_ea_dst));
+        smap_replace(options, "svc_monitor_mac_dst",
+                     config_data->svc_monitor_mac_dst);
+    }
+
     bool ic_vxlan_mode = false;
     const struct nbrec_logical_switch *nbs;
     NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH (nbs, nbrec_ls_table) {
@@ -245,6 +305,21 @@ global_config_nb_global_handler(struct engine_node *node, void *data)
         return false;
     }

+    if (config_out_of_sync(&nb->options, &config_data->nb_options,
+                           "svc_monitor_mac_dst", true)) {
+        return false;
+    }
+
+    if (config_out_of_sync(&nb->options, &config_data->nb_options,
+                           "svc_monitor_ip4", false)) {
+        return false;
+    }
+
+    if (config_out_of_sync(&nb->options, &config_data->nb_options,
+                           "svc_monitor_ip4_dst", false)) {
+        return false;
+    }
+
     /* Check if max_tunid has changed or not. */
     if (config_out_of_sync(&nb->options, &config_data->nb_options,
                            "max_tunid", true)) {
diff --git a/northd/en-global-config.h b/northd/en-global-config.h
index de88db18b..2240c9cd9 100644
--- a/northd/en-global-config.h
+++ b/northd/en-global-config.h
@@ -36,13 +36,19 @@ struct ed_type_global_config {
     const struct nbrec_nb_global *nb_global;
     const struct sbrec_sb_global *sb_global;

-    /* MAC allocated for service monitor usage. Just one mac is allocated
+    /* MAC allocated for service monitor usage. Just one pair is allocated
      * for this purpose and ovn-controller's on each chassis will make use
-     * of this mac when sending out the packets to monitor the services
+     * of this pair when sending out the packets to monitor the services
      * defined in Service_Monitor Southbound table. Since these packets
-     * are locally handled, having just one mac is good enough. */
+     * are locally handled, having just one pair is good enough. */
     char svc_monitor_mac[ETH_ADDR_STRLEN + 1];
     struct eth_addr svc_monitor_mac_ea;
+    char svc_monitor_mac_dst[ETH_ADDR_STRLEN + 1];
+    struct eth_addr svc_monitor_mac_ea_dst;
+
+    /* IP configured for LB and NF service monitor usage. */
+    const char *svc_monitor_ip4;
+    const char *svc_monitor_ip4_dst;

     struct chassis_features features;

diff --git a/northd/en-northd.c b/northd/en-northd.c
index 2549a6995..35ade6518 100644
--- a/northd/en-northd.c
+++ b/northd/en-northd.c
@@ -113,6 +113,10 @@ northd_get_input_data(struct engine_node *node,
     input_data->sb_options = &global_config->sb_options;
     input_data->svc_monitor_mac = global_config->svc_monitor_mac;
     input_data->svc_monitor_mac_ea = global_config->svc_monitor_mac_ea;
+    input_data->svc_monitor_mac_dst = global_config->svc_monitor_mac_dst;
+    input_data->svc_monitor_mac_ea_dst = global_config->svc_monitor_mac_ea_dst;
+    input_data->svc_monitor_ip4 = global_config->svc_monitor_ip4;
+    input_data->svc_monitor_ip4_dst = global_config->svc_monitor_ip4_dst;
     input_data->features = &global_config->features;
 }

diff --git a/northd/en-sync-sb.c b/northd/en-sync-sb.c
index d9dc25eb8..d1030b567 100644
--- a/northd/en-sync-sb.c
+++ b/northd/en-sync-sb.c
@@ -49,7 +49,8 @@ static void sync_addr_sets(struct ovsdb_idl_txn *ovnsb_txn,
                            const struct sbrec_address_set_table *,
                            const struct lr_stateful_table *,
                            const struct ovn_datapaths *,
-                           const char *svc_monitor_macp);
+                           const char *svc_monitor_macp,
+                           const char *svc_monitor_macp_dst);
 static const struct sbrec_address_set *sb_address_set_lookup_by_name(
     struct ovsdb_idl_index *, const char *name);
 static void update_sb_addr_set(struct sorted_array *,
@@ -104,7 +105,8 @@ en_sync_to_sb_addr_set_run(struct engine_node *node, void *data OVS_UNUSED)
                    nb_port_group_table, sb_address_set_table,
                    &lr_stateful_data->table,
                    &northd_data->lr_datapaths,
-                   global_config->svc_monitor_mac);
+                   global_config->svc_monitor_mac,
+                   global_config->svc_monitor_mac_dst);

     engine_set_node_state(node, EN_UPDATED);
 }
@@ -446,7 +448,8 @@ sync_addr_sets(struct ovsdb_idl_txn *ovnsb_txn,
                const struct sbrec_address_set_table *sb_address_set_table,
                const struct lr_stateful_table *lr_statefuls,
                const struct ovn_datapaths *lr_datapaths,
-               const char *svc_monitor_macp)
+               const char *svc_monitor_macp,
+               const char *svc_monitor_macp_dst)
 {
     struct shash sb_address_sets = SHASH_INITIALIZER(&sb_address_sets);

@@ -456,8 +459,11 @@ sync_addr_sets(struct ovsdb_idl_txn *ovnsb_txn,
         shash_add(&sb_address_sets, sb_address_set->name, sb_address_set);
     }

-    /* Service monitor MAC. */
-    struct sorted_array svc = sorted_array_create(&svc_monitor_macp, 1, false);
+    /* Service monitor MACs. */
+    const char *svc_macs[] = {svc_monitor_macp, svc_monitor_macp_dst};
+    size_t n_macs = sizeof(svc_macs) / sizeof(svc_macs[0]);
+    struct sorted_array svc = sorted_array_create(svc_macs, n_macs,
+                                                  false);
     sync_addr_set(ovnsb_txn, "svc_monitor_mac", &svc, &sb_address_sets);
     sorted_array_destroy(&svc);

diff --git a/northd/northd.c b/northd/northd.c
index f0e998f55..9f5abcb56 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -3633,7 +3633,9 @@ get_service_mon(const struct hmap *monitor_map,
 static struct service_monitor_info *
 create_or_get_service_mon(struct ovsdb_idl_txn *ovnsb_txn,
                           struct hmap *monitor_map,
-                          const char *ip, const char *logical_port,
+                          const char *type, const char *ip,
+                          const char *logical_port,
+                          const char *logical_input_port,
                           uint16_t service_port, const char *protocol,
                           const char *chassis_name)
 {
@@ -3657,9 +3659,14 @@ create_or_get_service_mon(struct ovsdb_idl_txn *ovnsb_txn,

     struct sbrec_service_monitor *sbrec_mon =
         sbrec_service_monitor_insert(ovnsb_txn);
+    sbrec_service_monitor_set_type(sbrec_mon, type);
     sbrec_service_monitor_set_ip(sbrec_mon, ip);
     sbrec_service_monitor_set_port(sbrec_mon, service_port);
     sbrec_service_monitor_set_logical_port(sbrec_mon, logical_port);
+    if (logical_input_port) {
+        sbrec_service_monitor_set_logical_input_port(sbrec_mon,
+            logical_input_port);
+    }
     sbrec_service_monitor_set_protocol(sbrec_mon, protocol);
     if (chassis_name) {
         sbrec_service_monitor_set_chassis_name(sbrec_mon, chassis_name);
@@ -3670,6 +3677,99 @@ create_or_get_service_mon(struct ovsdb_idl_txn *ovnsb_txn,
     return mon_info;
 }

+static void
+ovn_nf_svc_create(struct ovsdb_idl_txn *ovnsb_txn,
+                  struct hmap *monitor_map,
+                  struct sset *svc_monitor_lsps,
+                  struct hmap *ls_ports,
+                  const char *mac_src, const char *mac_dst,
+                  const char *ip_src, const char *ip_dst,
+                  const char *logical_port, const char *logical_input_port,
+                  const struct smap *health_check_options)
+{
+    if (!ip_src || !ip_dst || !mac_src || !mac_dst) {
+       static struct vlog_rate_limit rl =
+          VLOG_RATE_LIMIT_INIT(1, 1);
+       VLOG_ERR_RL(&rl, "NetworkFunction: invalid  service monitor src_mac:%s "
+                    "dst_mac:%s src_ip:%s dst_ip:%s\n",
+                     mac_src, mac_dst, ip_src, ip_dst);
+        return;
+    }
+
+    const char *ports[] = {logical_port, logical_input_port};
+    size_t n_ports = sizeof(ports) / sizeof(ports[0]);
+    const char *chassis_name = NULL;
+    bool port_up = true;
+
+    for (size_t i = 0; i < n_ports; i++) {
+        const char *port = ports[i];
+        sset_add(svc_monitor_lsps, port);
+        struct ovn_port *op = ovn_port_find(ls_ports, port);
+        if (op == NULL) {
+            static struct vlog_rate_limit rl =
+            VLOG_RATE_LIMIT_INIT(1, 1);
+            VLOG_ERR_RL(&rl, "NetworkFunction: skip health check, port:%s "
+                        "not found\n",  port);
+            return;
+        }
+
+        if (op->sb && op->sb->chassis) {
+            if (chassis_name == NULL) {
+                chassis_name = op->sb->chassis->name;
+            } else if (strcmp(chassis_name, op->sb->chassis->name)) {
+                 static struct vlog_rate_limit rl =
+                    VLOG_RATE_LIMIT_INIT(1, 1);
+                 VLOG_ERR_RL(&rl, "NetworkFunction: chassis mismatch "
+                      " chassis:%s port:%s\n", op->sb->chassis->name, port);
+            }
+        }
+        port_up &= (op->sb->n_up && op->sb->up[0]);
+    }
+
+
+    struct service_monitor_info *mon_info =
+        create_or_get_service_mon(ovnsb_txn, monitor_map,
+                                  "network-function", ip_dst,
+                                  logical_port,
+                                  logical_input_port,
+                                  0,
+                                  "icmp",
+                                  chassis_name);
+    ovs_assert(mon_info);
+    sbrec_service_monitor_set_options(
+        mon_info->sbrec_mon, health_check_options);
+
+    if (!mon_info->sbrec_mon->src_mac ||
+        strcmp(mon_info->sbrec_mon->src_mac, mac_src)) {
+        sbrec_service_monitor_set_src_mac(mon_info->sbrec_mon,
+                                          mac_src);
+    }
+
+    if (!mon_info->sbrec_mon->mac ||
+        strcmp(mon_info->sbrec_mon->mac, mac_dst)) {
+        sbrec_service_monitor_set_mac(mon_info->sbrec_mon,
+                                      mac_dst);
+    }
+
+    if (!mon_info->sbrec_mon->src_ip ||
+        strcmp(mon_info->sbrec_mon->src_ip, ip_src)) {
+        sbrec_service_monitor_set_src_ip(mon_info->sbrec_mon, ip_src);
+    }
+
+    if (!mon_info->sbrec_mon->ip ||
+        strcmp(mon_info->sbrec_mon->ip, ip_dst)) {
+        sbrec_service_monitor_set_ip(mon_info->sbrec_mon, ip_dst);
+    }
+
+    if (!port_up
+        && mon_info->sbrec_mon->status
+        && !strcmp(mon_info->sbrec_mon->status, "online")) {
+        sbrec_service_monitor_set_status(mon_info->sbrec_mon,
+                                         "offline");
+    }
+    mon_info->required = true;
+}
+
 static void
 ovn_lb_svc_create(struct ovsdb_idl_txn *ovnsb_txn,
                   const struct ovn_northd_lb *lb,
@@ -3715,8 +3815,10 @@ ovn_lb_svc_create(struct ovsdb_idl_txn *ovnsb_txn,

             struct service_monitor_info *mon_info =
                 create_or_get_service_mon(ovnsb_txn, monitor_map,
+                                          "load-balancer",
                                           backend->ip_str,
                                           backend_nb->logical_port,
+                                          NULL,
                                           backend->port,
                                           protocol,
                                           chassis_name);
@@ -3947,12 +4049,16 @@ build_lb_datapaths(const struct hmap *lbs, const struct hmap *lb_groups,
 }

 static void
-build_lb_svcs(
+build_svcs(
     struct ovsdb_idl_txn *ovnsb_txn,
     const struct sbrec_service_monitor_table *sbrec_service_monitor_table,
     const char *svc_monitor_mac,
     const struct eth_addr *svc_monitor_mac_ea,
+    const char *svc_monitor_mac_dst,
+    const char *svc_monitor_ip4,
+    const char *svc_monitor_ip4_dst,
     struct hmap *ls_ports, struct hmap *lb_dps_map,
+    const struct nbrec_network_function_table *nbrec_network_function_table,
     struct sset *svc_monitor_lsps,
     struct hmap *svc_monitor_map)
 {
@@ -3975,6 +4081,21 @@ build_lb_svcs(
                           svc_monitor_lsps);
     }

+    const struct nbrec_network_function *nbrec_nf;
+    NBREC_NETWORK_FUNCTION_TABLE_FOR_EACH (nbrec_nf,
+                            nbrec_network_function_table) {
+        if (nbrec_nf->health_check) {
+            ovn_nf_svc_create(ovnsb_txn,
+                              svc_monitor_map,
+                              svc_monitor_lsps,
+                              ls_ports,
+                              svc_monitor_mac, svc_monitor_mac_dst,
+                              svc_monitor_ip4, svc_monitor_ip4_dst,
+                              nbrec_nf->outport->name, nbrec_nf->inport->name,
+                              &nbrec_nf->health_check->options);
+        }
+    }
+
     struct service_monitor_info *mon_info;
     HMAP_FOR_EACH_SAFE (mon_info, hmap_node, svc_monitor_map) {
         if (!mon_info->required) {
@@ -4040,18 +4161,9 @@ build_lb_count_dps(struct hmap *lb_dps_map,
  */
 static void
 build_lb_port_related_data(
-    struct ovsdb_idl_txn *ovnsb_txn,
-    const struct sbrec_service_monitor_table *sbrec_service_monitor_table,
-    const char *svc_monitor_mac,
-    const struct eth_addr *svc_monitor_mac_ea,
-    struct ovn_datapaths *lr_datapaths, struct hmap *ls_ports,
-    struct hmap *lb_dps_map, struct hmap *lb_group_dps_map,
-    struct sset *svc_monitor_lsps,
-    struct hmap *svc_monitor_map)
+    struct ovn_datapaths *lr_datapaths,
+    struct hmap *lb_dps_map, struct hmap *lb_group_dps_map)
 {
-    build_lb_svcs(ovnsb_txn, sbrec_service_monitor_table, svc_monitor_mac,
-                  svc_monitor_mac_ea, ls_ports, lb_dps_map,
-                  svc_monitor_lsps, svc_monitor_map);
     build_lswitch_lbs_from_lrouter(lr_datapaths, lb_dps_map, lb_group_dps_map);
 }

@@ -17392,16 +17504,6 @@ build_ls_stateful_flows(const struct ls_stateful_record *ls_stateful_rec,
     build_lb_hairpin(ls_stateful_rec, od, lflows, ls_stateful_rec->lflow_ref);
 }

-static struct nbrec_network_function *
-network_function_get_active(const struct nbrec_network_function_group *nfg)
-{
-    /* Another patch adds the healthmon support. This is temporary. */
-    if (nfg->n_network_function == 0) {
-        return NULL;
-    }
-    return nfg->network_function[0];
-}
-
 /* For packets received on tunnel and egressing towards a network-function port
  * commit the tunnel interface id in CT. This will be utilized when the packet
  * comes out of the other network-function interface of the service VM. The
@@ -17443,6 +17545,101 @@ build_lswitch_stateful_nf(struct ovn_port *op,
                   ds_cstr(match), ds_cstr(actions), op->lflow_ref);
 }

+static struct nbrec_network_function *
+network_function_get_active(const struct nbrec_network_function_group *nfg)
+{
+    return nfg->network_function_active;
+}
+
+static void
+network_function_update_active(const struct nbrec_network_function_group *nfg,
+                               const struct hmap *svc_monitor_map,
+                               const char *svc_monitor_ip4_dst)
+{
+    if (!nfg->n_network_function) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+        VLOG_ERR_RL(&rl, "NetworkFunction: No network_function found in "
+                         "network_function_group %s", nfg->name);
+        if (nfg->network_function_active) {
+            nbrec_network_function_group_set_network_function_active(nfg,
+                                                                     NULL);
+        }
+        return;
+    }
+    struct nbrec_network_function *nf_active = nfg->network_function[0];
+    struct nbrec_network_function *nf_active_prev = NULL;
+    uint16_t best_score = 0;
+    bool healthy_nf_available = false;
+    if (nfg->network_function_active) {
+        nf_active_prev = nfg->network_function_active;
+    }
+
+    for (size_t i = 0; i < nfg->n_network_function; i++) {
+        struct nbrec_network_function *nf = nfg->network_function[i];
+        uint16_t curr_score = 0;
+        if (nf->health_check == NULL) {
+            VLOG_DBG("NetworkFunction: Health check is not configured for "
+                     "network_function %s", nf->name);
+            /* Consider network_function as healthy if health_check is
+             * not configured. */
+            curr_score += 3;
+            healthy_nf_available = true;
+        } else {
+            struct service_monitor_info *mon_info =
+                get_service_mon(svc_monitor_map, svc_monitor_ip4_dst,
+                                nf->outport->name, 0, "icmp");
+            if (mon_info == NULL) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+                VLOG_ERR_RL(&rl, "NetworkFunction: Service_monitor is not "
+                            "found for network_function:%s", nf->name);
+            } else if (mon_info->sbrec_mon->status
+                       && !strcmp(mon_info->sbrec_mon->status, "online")) {
+                curr_score += 3;
+                healthy_nf_available = true;
+            }
+        }
+
+        if (nf_active_prev && (nf == nf_active_prev)) {
+            curr_score += 1;
+        }
+
+        if (curr_score > best_score) {
+            nf_active = nf;
+            best_score = curr_score;
+        }
+    }
+
+    if (!healthy_nf_available) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+        VLOG_WARN_RL(&rl, "NetworkFunction: No healthy network_function found "
+                     "in network_function_group %s, "
+                     "selected network_function %s as active", nfg->name,
+                     nf_active->name);
+    }
+
+    if (nf_active_prev != nf_active) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+        VLOG_INFO_RL(&rl, "NetworkFunction: Update active network_function %s "
+                     "in network_function_group %s",
+                     nf_active->name, nfg->name);
+        nbrec_network_function_group_set_network_function_active(nfg,
+                                                                 nf_active);
+    }
+}
+
+static void build_network_function_active(
+    const struct nbrec_network_function_group_table *nbrec_nfg_table,
+    struct hmap *svc_monitor_map,
+    const char *svc_monitor_ip4_dst)
+{
+    const struct nbrec_network_function_group *nbrec_nfg;
+    NBREC_NETWORK_FUNCTION_GROUP_TABLE_FOR_EACH (nbrec_nfg,
+                            nbrec_nfg_table) {
+        network_function_update_active(nbrec_nfg, svc_monitor_map,
+                                       svc_monitor_ip4_dst);
+    }
+}
+
 static void
 consider_network_function(
                struct lflow_table *lflows, struct ovn_datapath *od,
@@ -19362,18 +19559,25 @@ ovnnb_db_run(struct northd_input *input_data,
                 input_data->sbrec_ha_chassis_grp_by_name,
                 &data->ls_datapaths.datapaths, &data->lr_datapaths.datapaths,
                 &data->ls_ports, &data->lr_ports);
-    build_lb_port_related_data(ovnsb_txn,
-                               input_data->sbrec_service_monitor_table,
-                               input_data->svc_monitor_mac,
-                               &input_data->svc_monitor_mac_ea,
-                               &data->lr_datapaths, &data->ls_ports,
+    build_lb_port_related_data(&data->lr_datapaths,
                                &data->lb_datapaths_map,
-                               &data->lb_group_datapaths_map,
-                               &data->svc_monitor_lsps,
-                               &data->svc_monitor_map);
+                               &data->lb_group_datapaths_map);
+    build_svcs(ovnsb_txn, input_data->sbrec_service_monitor_table,
+               input_data->svc_monitor_mac,
+               &input_data->svc_monitor_mac_ea,
+               input_data->svc_monitor_mac_dst,
+               input_data->svc_monitor_ip4,
+               input_data->svc_monitor_ip4_dst,
+               &data->ls_ports, &data->lb_datapaths_map,
+               input_data->nbrec_network_function_table,
+               &data->svc_monitor_lsps, &data->svc_monitor_map);
     build_lb_count_dps(&data->lb_datapaths_map,
                        ods_size(&data->ls_datapaths),
                        ods_size(&data->lr_datapaths));
+    build_network_function_active(
+        input_data->nbrec_network_function_group_table,
+        &data->svc_monitor_map,
+        input_data->svc_monitor_ip4_dst);
     build_ipam(&data->ls_datapaths.datapaths, &data->ls_ports);
     build_lrouter_groups(&data->lr_ports, &data->lr_list);
     build_ip_mcast(ovnsb_txn, input_data->sbrec_ip_multicast_table,
diff --git a/northd/northd.h b/northd/northd.h
index 7ec546957..5132e627c 100644
--- a/northd/northd.h
+++ b/northd/northd.h
@@ -65,6 +65,10 @@ struct northd_input {
     const struct smap *sb_options;
     const char *svc_monitor_mac;
     struct eth_addr svc_monitor_mac_ea;
+    const char *svc_monitor_mac_dst;
+    struct eth_addr svc_monitor_mac_ea_dst;
+    const char *svc_monitor_ip4;
+    const char *svc_monitor_ip4_dst;
     const struct chassis_features *features;

     /* ACL ID inputs. */
@@ -236,8 +240,8 @@ struct lflow_input {
     const struct hmap *lb_datapaths_map;
     const struct sset *bfd_ports;
     const struct chassis_features *features;
-    const struct hmap *svc_monitor_map;
     bool ovn_internal_version_changed;
+    const struct hmap *svc_monitor_map;
     const char *svc_monitor_mac;
     const struct sampling_app_table *sampling_apps;
     struct group_ecmp_route_data *route_data;
diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
index c6058f42c..db7e71d17 100644
--- a/ovn-sb.ovsschema
+++ b/ovn-sb.ovsschema
@@ -1,7 +1,7 @@
 {
     "name": "OVN_Southbound",
-    "version": "20.41.0",
-    "cksum": "2343742948 34719",
+    "version": "20.42.0",
+    "cksum": "3932698824 35046",
     "tables": {
         "SB_Global": {
             "columns": {
@@ -503,14 +503,20 @@
             "isRoot": true},
         "Service_Monitor": {
             "columns": {
+                "type": {"type": {"key": {
+                           "type": "string",
+                           "enum": ["set", ["load-balancer",
+                                            "network-function"]]}}},
                 "ip": {"type": "string"},
+                "mac": {"type": "string"},
                 "protocol": {
                     "type": {"key": {"type": "string",
-                             "enum": ["set", ["tcp", "udp"]]},
+                             "enum": ["set", ["tcp", "udp", "icmp"]]},
                              "min": 0, "max": 1}},
                 "port": {"type": {"key": {"type": "integer",
                                           "minInteger": 0,
                                           "maxInteger": 65535}}},
+                "logical_input_port": {"type": "string"},
                 "logical_port": {"type": "string"},
                 "src_mac": {"type": "string"},
                 "src_ip": {"type": "string"},
diff --git a/ovn-sb.xml b/ovn-sb.xml
index 39acb81a4..77dd824be 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -4944,10 +4944,20 @@ tcp.flags = RST;
         service monitor.
       </p>

+      <column name="type">
+        The type of the service. Supported values are "load-balancer" and
+        "network-function".
+      </column>
+
       <column name="ip">
+        Destination IP used in monitor packets. For load-balancer this is the
         IP of the service to be monitored. Only IPv4 is supported.
       </column>

+      <column name="mac">
+        Destination MAC address used in monitor packets for network-function.
+      </column>
+
       <column name="protocol">
         The protocol of the service.
       </column>
@@ -4956,10 +4966,20 @@ tcp.flags = RST;
         The TCP or UDP port of the service.
       </column>

+      <column name="logical_input_port">
+        This is applicable only for network-function type. The VIF of the
+        logical port to which monitor packets are sent. The
+        <code>ovn-controller</code> that binds this
+        <code>logical_input_port</code> sends the periodic monitor packets.
+      </column>
+
       <column name="logical_port">
         The VIF of the logical port on which the service is running. The
         <code>ovn-controller</code> that binds this <code>logical_port</code>
-        monitors the service by sending periodic monitor packets.
+        monitors the service by sending periodic monitor packets. For
+        load-balancer this is the port to which monitor packets are sent and
+        from which response packets are received. For network-function this
+        is the port from which the forwarded monitor packets are received.
       </column>

       <column name="src_mac">
--
2.39.3
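
For reviewers, the scoring used by network_function_update_active() can be sketched standalone as below. This is only an illustration under stated assumptions: struct nf_sketch and pick_active_nf() are hypothetical names that stand in for the nbrec/sbrec rows and are not part of the patch. It shows the two rules the loop applies: a healthy NF (online monitor, or no health check configured) scores 3, and the previously active NF gets 1 extra point so that it is kept among equally healthy candidates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an NB Network_Function row together with the
 * status of its SB Service_Monitor; not the real nbrec/sbrec types. */
struct nf_sketch {
    const char *name;
    bool has_health_check;  /* The health_check reference is set. */
    bool monitor_online;    /* Service_Monitor status is "online". */
};

/* Returns the index of the NF to make active.  'prev' is the index of the
 * currently active NF, or -1 if there is none. */
static size_t
pick_active_nf(const struct nf_sketch *nfs, size_t n, int prev)
{
    assert(n > 0);  /* The patch handles the empty group before scoring. */

    size_t best = 0;             /* Fall back to the first NF. */
    unsigned int best_score = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned int score = 0;

        /* An NF without a health check is treated as healthy. */
        if (!nfs[i].has_health_check || nfs[i].monitor_online) {
            score += 3;
        }
        /* Prefer the current active NF among equally healthy ones. */
        if (prev >= 0 && (size_t) prev == i) {
            score += 1;
        }
        if (score > best_score) {
            best = i;
            best_score = score;
        }
    }
    return best;
}
```

With this scoring, a healthy active NF is never displaced by another healthy NF, and when no NF is healthy the previous active one is retained if it exists.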

Patch

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 47c4bf78b..1d7289b85 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -7480,8 +7480,17 @@  enum svc_monitor_status {
 enum svc_monitor_protocol {
     SVC_MON_PROTO_TCP,
     SVC_MON_PROTO_UDP,
+    SVC_MON_PROTO_ICMP,
 };
 
+enum svc_monitor_type {
+    /* load balancer */
+    SVC_MON_TYPE_LB,
+    /* network function */
+    SVC_MON_TYPE_NF,
+};
+
+
 /* Service monitor health checks. */
 struct svc_monitor {
     struct hmap_node hmap_node;
@@ -7494,6 +7503,7 @@  struct svc_monitor {
     /* key */
     struct in6_addr ip;
     uint32_t dp_key;
+    uint32_t input_port_key;
     uint32_t port_key;
     uint32_t proto_port; /* tcp/udp port */
 
@@ -7526,6 +7536,7 @@  struct svc_monitor {
     int n_failures;
 
     enum svc_monitor_protocol protocol;
+    enum svc_monitor_type type;
     enum svc_monitor_state state;
     enum svc_monitor_status status;
     struct dp_packet pkt;
@@ -7533,6 +7544,9 @@  struct svc_monitor {
     uint32_t seq_no;
     ovs_be16 tp_src;
 
+    ovs_be16 icmp_id;
+    ovs_be16 icmp_seq_no;
+
     bool delete;
 };
 
@@ -7598,9 +7612,28 @@  sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,
 
     const struct sbrec_service_monitor *sb_svc_mon;
     SBREC_SERVICE_MONITOR_TABLE_FOR_EACH (sb_svc_mon, svc_mon_table) {
+        enum svc_monitor_type mon_type;
+        if (sb_svc_mon->type
+            && !strcmp(sb_svc_mon->type, "network-function")) {
+            mon_type = SVC_MON_TYPE_NF;
+        } else {
+            mon_type = SVC_MON_TYPE_LB;
+        }
+
+        enum svc_monitor_protocol protocol = SVC_MON_PROTO_TCP;
+        if (sb_svc_mon->protocol) {
+            if (!strcmp(sb_svc_mon->protocol, "udp")) {
+                protocol = SVC_MON_PROTO_UDP;
+            } else if (!strcmp(sb_svc_mon->protocol, "icmp")) {
+                protocol = SVC_MON_PROTO_ICMP;
+            }
+        }
+
         const struct sbrec_port_binding *pb
             = lport_lookup_by_name(sbrec_port_binding_by_name,
                                    sb_svc_mon->logical_port);
+        const struct sbrec_port_binding *input_pb = NULL;
+
         if (!pb) {
             continue;
         }
@@ -7620,39 +7653,65 @@  sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,
 
         struct eth_addr ea;
         bool mac_found = false;
-        for (size_t i = 0; i < pb->n_mac && !mac_found; i++) {
-            struct lport_addresses laddrs;
 
-            if (!extract_lsp_addresses(pb->mac[i], &laddrs)) {
+        if (mon_type == SVC_MON_TYPE_NF) {
+            if (protocol != SVC_MON_PROTO_ICMP) {
+                continue;
+            }
+            input_pb = lport_lookup_by_name(sbrec_port_binding_by_name,
+                                            sb_svc_mon->logical_input_port);
+            if (!input_pb) {
+                continue;
+            }
+            if (input_pb->chassis != our_chassis) {
+                continue;
+            }
+            if (strcmp(sb_svc_mon->mac, "")) {
+                if (eth_addr_from_string(sb_svc_mon->mac, &ea)) {
+                    mac_found = true;
+                }
+            }
+        } else {
+            if (protocol != SVC_MON_PROTO_TCP &&
+                protocol != SVC_MON_PROTO_UDP) {
                 continue;
             }
 
-            if (is_ipv4) {
-                for (size_t j = 0; j < laddrs.n_ipv4_addrs; j++) {
-                    if (ip4 == laddrs.ipv4_addrs[j].addr) {
-                        ea = laddrs.ea;
-                        mac_found = true;
-                        break;
-                    }
+            for (size_t i = 0; i < pb->n_mac && !mac_found; i++) {
+                struct lport_addresses laddrs;
+
+                if (!extract_lsp_addresses(pb->mac[i], &laddrs)) {
+                    continue;
                 }
-            } else {
-                for (size_t j = 0; j < laddrs.n_ipv6_addrs; j++) {
-                    if (IN6_ARE_ADDR_EQUAL(&ip_addr,
-                                           &laddrs.ipv6_addrs[j].addr)) {
-                        ea = laddrs.ea;
-                        mac_found = true;
-                        break;
+
+                if (is_ipv4) {
+                    for (size_t j = 0; j < laddrs.n_ipv4_addrs; j++) {
+                        if (ip4 == laddrs.ipv4_addrs[j].addr) {
+                            ea = laddrs.ea;
+                            mac_found = true;
+                            break;
+                        }
+                    }
+                } else {
+                    for (size_t j = 0; j < laddrs.n_ipv6_addrs; j++) {
+                        if (IN6_ARE_ADDR_EQUAL(&ip_addr,
+                                               &laddrs.ipv6_addrs[j].addr)) {
+                            ea = laddrs.ea;
+                            mac_found = true;
+                            break;
+                        }
                     }
                 }
-            }
 
-            if (!mac_found && !laddrs.n_ipv4_addrs && !laddrs.n_ipv6_addrs) {
-                /* IP address(es) are not configured. Use the first mac. */
-                ea = laddrs.ea;
-                mac_found = true;
-            }
+                if (!mac_found && !laddrs.n_ipv4_addrs &&
+                    !laddrs.n_ipv6_addrs) {
+                    /* IP address(es) are not configured. Use the first mac. */
+                    ea = laddrs.ea;
+                    mac_found = true;
+                }
 
-            destroy_lport_addresses(&laddrs);
+                destroy_lport_addresses(&laddrs);
+            }
         }
 
         if (!mac_found) {
@@ -7661,23 +7720,18 @@  sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,
 
         uint32_t dp_key = pb->datapath->tunnel_key;
         uint32_t port_key = pb->tunnel_key;
+        uint32_t input_port_key = input_pb ? input_pb->tunnel_key : UINT32_MAX;
         uint32_t hash =
             hash_bytes(&ip_addr, sizeof ip_addr,
                        hash_3words(dp_key, port_key, sb_svc_mon->port));
 
-        enum svc_monitor_protocol protocol;
-        if (!sb_svc_mon->protocol || strcmp(sb_svc_mon->protocol, "udp")) {
-            protocol = SVC_MON_PROTO_TCP;
-        } else {
-            protocol = SVC_MON_PROTO_UDP;
-        }
-
         svc_mon = pinctrl_find_svc_monitor(dp_key, port_key, &ip_addr,
                                            sb_svc_mon->port, protocol, hash);
 
         if (!svc_mon) {
             svc_mon = xmalloc(sizeof *svc_mon);
             svc_mon->dp_key = dp_key;
+            svc_mon->input_port_key = input_port_key;
             svc_mon->port_key = port_key;
             svc_mon->proto_port = sb_svc_mon->port;
             svc_mon->ip = ip_addr;
@@ -7685,6 +7739,7 @@  sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,
             svc_mon->state = SVC_MON_S_INIT;
             svc_mon->status = SVC_MON_ST_UNKNOWN;
             svc_mon->protocol = protocol;
+            svc_mon->type = mon_type;
 
             smap_init(&svc_mon->options);
             svc_mon->interval =
@@ -8578,11 +8633,67 @@  svc_monitor_send_udp_health_check(struct rconn *swconn,
     ofpbuf_uninit(&ofpacts);
 }
 
+
+static void
+svc_monitor_send_icmp_health_check__(struct rconn *swconn,
+                                     struct svc_monitor *svc_mon)
+{
+    uint64_t packet_stub[128 / 8];
+    struct dp_packet packet;
+    dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub);
+
+    struct eth_addr eth_src;
+    eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, &eth_src);
+
+    ovs_be32 ip4_src;
+    ip_parse(svc_mon->sb_svc_mon->src_ip, &ip4_src);
+    pinctrl_compose_ipv4(&packet, eth_src, svc_mon->ea, ip4_src,
+                         in6_addr_get_mapped_ipv4(&svc_mon->ip),
+                         IPPROTO_ICMP, 255, ICMP_HEADER_LEN);
+
+    struct icmp_header *ih = dp_packet_l4(&packet);
+    ih->icmp_fields.echo.id = svc_mon->icmp_id;
+    ih->icmp_fields.echo.seq = svc_mon->icmp_seq_no;
+
+    uint8_t icmp_code = 0;
+    packet_set_icmp(&packet, ICMP4_ECHO_REQUEST, icmp_code);
+
+    ih->icmp_csum = 0;
+    ih->icmp_csum = csum(ih, sizeof *ih);
+
+    uint64_t ofpacts_stub[4096 / 8];
+    struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub);
+    enum ofp_version version = rconn_get_version(swconn);
+    put_load(svc_mon->dp_key, MFF_LOG_DATAPATH, 0, 64, &ofpacts);
+    put_load(svc_mon->input_port_key, MFF_LOG_OUTPORT, 0, 32, &ofpacts);
+    put_load(1, MFF_LOG_FLAGS, MLF_LOCAL_ONLY, 1, &ofpacts);
+    struct ofpact_resubmit *resubmit = ofpact_put_RESUBMIT(&ofpacts);
+    resubmit->in_port = OFPP_CONTROLLER;
+    resubmit->table_id = OFTABLE_LOCAL_OUTPUT;
+
+    struct ofputil_packet_out po = {
+        .packet = dp_packet_data(&packet),
+        .packet_len = dp_packet_size(&packet),
+        .buffer_id = UINT32_MAX,
+        .ofpacts = ofpacts.data,
+        .ofpacts_len = ofpacts.size,
+    };
+    match_set_in_port(&po.flow_metadata, OFPP_CONTROLLER);
+    enum ofputil_protocol proto = ofputil_protocol_from_ofp_version(version);
+    queue_msg(swconn, ofputil_encode_packet_out(&po, proto));
+    dp_packet_uninit(&packet);
+    ofpbuf_uninit(&ofpacts);
+}
+
 static void
 svc_monitor_send_health_check(struct rconn *swconn,
                               struct svc_monitor *svc_mon)
 {
-    if (svc_mon->protocol == SVC_MON_PROTO_TCP) {
+    if (svc_mon->protocol == SVC_MON_PROTO_ICMP) {
+        svc_mon->icmp_id = (OVS_FORCE ovs_be16) random_uint16();
+        svc_mon->icmp_seq_no = (OVS_FORCE ovs_be16) random_uint16();
+        svc_monitor_send_icmp_health_check__(swconn, svc_mon);
+    } else if (svc_mon->protocol == SVC_MON_PROTO_TCP) {
         svc_mon->seq_no = random_uint32();
         svc_mon->tp_src = htons(get_random_src_port());
         svc_monitor_send_tcp_health_check__(swconn, svc_mon,
@@ -8623,13 +8734,14 @@  svc_monitors_run(struct rconn *swconn,
 
         case SVC_MON_S_WAITING:
             if (current_time > svc_mon->wait_time) {
-                if (svc_mon->protocol ==  SVC_MON_PROTO_TCP) {
-                    svc_mon->n_failures++;
-                    svc_mon->state = SVC_MON_S_OFFLINE;
-                } else {
+                if (svc_mon->protocol == SVC_MON_PROTO_UDP) {
                     svc_mon->n_success++;
                     svc_mon->state = SVC_MON_S_ONLINE;
+                } else {
+                    svc_mon->n_failures++;
+                    svc_mon->state = SVC_MON_S_OFFLINE;
                 }
+
                 svc_mon->next_send_time = current_time + svc_mon->interval;
                 next_run_time = svc_mon->next_send_time;
             } else {
@@ -8690,6 +8802,27 @@  svc_monitors_wait(long long int svc_monitors_next_run_time)
     }
 }
 
+
+static void
+pinctrl_handle_icmp_svc_check(struct dp_packet *pkt_in,
+                              struct svc_monitor *svc_mon)
+{
+    struct icmp_header *ih = dp_packet_l4(pkt_in);
+
+    if (!ih) {
+        return;
+    }
+
+    if ((ih->icmp_fields.echo.id != svc_mon->icmp_id) ||
+        (ih->icmp_fields.echo.seq != svc_mon->icmp_seq_no)) {
+        return;
+    }
+
+    svc_mon->n_success++;
+    svc_mon->state = SVC_MON_S_ONLINE;
+    svc_mon->next_send_time = time_msec() + svc_mon->interval;
+}
+
 static bool
 pinctrl_handle_tcp_svc_check(struct rconn *swconn,
                              struct dp_packet *pkt_in,
@@ -8746,6 +8879,7 @@  pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow,
     uint32_t dp_key = ntohll(md->flow.metadata);
     uint32_t port_key = md->flow.regs[MFF_LOG_INPORT - MFF_REG0];
     struct in6_addr ip_addr;
+    struct in6_addr dst_ip_addr;
     struct eth_header *in_eth = dp_packet_data(pkt_in);
     uint8_t ip_proto;
 
@@ -8761,10 +8895,12 @@  pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow,
         }
 
         ip_addr = in6_addr_mapped_ipv4(ip_flow->nw_src);
+        dst_ip_addr = in6_addr_mapped_ipv4(ip_flow->nw_dst);
         ip_proto = in_ip->ip_proto;
     } else {
         struct ovs_16aligned_ip6_hdr *in_ip = dp_packet_l3(pkt_in);
         ip_addr = ip_flow->ipv6_src;
+        dst_ip_addr = ip_flow->ipv6_dst;
         ip_proto = in_ip->ip6_nxt;
     }
 
@@ -8777,7 +8913,6 @@  pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow,
         return;
     }
 
-
     if (ip_proto == IPPROTO_TCP) {
         uint32_t hash =
             hash_bytes(&ip_addr, sizeof ip_addr,
@@ -8806,17 +8941,36 @@  pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow,
             return;
         }
 
-        const void *in_ip = dp_packet_get_icmp_payload(pkt_in);
-        if (!in_ip) {
-            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
-            VLOG_WARN_RL(&rl, "Original IP datagram not present in "
-                         "ICMP packet");
-            return;
-        }
-
         if (in_eth->eth_type == htons(ETH_TYPE_IP)) {
             struct icmp_header *ih = l4h;
             /* It's ICMP packet. */
+            if (ih->icmp_type == ICMP4_ECHO_REQUEST && ih->icmp_code == 0) {
+                uint32_t hash = hash_bytes(&dst_ip_addr, sizeof dst_ip_addr,
+                                           hash_3words(dp_key, port_key, 0));
+                struct svc_monitor *svc_mon =
+                    pinctrl_find_svc_monitor(dp_key, port_key, &dst_ip_addr, 0,
+                                             SVC_MON_PROTO_ICMP, hash);
+                if (!svc_mon) {
+                    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(
+                        1, 5);
+                    VLOG_WARN_RL(&rl, "handle service check: Service monitor "
+                                 "not found for ICMP request");
+                    return;
+                }
+                if (svc_mon->type == SVC_MON_TYPE_NF) {
+                    pinctrl_handle_icmp_svc_check(pkt_in, svc_mon);
+                }
+                return;
+            }
+
+            const void *in_ip = dp_packet_get_icmp_payload(pkt_in);
+            if (!in_ip) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+                VLOG_WARN_RL(&rl, "Original IP datagram not present in "
+                             "ICMP packet");
+                return;
+            }
+
             if (ih->icmp_type != ICMP4_DST_UNREACH || ih->icmp_code != 3) {
                 return;
             }
@@ -8838,6 +8992,14 @@  pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow,
                 return;
             }
         } else {
+            const void *in_ip = dp_packet_get_icmp_payload(pkt_in);
+            if (!in_ip) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+                VLOG_WARN_RL(&rl, "Original IP datagram not present in "
+                             "ICMP packet");
+                return;
+            }
+
             struct icmp6_header *ih6 = l4h;
             if (ih6->icmp6_type != 1 || ih6->icmp6_code != 4) {
                 return;
diff --git a/northd/en-global-config.c b/northd/en-global-config.c
index c103b137f..2a89f2c4d 100644
--- a/northd/en-global-config.c
+++ b/northd/en-global-config.c
@@ -20,6 +20,7 @@ 
 
 /* OVS includes */
 #include "openvswitch/vlog.h"
+#include "socket-util.h"
 
 /* OVN includes */
 #include "debug.h"
@@ -61,6 +62,35 @@  en_global_config_init(struct engine_node *node OVS_UNUSED,
     return data;
 }
 
+static void
+update_svc_monitor_addr(const char *new_ip4, const char **old_ip4_pptr)
+{
+    if (new_ip4) {
+        struct sockaddr_storage svc_mon_addr;
+        if (inet_parse_address(new_ip4, &svc_mon_addr)) {
+            struct ds ip_s = DS_EMPTY_INITIALIZER;
+            ss_format_address_nobracks(&svc_mon_addr, &ip_s);
+            if ((*old_ip4_pptr == NULL)
+                || strcmp(*old_ip4_pptr, ds_cstr(&ip_s))) {
+                free(CONST_CAST(void *, *old_ip4_pptr));
+                *old_ip4_pptr = ds_steal_cstr(&ip_s);
+            } else {
+                ds_destroy(&ip_s);
+            }
+        } else {
+            if (*old_ip4_pptr) {
+                free(CONST_CAST(void *, *old_ip4_pptr));
+                *old_ip4_pptr = NULL;
+            }
+        }
+    } else {
+        if (*old_ip4_pptr) {
+            free(CONST_CAST(void *, *old_ip4_pptr));
+            *old_ip4_pptr = NULL;
+        }
+    }
+}
+
 void
 en_global_config_run(struct engine_node *node , void *data)
 {
@@ -108,6 +138,27 @@  en_global_config_run(struct engine_node *node , void *data)
         }
     }
 
+    const char *dst_monitor_mac = smap_get(&nb->options,
+                                           "svc_monitor_mac_dst");
+    if (dst_monitor_mac) {
+        if (eth_addr_from_string(dst_monitor_mac,
+                                 &config_data->svc_monitor_mac_ea_dst)) {
+            snprintf(config_data->svc_monitor_mac_dst,
+                     sizeof config_data->svc_monitor_mac_dst,
+                     ETH_ADDR_FMT,
+                     ETH_ADDR_ARGS(config_data->svc_monitor_mac_ea_dst));
+        } else {
+            dst_monitor_mac = NULL;
+        }
+    }
+
+    const char *monitor_ip4 = smap_get(&nb->options, "svc_monitor_ip4");
+    update_svc_monitor_addr(monitor_ip4, &config_data->svc_monitor_ip4);
+    const char *monitor_ip4_dst = smap_get(&nb->options,
+                                           "svc_monitor_ip4_dst");
+    update_svc_monitor_addr(monitor_ip4_dst,
+                            &config_data->svc_monitor_ip4_dst);
+
     struct smap *options = &config_data->nb_options;
     smap_destroy(options);
     smap_clone(options, &nb->options);
@@ -123,6 +174,15 @@  en_global_config_run(struct engine_node *node , void *data)
                      config_data->svc_monitor_mac);
     }
 
+    if (!dst_monitor_mac) {
+        eth_addr_random(&config_data->svc_monitor_mac_ea_dst);
+        snprintf(config_data->svc_monitor_mac_dst,
+                 sizeof config_data->svc_monitor_mac_dst, ETH_ADDR_FMT,
+                 ETH_ADDR_ARGS(config_data->svc_monitor_mac_ea_dst));
+        smap_replace(options, "svc_monitor_mac_dst",
+                     config_data->svc_monitor_mac_dst);
+    }
+
     bool ic_vxlan_mode = false;
     const struct nbrec_logical_switch *nbs;
     NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH (nbs, nbrec_ls_table) {
@@ -245,6 +305,21 @@  global_config_nb_global_handler(struct engine_node *node, void *data)
         return false;
     }
 
+    if (config_out_of_sync(&nb->options, &config_data->nb_options,
+                           "svc_monitor_mac_dst", true)) {
+        return false;
+    }
+
+    if (config_out_of_sync(&nb->options, &config_data->nb_options,
+                           "svc_monitor_ip4", false)) {
+        return false;
+    }
+
+    if (config_out_of_sync(&nb->options, &config_data->nb_options,
+                           "svc_monitor_ip4_dst", false)) {
+        return false;
+    }
+
     /* Check if max_tunid has changed or not. */
     if (config_out_of_sync(&nb->options, &config_data->nb_options,
                            "max_tunid", true)) {
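Reviewer note: the normalize-and-replace pattern used by update_svc_monitor_addr() can be sketched in isolation as below. This is a minimal sketch, not the patch's code: update_addr() is a hypothetical name, it uses POSIX inet_pton()/inet_ntop() instead of the OVS inet_parse_address() and ss_format_address_nobracks() helpers, and it accepts IPv4 only. The point it illustrates is that the candidate string is formatted once, compared without giving up ownership, and released on every path.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of OVN.  Canonicalizes 'new_ip' (IPv4 only
 * in this sketch) and stores a heap copy in '*stored' when the canonical
 * form differs from the current value.  A NULL or unparsable 'new_ip'
 * clears '*stored'. */
static void
update_addr(const char *new_ip, char **stored)
{
    struct in_addr addr;
    char buf[INET_ADDRSTRLEN];

    assert(stored);

    if (!new_ip || inet_pton(AF_INET, new_ip, &addr) != 1
        || !inet_ntop(AF_INET, &addr, buf, sizeof buf)) {
        free(*stored);          /* free(NULL) is a no-op. */
        *stored = NULL;
        return;
    }

    if (!*stored || strcmp(*stored, buf)) {
        size_t len = strlen(buf) + 1;
        char *copy = malloc(len);
        if (!copy) {
            return;             /* Keep the old value on allocation failure. */
        }
        memcpy(copy, buf, len);
        free(*stored);
        *stored = copy;
    }
    /* Otherwise the value is unchanged; 'buf' is on the stack, so there is
     * nothing to release. */
}
```

Because the comparison buffer lives on the stack here, the "unchanged" path has nothing to free. With a heap-backed struct ds, that path needs an explicit ds_destroy().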
diff --git a/northd/en-global-config.h b/northd/en-global-config.h
index de88db18b..2240c9cd9 100644
--- a/northd/en-global-config.h
+++ b/northd/en-global-config.h
@@ -36,13 +36,19 @@  struct ed_type_global_config {
     const struct nbrec_nb_global *nb_global;
     const struct sbrec_sb_global *sb_global;
 
-    /* MAC allocated for service monitor usage. Just one mac is allocated
+    /* Service monitor MACs. Just one source/destination pair is allocated
      * for this purpose and ovn-controller's on each chassis will make use
-     * of this mac when sending out the packets to monitor the services
+     * of this pair when sending out the packets to monitor the services
      * defined in Service_Monitor Southbound table. Since these packets
-     * are locally handled, having just one mac is good enough. */
+     * are locally handled, having just one pair is good enough. */
     char svc_monitor_mac[ETH_ADDR_STRLEN + 1];
     struct eth_addr svc_monitor_mac_ea;
+    char svc_monitor_mac_dst[ETH_ADDR_STRLEN + 1];
+    struct eth_addr svc_monitor_mac_ea_dst;
+
+    /* IP configured for LB and NF service monitor usage. */
+    const char *svc_monitor_ip4;
+    const char *svc_monitor_ip4_dst;
 
     struct chassis_features features;
 
diff --git a/northd/en-northd.c b/northd/en-northd.c
index 2549a6995..35ade6518 100644
--- a/northd/en-northd.c
+++ b/northd/en-northd.c
@@ -113,6 +113,10 @@  northd_get_input_data(struct engine_node *node,
     input_data->sb_options = &global_config->sb_options;
     input_data->svc_monitor_mac = global_config->svc_monitor_mac;
     input_data->svc_monitor_mac_ea = global_config->svc_monitor_mac_ea;
+    input_data->svc_monitor_mac_dst = global_config->svc_monitor_mac_dst;
+    input_data->svc_monitor_mac_ea_dst = global_config->svc_monitor_mac_ea_dst;
+    input_data->svc_monitor_ip4 = global_config->svc_monitor_ip4;
+    input_data->svc_monitor_ip4_dst = global_config->svc_monitor_ip4_dst;
     input_data->features = &global_config->features;
 }
 
diff --git a/northd/en-sync-sb.c b/northd/en-sync-sb.c
index d9dc25eb8..d1030b567 100644
--- a/northd/en-sync-sb.c
+++ b/northd/en-sync-sb.c
@@ -49,7 +49,8 @@  static void sync_addr_sets(struct ovsdb_idl_txn *ovnsb_txn,
                            const struct sbrec_address_set_table *,
                            const struct lr_stateful_table *,
                            const struct ovn_datapaths *,
-                           const char *svc_monitor_macp);
+                           const char *svc_monitor_macp,
+                           const char *svc_monitor_macp_dst);
 static const struct sbrec_address_set *sb_address_set_lookup_by_name(
     struct ovsdb_idl_index *, const char *name);
 static void update_sb_addr_set(struct sorted_array *,
@@ -104,7 +105,8 @@  en_sync_to_sb_addr_set_run(struct engine_node *node, void *data OVS_UNUSED)
                    nb_port_group_table, sb_address_set_table,
                    &lr_stateful_data->table,
                    &northd_data->lr_datapaths,
-                   global_config->svc_monitor_mac);
+                   global_config->svc_monitor_mac,
+                   global_config->svc_monitor_mac_dst);
 
     engine_set_node_state(node, EN_UPDATED);
 }
@@ -446,7 +448,8 @@  sync_addr_sets(struct ovsdb_idl_txn *ovnsb_txn,
                const struct sbrec_address_set_table *sb_address_set_table,
                const struct lr_stateful_table *lr_statefuls,
                const struct ovn_datapaths *lr_datapaths,
-               const char *svc_monitor_macp)
+               const char *svc_monitor_macp,
+               const char *svc_monitor_macp_dst)
 {
     struct shash sb_address_sets = SHASH_INITIALIZER(&sb_address_sets);
 
@@ -456,8 +459,11 @@  sync_addr_sets(struct ovsdb_idl_txn *ovnsb_txn,
         shash_add(&sb_address_sets, sb_address_set->name, sb_address_set);
     }
 
-    /* Service monitor MAC. */
-    struct sorted_array svc = sorted_array_create(&svc_monitor_macp, 1, false);
+    /* Service monitor MACs. */
+    const char *svc_macs[] = {svc_monitor_macp, svc_monitor_macp_dst};
+    size_t n_macs = ARRAY_SIZE(svc_macs);
+    struct sorted_array svc = sorted_array_create(svc_macs, n_macs,
+                                                  false);
     sync_addr_set(ovnsb_txn, "svc_monitor_mac", &svc, &sb_address_sets);
     sorted_array_destroy(&svc);
 
diff --git a/northd/northd.c b/northd/northd.c
index f0e998f55..9f5abcb56 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -3633,7 +3633,9 @@  get_service_mon(const struct hmap *monitor_map,
 static struct service_monitor_info *
 create_or_get_service_mon(struct ovsdb_idl_txn *ovnsb_txn,
                           struct hmap *monitor_map,
-                          const char *ip, const char *logical_port,
+                          const char *type, const char *ip,
+                          const char *logical_port,
+                          const char *logical_input_port,
                           uint16_t service_port, const char *protocol,
                           const char *chassis_name)
 {
@@ -3657,9 +3659,14 @@  create_or_get_service_mon(struct ovsdb_idl_txn *ovnsb_txn,
 
     struct sbrec_service_monitor *sbrec_mon =
         sbrec_service_monitor_insert(ovnsb_txn);
+    sbrec_service_monitor_set_type(sbrec_mon, type);
     sbrec_service_monitor_set_ip(sbrec_mon, ip);
     sbrec_service_monitor_set_port(sbrec_mon, service_port);
     sbrec_service_monitor_set_logical_port(sbrec_mon, logical_port);
+    if (logical_input_port) {
+        sbrec_service_monitor_set_logical_input_port(sbrec_mon,
+            logical_input_port);
+    }
     sbrec_service_monitor_set_protocol(sbrec_mon, protocol);
     if (chassis_name) {
         sbrec_service_monitor_set_chassis_name(sbrec_mon, chassis_name);
@@ -3670,6 +3677,99 @@  create_or_get_service_mon(struct ovsdb_idl_txn *ovnsb_txn,
     return mon_info;
 }
 
+static void
+ovn_nf_svc_create(struct ovsdb_idl_txn *ovnsb_txn,
+                  struct hmap *monitor_map,
+                  struct sset *svc_monitor_lsps,
+                  struct hmap *ls_ports,
+                  const char *mac_src, const char *mac_dst,
+                  const char *ip_src, const char *ip_dst,
+                  const char *logical_port, const char *logical_input_port,
+                  const struct smap *health_check_options)
+{
+    if (!ip_src || !ip_dst || !mac_src || !mac_dst) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+        VLOG_ERR_RL(&rl, "NetworkFunction: invalid service monitor "
+                    "src_mac:%s dst_mac:%s src_ip:%s dst_ip:%s",
+                    mac_src ? mac_src : "(null)", mac_dst ? mac_dst : "(null)",
+                    ip_src ? ip_src : "(null)", ip_dst ? ip_dst : "(null)");
+        return;
+    }
+
+    const char *ports[] = {logical_port, logical_input_port};
+    size_t n_ports = ARRAY_SIZE(ports);
+    const char *chassis_name = NULL;
+    bool port_up = true;
+
+    for (size_t i = 0; i < n_ports; i++) {
+        const char *port = ports[i];
+        sset_add(svc_monitor_lsps, port);
+        struct ovn_port *op = ovn_port_find(ls_ports, port);
+        if (op == NULL) {
+            static struct vlog_rate_limit rl =
+                VLOG_RATE_LIMIT_INIT(1, 1);
+            VLOG_ERR_RL(&rl, "NetworkFunction: skip health check, port:%s "
+                        "not found", port);
+            return;
+        }
+
+        if (op->sb && op->sb->chassis) {
+            if (chassis_name == NULL) {
+                chassis_name = op->sb->chassis->name;
+            } else if (strcmp(chassis_name, op->sb->chassis->name)) {
+                static struct vlog_rate_limit rl =
+                    VLOG_RATE_LIMIT_INIT(1, 1);
+                VLOG_ERR_RL(&rl, "NetworkFunction: chassis mismatch "
+                            "chassis:%s port:%s", op->sb->chassis->name, port);
+            }
+        }
+        port_up &= (op->sb && op->sb->n_up && op->sb->up[0]);
+    }
+
+
+    struct service_monitor_info *mon_info =
+        create_or_get_service_mon(ovnsb_txn, monitor_map,
+                                  "network-function", ip_dst,
+                                  logical_port,
+                                  logical_input_port,
+                                  0,
+                                  "icmp",
+                                  chassis_name);
+    ovs_assert(mon_info);
+    sbrec_service_monitor_set_options(
+        mon_info->sbrec_mon, health_check_options);
+
+    if (!mon_info->sbrec_mon->src_mac ||
+        strcmp(mon_info->sbrec_mon->src_mac, mac_src)) {
+        sbrec_service_monitor_set_src_mac(mon_info->sbrec_mon,
+                                          mac_src);
+    }
+
+    if (!mon_info->sbrec_mon->mac ||
+        strcmp(mon_info->sbrec_mon->mac, mac_dst)) {
+        sbrec_service_monitor_set_mac(mon_info->sbrec_mon,
+                                      mac_dst);
+    }
+
+    if (!mon_info->sbrec_mon->src_ip ||
+        strcmp(mon_info->sbrec_mon->src_ip, ip_src)) {
+        sbrec_service_monitor_set_src_ip(mon_info->sbrec_mon, ip_src);
+    }
+
+    if (!mon_info->sbrec_mon->ip ||
+        strcmp(mon_info->sbrec_mon->ip, ip_dst)) {
+        sbrec_service_monitor_set_ip(mon_info->sbrec_mon, ip_dst);
+    }
+
+    if (!port_up
+        && mon_info->sbrec_mon->status
+        && !strcmp(mon_info->sbrec_mon->status, "online")) {
+        sbrec_service_monitor_set_status(mon_info->sbrec_mon,
+                                         "offline");
+    }
+    mon_info->required = true;
+}
+
 static void
 ovn_lb_svc_create(struct ovsdb_idl_txn *ovnsb_txn,
                   const struct ovn_northd_lb *lb,
@@ -3715,8 +3815,10 @@  ovn_lb_svc_create(struct ovsdb_idl_txn *ovnsb_txn,
 
             struct service_monitor_info *mon_info =
                 create_or_get_service_mon(ovnsb_txn, monitor_map,
+                                          "load-balancer",
                                           backend->ip_str,
                                           backend_nb->logical_port,
+                                          NULL,
                                           backend->port,
                                           protocol,
                                           chassis_name);
@@ -3947,12 +4049,16 @@  build_lb_datapaths(const struct hmap *lbs, const struct hmap *lb_groups,
 }
 
 static void
-build_lb_svcs(
+build_svcs(
     struct ovsdb_idl_txn *ovnsb_txn,
     const struct sbrec_service_monitor_table *sbrec_service_monitor_table,
     const char *svc_monitor_mac,
     const struct eth_addr *svc_monitor_mac_ea,
+    const char *svc_monitor_mac_dst,
+    const char *svc_monitor_ip4,
+    const char *svc_monitor_ip4_dst,
     struct hmap *ls_ports, struct hmap *lb_dps_map,
+    const struct nbrec_network_function_table *nbrec_network_function_table,
     struct sset *svc_monitor_lsps,
     struct hmap *svc_monitor_map)
 {
@@ -3975,6 +4081,21 @@  build_lb_svcs(
                           svc_monitor_lsps);
     }
 
+    const struct nbrec_network_function *nbrec_nf;
+    NBREC_NETWORK_FUNCTION_TABLE_FOR_EACH (nbrec_nf,
+                            nbrec_network_function_table) {
+        if (nbrec_nf->health_check) {
+            ovn_nf_svc_create(ovnsb_txn,
+                              svc_monitor_map,
+                              svc_monitor_lsps,
+                              ls_ports,
+                              svc_monitor_mac, svc_monitor_mac_dst,
+                              svc_monitor_ip4, svc_monitor_ip4_dst,
+                              nbrec_nf->outport->name, nbrec_nf->inport->name,
+                              &nbrec_nf->health_check->options);
+        }
+    }
+
     struct service_monitor_info *mon_info;
     HMAP_FOR_EACH_SAFE (mon_info, hmap_node, svc_monitor_map) {
         if (!mon_info->required) {
@@ -4040,18 +4161,9 @@  build_lb_count_dps(struct hmap *lb_dps_map,
  */
 static void
 build_lb_port_related_data(
-    struct ovsdb_idl_txn *ovnsb_txn,
-    const struct sbrec_service_monitor_table *sbrec_service_monitor_table,
-    const char *svc_monitor_mac,
-    const struct eth_addr *svc_monitor_mac_ea,
-    struct ovn_datapaths *lr_datapaths, struct hmap *ls_ports,
-    struct hmap *lb_dps_map, struct hmap *lb_group_dps_map,
-    struct sset *svc_monitor_lsps,
-    struct hmap *svc_monitor_map)
+    struct ovn_datapaths *lr_datapaths,
+    struct hmap *lb_dps_map, struct hmap *lb_group_dps_map)
 {
-    build_lb_svcs(ovnsb_txn, sbrec_service_monitor_table, svc_monitor_mac,
-                  svc_monitor_mac_ea, ls_ports, lb_dps_map,
-                  svc_monitor_lsps, svc_monitor_map);
     build_lswitch_lbs_from_lrouter(lr_datapaths, lb_dps_map, lb_group_dps_map);
 }
 
@@ -17392,16 +17504,6 @@  build_ls_stateful_flows(const struct ls_stateful_record *ls_stateful_rec,
     build_lb_hairpin(ls_stateful_rec, od, lflows, ls_stateful_rec->lflow_ref);
 }
 
-static struct nbrec_network_function *
-network_function_get_active(const struct nbrec_network_function_group *nfg)
-{
-    /* Another patch adds the healthmon support. This is temporary. */
-    if (nfg->n_network_function == 0) {
-        return NULL;
-    }
-    return nfg->network_function[0];
-}
-
 /* For packets received on tunnel and egressing towards a network-function port
  * commit the tunnel interface id in CT. This will be utilized when the packet
  * comes out of the other network-function interface of the service VM. The
@@ -17443,6 +17545,101 @@  build_lswitch_stateful_nf(struct ovn_port *op,
                   ds_cstr(match), ds_cstr(actions), op->lflow_ref);
 }
 
+static struct nbrec_network_function *
+network_function_get_active(const struct nbrec_network_function_group *nfg)
+{
+    return nfg->network_function_active;
+}
+
+static void
+network_function_update_active(const struct nbrec_network_function_group *nfg,
+                               const struct hmap *svc_monitor_map,
+                               const char *svc_monitor_ip4_dst)
+{
+    if (!nfg->n_network_function) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+        VLOG_ERR_RL(&rl, "NetworkFunction: No network_function found in "
+                         "network_function_group %s", nfg->name);
+        if (nfg->network_function_active) {
+            nbrec_network_function_group_set_network_function_active(nfg,
+                                                                     NULL);
+        }
+        return;
+    }
+    struct nbrec_network_function *nf_active = nfg->network_function[0];
+    struct nbrec_network_function *nf_active_prev = NULL;
+    uint16_t best_score = 0;
+    bool healthy_nf_available = false;
+    if (nfg->network_function_active) {
+        nf_active_prev = nfg->network_function_active;
+    }
+
+    for (size_t i = 0; i < nfg->n_network_function; i++) {
+        struct nbrec_network_function *nf = nfg->network_function[i];
+        uint16_t curr_score = 0;
+        if (nf->health_check == NULL) {
+            VLOG_DBG("NetworkFunction: Health check is not configured for "
+                     "network_function %s", nf->name);
+            /* Consider network_function as healthy if health_check is
+             * not configured. */
+            curr_score += 3;
+            healthy_nf_available = true;
+        } else {
+            struct service_monitor_info *mon_info =
+                get_service_mon(svc_monitor_map, svc_monitor_ip4_dst,
+                                nf->outport->name, 0, "icmp");
+            if (mon_info == NULL) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+                VLOG_ERR_RL(&rl, "NetworkFunction: Service_monitor is not "
+                            "found for network_function:%s", nf->name);
+            } else if (mon_info->sbrec_mon->status
+                       && !strcmp(mon_info->sbrec_mon->status, "online")) {
+                curr_score += 3;
+                healthy_nf_available = true;
+            }
+        }
+
+        if (nf_active_prev && (nf == nf_active_prev)) {
+            curr_score += 1;
+        }
+
+        if (curr_score > best_score) {
+            nf_active = nf;
+            best_score = curr_score;
+        }
+    }
+
+    if (!healthy_nf_available) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+        VLOG_WARN_RL(&rl, "NetworkFunction: No healthy network_function found "
+                     "in network_function_group %s, "
+                     "selected network_function %s as active", nfg->name,
+                     nf_active->name);
+    }
+
+    if (nf_active_prev != nf_active) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+        VLOG_INFO_RL(&rl, "NetworkFunction: Update active network_function %s "
+                     "in network_function_group %s",
+                     nf_active->name, nfg->name);
+        nbrec_network_function_group_set_network_function_active(nfg,
+                                                                 nf_active);
+    }
+}
+
+static void build_network_function_active(
+    const struct nbrec_network_function_group_table *nbrec_nfg_table,
+    struct hmap *svc_monitor_map,
+    const char *svc_monitor_ip4_dst)
+{
+    const struct nbrec_network_function_group *nbrec_nfg;
+    NBREC_NETWORK_FUNCTION_GROUP_TABLE_FOR_EACH (nbrec_nfg,
+                            nbrec_nfg_table) {
+        network_function_update_active(nbrec_nfg, svc_monitor_map,
+                                       svc_monitor_ip4_dst);
+    }
+}
+
 static void
 consider_network_function(
                struct lflow_table *lflows, struct ovn_datapath *od,
@@ -19362,18 +19559,25 @@  ovnnb_db_run(struct northd_input *input_data,
                 input_data->sbrec_ha_chassis_grp_by_name,
                 &data->ls_datapaths.datapaths, &data->lr_datapaths.datapaths,
                 &data->ls_ports, &data->lr_ports);
-    build_lb_port_related_data(ovnsb_txn,
-                               input_data->sbrec_service_monitor_table,
-                               input_data->svc_monitor_mac,
-                               &input_data->svc_monitor_mac_ea,
-                               &data->lr_datapaths, &data->ls_ports,
+    build_lb_port_related_data(&data->lr_datapaths,
                                &data->lb_datapaths_map,
-                               &data->lb_group_datapaths_map,
-                               &data->svc_monitor_lsps,
-                               &data->svc_monitor_map);
+                               &data->lb_group_datapaths_map);
+    build_svcs(ovnsb_txn, input_data->sbrec_service_monitor_table,
+               input_data->svc_monitor_mac,
+               &input_data->svc_monitor_mac_ea,
+               input_data->svc_monitor_mac_dst,
+               input_data->svc_monitor_ip4,
+               input_data->svc_monitor_ip4_dst,
+               &data->ls_ports, &data->lb_datapaths_map,
+               input_data->nbrec_network_function_table,
+               &data->svc_monitor_lsps, &data->svc_monitor_map);
     build_lb_count_dps(&data->lb_datapaths_map,
                        ods_size(&data->ls_datapaths),
                        ods_size(&data->lr_datapaths));
+    build_network_function_active(
+        input_data->nbrec_network_function_group_table,
+        &data->svc_monitor_map,
+        input_data->svc_monitor_ip4_dst);
     build_ipam(&data->ls_datapaths.datapaths, &data->ls_ports);
     build_lrouter_groups(&data->lr_ports, &data->lr_list);
     build_ip_mcast(ovnsb_txn, input_data->sbrec_ip_multicast_table,
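Reviewer note: the scoring in network_function_update_active() is easier to reason about when isolated from the IDL types. The sketch below is an illustration under assumptions, not the patch's code: struct nf_candidate and pick_active() are hypothetical names, and the Service_Monitor lookup is reduced to a boolean. The weights match the patch: 3 for a healthy NF (or one with no health check configured) and 1 for the previously active NF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Network_Function row; not the OVN IDL type. */
struct nf_candidate {
    const char *name;
    bool has_health_check;  /* false: treated as healthy, as in the patch. */
    bool online;            /* Service_Monitor status is "online". */
};

/* Returns the index of the candidate to mark active.  'prev' is the index of
 * the previously active candidate, or -1 if there was none.
 *
 * A healthy candidate scores 3 and the incumbent gets +1, so a healthy
 * incumbent (4) beats every other healthy candidate (3), while an unhealthy
 * incumbent (1) loses to any healthy candidate.  Ties keep the earliest
 * index because the comparison is strict. */
static size_t
pick_active(const struct nf_candidate *nfs, size_t n, int prev)
{
    assert(n > 0);

    size_t best = 0;
    unsigned int best_score = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned int score = 0;

        if (!nfs[i].has_health_check || nfs[i].online) {
            score += 3;
        }
        if (prev >= 0 && (size_t) prev == i) {
            score += 1;
        }
        if (score > best_score) {
            best = i;
            best_score = score;
        }
    }
    return best;
}
```

The +1 for the incumbent is what prevents flapping: when several NFs are online, the active one only changes if the current one goes offline.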
diff --git a/northd/northd.h b/northd/northd.h
index 7ec546957..5132e627c 100644
--- a/northd/northd.h
+++ b/northd/northd.h
@@ -65,6 +65,10 @@  struct northd_input {
     const struct smap *sb_options;
     const char *svc_monitor_mac;
     struct eth_addr svc_monitor_mac_ea;
+    const char *svc_monitor_mac_dst;
+    struct eth_addr svc_monitor_mac_ea_dst;
+    const char *svc_monitor_ip4;
+    const char *svc_monitor_ip4_dst;
     const struct chassis_features *features;
 
     /* ACL ID inputs. */
@@ -236,8 +240,8 @@  struct lflow_input {
     const struct hmap *lb_datapaths_map;
     const struct sset *bfd_ports;
     const struct chassis_features *features;
-    const struct hmap *svc_monitor_map;
     bool ovn_internal_version_changed;
+    const struct hmap *svc_monitor_map;
     const char *svc_monitor_mac;
     const struct sampling_app_table *sampling_apps;
     struct group_ecmp_route_data *route_data;
diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
index c6058f42c..db7e71d17 100644
--- a/ovn-sb.ovsschema
+++ b/ovn-sb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Southbound",
-    "version": "20.41.0",
-    "cksum": "2343742948 34719",
+    "version": "20.42.0",
+    "cksum": "3932698824 35046",
     "tables": {
         "SB_Global": {
             "columns": {
@@ -503,14 +503,20 @@ 
             "isRoot": true},
         "Service_Monitor": {
             "columns": {
+                "type": {"type": {"key": {
+                           "type": "string",
+                           "enum": ["set", ["load-balancer",
+                                            "network-function"]]}}},
                 "ip": {"type": "string"},
+                "mac": {"type": "string"},
                 "protocol": {
                     "type": {"key": {"type": "string",
-                             "enum": ["set", ["tcp", "udp"]]},
+                             "enum": ["set", ["tcp", "udp", "icmp"]]},
                              "min": 0, "max": 1}},
                 "port": {"type": {"key": {"type": "integer",
                                           "minInteger": 0,
                                           "maxInteger": 65535}}},
+                "logical_input_port": {"type": "string"},
                 "logical_port": {"type": "string"},
                 "src_mac": {"type": "string"},
                 "src_ip": {"type": "string"},
diff --git a/ovn-sb.xml b/ovn-sb.xml
index 39acb81a4..77dd824be 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -4944,10 +4944,20 @@  tcp.flags = RST;
         service monitor.
       </p>
 
+      <column name="type">
+        The type of the service. Supported values are "load-balancer" and
+        "network-function".
+      </column>
+
       <column name="ip">
+        Destination IP used in monitor packets. For load-balancer this is the
         IP of the service to be monitored. Only IPv4 is supported.
       </column>
 
+      <column name="mac">
+        Destination MAC address used in monitor packets for network-function.
+      </column>
+
       <column name="protocol">
         The protocol of the service.
       </column>
@@ -4956,10 +4966,20 @@  tcp.flags = RST;
         The TCP or UDP port of the service.
       </column>
 
+      <column name="logical_input_port">
+        Applicable only to the network-function type. The VIF of the
+        logical port to which monitor packets are sent. The
+        <code>ovn-controller</code> that binds this <code>logical_port</code>
+        monitors the service by sending periodic monitor packets.
+      </column>
+
       <column name="logical_port">
         The VIF of the logical port on which the service is running. The
         <code>ovn-controller</code> that binds this <code>logical_port</code>
-        monitors the service by sending periodic monitor packets.
+        monitors the service by sending periodic monitor packets. For
+        load-balancer this is the port to which monitor packets are sent and
+        from which response packets are received. For network-function this
+        is the port from which the forwarded monitor packets are received.
       </column>
 
       <column name="src_mac">