diff mbox series

[ovs-dev,v3] netdev-dpdk: Use LSC interrupt mode.

Message ID 20240614150837.2027561-1-david.marchand@redhat.com
State Changes Requested
Headers show
Series [ovs-dev,v3] netdev-dpdk: Use LSC interrupt mode. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

David Marchand June 14, 2024, 3:08 p.m. UTC
Querying link status may get delayed for an undeterministic (long) time
with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool
kernel API and getting stuck on the kernel RTNL lock while some other
operation is in progress under this lock.

One impact for long link status query is that it is called under the bond
lock taken in write mode periodically in bond_run().
In parallel, datapath threads may block requesting to read bonding related
info (like for example in bond_check_admissibility()).

The LSC interrupt mode is available with many DPDK drivers and is used by
default with testpmd.

It seems safe enough to switch on this feature by default in OVS.
We keep the per interface option to disable this feature in case of an
unforeseen bug.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Robin Jarry <rjarry@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
---
Changes since v2:
- fixed typo in NEWS,

Changes since v1:
- (early) fail when interrupt lsc is requested by user but not supported
  by the driver,
- otherwise, log a debug message if user did not request interrupt mode,

---
 Documentation/topics/dpdk/phy.rst |  4 ++--
 NEWS                              |  3 +++
 lib/netdev-dpdk.c                 | 13 ++++++++++++-
 vswitchd/vswitch.xml              |  8 ++++----
 4 files changed, 21 insertions(+), 7 deletions(-)

Comments

Ilya Maximets June 14, 2024, 4:22 p.m. UTC | #1
On 6/14/24 17:08, David Marchand wrote:
> Querying link status may get delayed for an undeterministic (long) time
> with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool
> kernel API and getting stuck on the kernel RTNL lock while some other
> operation is in progress under this lock.
> 
> One impact for long link status query is that it is called under the bond
> lock taken in write mode periodically in bond_run().
> In parallel, datapath threads may block requesting to read bonding related
> info (like for example in bond_check_admissibility()).
> 
> The LSC interrupt mode is available with many DPDK drivers and is used by
> default with testpmd.
> 
> It seems safe enough to switch on this feature by default in OVS.
> We keep the per interface option to disable this feature in case of an
> unforeseen bug.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Robin Jarry <rjarry@redhat.com>
> Acked-by: Mike Pattrick <mkp@redhat.com>
> ---
> Changes since v2:
> - fixed typo in NEWS,
> 
> Changes since v1:
> - (early) fail when interrupt lsc is requested by user but not supported
>   by the driver,
> - otherwise, log a debug message if user did not request interrupt mode,
> 
> ---
>  Documentation/topics/dpdk/phy.rst |  4 ++--
>  NEWS                              |  3 +++
>  lib/netdev-dpdk.c                 | 13 ++++++++++++-
>  vswitchd/vswitch.xml              |  8 ++++----
>  4 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
> index efd168cba8..eefc25613d 100644
> --- a/Documentation/topics/dpdk/phy.rst
> +++ b/Documentation/topics/dpdk/phy.rst
> @@ -546,8 +546,8 @@ the firmware every time to fulfil this request.
>  
>  Note that not all PMD drivers support LSC interrupts.
>  
> -The default configuration is polling mode. To set interrupt mode, option
> -``dpdk-lsc-interrupt`` has to be set to ``true``.
> +The default configuration is interrupt mode. To set polling mode, option
> +``dpdk-lsc-interrupt`` has to be set to ``false``.
>  
>  Command to set interrupt mode for a specific interface::
>      $ ovs-vsctl set interface <iface_name> options:dpdk-lsc-interrupt=true
> diff --git a/NEWS b/NEWS
> index 5ae0108d55..d05f2d0f89 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,6 +9,9 @@ Post-v3.3.0
>       https://github.com/openvswitch/ovs.git
>     - DPDK:
>       * OVS validated with DPDK 23.11.1.
> +     * Link status changes are now handled via interrupt mode if the DPDK
> +       driver supports it.  It is possible to revert to polling mode by setting
> +       per interface 'options:dpdk-lsc-interrupt' to 'false'.
>  
>  
>  v3.3.0 - 16 Feb 2024
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 0fa37d5145..a260bc8485 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>          }
>      }
>  
> -    lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
> +    lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
> +    if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) {
> +        if (smap_get(args, "dpdk-lsc-interrupt")) {
> +            VLOG_ERR("interface '%s': link status interrupt is not supported.",
> +                     netdev_get_name(netdev));

Since we're exiting with an error set, the message should be buffered
into errp instead, so it can be visible in the database record and
returned as a result of the ovs-vsctl.

Also, we're using WARN level for all other configuration issues, so we
should do that here as well.  ERR is usually some sort of internal error.
And we're usually just using "%s: ..." and not "interface '%s': ...".

> +            err = EINVAL;
> +            goto out;
> +        }
> +        VLOG_DBG("interface '%s': not enabling link status interrupt.",
> +                 netdev_get_name(netdev));
> +        lsc_interrupt_mode = false;
> +    }
>      if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
>          dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
>          netdev_request_reconfigure(netdev);
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 8a1b607d71..e3afb78a4e 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -4647,12 +4647,12 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>        <column name="options" key="dpdk-lsc-interrupt"
>                type='{"type": "boolean"}'>
>          <p>
> -          Set this value to <code>true</code> to configure interrupt mode for
> -          Link State Change (LSC) detection instead of poll mode for the DPDK
> -          interface.
> +          Set this value to <code>false</code> to configure poll mode for
> +          Link State Change (LSC) detection instead of interrupt mode for the
> +          DPDK interface.
>          </p>
>          <p>
> -          If this value is not set, poll mode is configured.
> +          If this value is not set, interrupt mode is configured.
>          </p>
>          <p>
>            This parameter has an effect only on netdev dpdk interfaces.
David Marchand June 17, 2024, 7:46 a.m. UTC | #2
On Fri, Jun 14, 2024 at 6:22 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 0fa37d5145..a260bc8485 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
> >          }
> >      }
> >
> > -    lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
> > +    lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
> > +    if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) {
> > +        if (smap_get(args, "dpdk-lsc-interrupt")) {
> > +            VLOG_ERR("interface '%s': link status interrupt is not supported.",
> > +                     netdev_get_name(netdev));
>
> Since we're exiting with an error set, the message should be buffered
> into errp instead, so it can be visible in the database record and
> returned as a result of the ovs-vsctl.
>
> Also, we're using WARN level for all other configuration issues, so we
> should do that here as well.  ERR is usually some sort of internal error.
> And we're usually just using "%s: ..." and not "interface '%s': ...".

Ok for ERR vs WARN.

For the rest, well, I copied the logs right before.

    vf_mac = smap_get(args, "dpdk-vf-mac");
    if (vf_mac) {
        struct eth_addr mac;

        if (!dpdk_port_is_representor(dev)) {
            VLOG_WARN("'%s' is trying to set the VF MAC '%s' "
                      "but 'options:dpdk-vf-mac' is only supported for "
                      "VF representors.",
                      netdev_get_name(netdev), vf_mac);
        } else if (!eth_addr_from_string(vf_mac, &mac)) {
            VLOG_WARN("interface '%s': cannot parse VF MAC '%s'.",
                      netdev_get_name(netdev), vf_mac);
        } else if (eth_addr_is_multicast(mac)) {
            VLOG_WARN("interface '%s': cannot set VF MAC to multicast "
                      "address '%s'.", netdev_get_name(netdev), vf_mac);
        } else if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
            dev->requested_hwaddr = mac;
            netdev_request_reconfigure(netdev);
        }
    }

    lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);


So I'll fix the dpdk-vf-mac stuff (and double check the rest of this
function), then go with your suggestion for this added log of mine.
Ilya Maximets June 17, 2024, 8:11 a.m. UTC | #3
On 6/17/24 09:46, David Marchand wrote:
> On Fri, Jun 14, 2024 at 6:22 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 0fa37d5145..a260bc8485 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>>          }
>>>      }
>>>
>>> -    lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
>>> +    lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
>>> +    if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) {
>>> +        if (smap_get(args, "dpdk-lsc-interrupt")) {
>>> +            VLOG_ERR("interface '%s': link status interrupt is not supported.",
>>> +                     netdev_get_name(netdev));
>>
>> Since we're exiting with an error set, the message should be buffered
>> into errp instead, so it can be visible in the database record and
>> returned as a result of the ovs-vsctl.
>>
>> Also, we're using WARN level for all other configuration issues, so we
>> should do that here as well.  ERR is usually some sort of internal error.
>> And we're usually just using "%s: ..." and not "interface '%s': ...".
> 
> Ok for ERR vs WARN.
> 
> For the rest, well, I copied the logs right before.
> 
>     vf_mac = smap_get(args, "dpdk-vf-mac");
>     if (vf_mac) {
>         struct eth_addr mac;
> 
>         if (!dpdk_port_is_representor(dev)) {
>             VLOG_WARN("'%s' is trying to set the VF MAC '%s' "
>                       "but 'options:dpdk-vf-mac' is only supported for "
>                       "VF representors.",
>                       netdev_get_name(netdev), vf_mac);
>         } else if (!eth_addr_from_string(vf_mac, &mac)) {
>             VLOG_WARN("interface '%s': cannot parse VF MAC '%s'.",
>                       netdev_get_name(netdev), vf_mac);
>         } else if (eth_addr_is_multicast(mac)) {
>             VLOG_WARN("interface '%s': cannot set VF MAC to multicast "
>                       "address '%s'.", netdev_get_name(netdev), vf_mac);
>         } else if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
>             dev->requested_hwaddr = mac;
>             netdev_request_reconfigure(netdev);
>         }
>     }
> 
>     lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
> 
> 
> So I'll fix the dpdk-vf-mac stuff (and double check the rest of this
> function), then go with your suggestion for this added log of mine.
> 
> 

We must not initialize errp if we do not fail with error, otherwise we leak
the memory.  VF mac code does not fail the configuration, so we only log the
warning.  All the paths that fail should set errp instead.

Best regards, Ilya Maximets.
David Marchand June 17, 2024, 8:19 a.m. UTC | #4
On Mon, Jun 17, 2024 at 10:11 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 6/17/24 09:46, David Marchand wrote:
> > On Fri, Jun 14, 2024 at 6:22 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >>> index 0fa37d5145..a260bc8485 100644
> >>> --- a/lib/netdev-dpdk.c
> >>> +++ b/lib/netdev-dpdk.c
> >>> @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
> >>>          }
> >>>      }
> >>>
> >>> -    lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
> >>> +    lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
> >>> +    if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) {
> >>> +        if (smap_get(args, "dpdk-lsc-interrupt")) {
> >>> +            VLOG_ERR("interface '%s': link status interrupt is not supported.",
> >>> +                     netdev_get_name(netdev));
> >>
> >> Since we're exiting with an error set, the message should be buffered
> >> into errp instead, so it can be visible in the database record and
> >> returned as a result of the ovs-vsctl.
> >>
> >> Also, we're using WARN level for all other configuration issues, so we
> >> should do that here as well.  ERR is usually some sort of internal error.
> >> And we're usually just using "%s: ..." and not "interface '%s': ...".
> >
> > Ok for ERR vs WARN.
> >
> > For the rest, well, I copied the logs right before.
> >
> >     vf_mac = smap_get(args, "dpdk-vf-mac");
> >     if (vf_mac) {
> >         struct eth_addr mac;
> >
> >         if (!dpdk_port_is_representor(dev)) {
> >             VLOG_WARN("'%s' is trying to set the VF MAC '%s' "
> >                       "but 'options:dpdk-vf-mac' is only supported for "
> >                       "VF representors.",
> >                       netdev_get_name(netdev), vf_mac);
> >         } else if (!eth_addr_from_string(vf_mac, &mac)) {
> >             VLOG_WARN("interface '%s': cannot parse VF MAC '%s'.",
> >                       netdev_get_name(netdev), vf_mac);
> >         } else if (eth_addr_is_multicast(mac)) {
> >             VLOG_WARN("interface '%s': cannot set VF MAC to multicast "
> >                       "address '%s'.", netdev_get_name(netdev), vf_mac);
> >         } else if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
> >             dev->requested_hwaddr = mac;
> >             netdev_request_reconfigure(netdev);
> >         }
> >     }
> >
> >     lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
> >
> >
> > So I'll fix the dpdk-vf-mac stuff (and double check the rest of this
> > function), then go with your suggestion for this added log of mine.
> >
> >
>
> We must not initialize errp if we do not fail with error, otherwise we leak
> the memory.  VF mac code does not fail the configuration, so we only log the
> warning.  All the paths that fail should set errp instead.
>

Talk about obvious...
I'll fix my stuff and leave the rest untouched.
diff mbox series

Patch

diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
index efd168cba8..eefc25613d 100644
--- a/Documentation/topics/dpdk/phy.rst
+++ b/Documentation/topics/dpdk/phy.rst
@@ -546,8 +546,8 @@  the firmware every time to fulfil this request.
 
 Note that not all PMD drivers support LSC interrupts.
 
-The default configuration is polling mode. To set interrupt mode, option
-``dpdk-lsc-interrupt`` has to be set to ``true``.
+The default configuration is interrupt mode. To set polling mode, option
+``dpdk-lsc-interrupt`` has to be set to ``false``.
 
 Command to set interrupt mode for a specific interface::
     $ ovs-vsctl set interface <iface_name> options:dpdk-lsc-interrupt=true
diff --git a/NEWS b/NEWS
index 5ae0108d55..d05f2d0f89 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,9 @@  Post-v3.3.0
      https://github.com/openvswitch/ovs.git
    - DPDK:
      * OVS validated with DPDK 23.11.1.
+     * Link status changes are now handled via interrupt mode if the DPDK
+       driver supports it.  It is possible to revert to polling mode by setting
+       per interface 'options:dpdk-lsc-interrupt' to 'false'.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0fa37d5145..a260bc8485 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2397,7 +2397,18 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
         }
     }
 
-    lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
+    lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
+    if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) {
+        if (smap_get(args, "dpdk-lsc-interrupt")) {
+            VLOG_ERR("interface '%s': link status interrupt is not supported.",
+                     netdev_get_name(netdev));
+            err = EINVAL;
+            goto out;
+        }
+        VLOG_DBG("interface '%s': not enabling link status interrupt.",
+                 netdev_get_name(netdev));
+        lsc_interrupt_mode = false;
+    }
     if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
         dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
         netdev_request_reconfigure(netdev);
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 8a1b607d71..e3afb78a4e 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -4647,12 +4647,12 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
       <column name="options" key="dpdk-lsc-interrupt"
               type='{"type": "boolean"}'>
         <p>
-          Set this value to <code>true</code> to configure interrupt mode for
-          Link State Change (LSC) detection instead of poll mode for the DPDK
-          interface.
+          Set this value to <code>false</code> to configure poll mode for
+          Link State Change (LSC) detection instead of interrupt mode for the
+          DPDK interface.
         </p>
         <p>
-          If this value is not set, poll mode is configured.
+          If this value is not set, interrupt mode is configured.
         </p>
         <p>
           This parameter has an effect only on netdev dpdk interfaces.