From patchwork Tue Jul 26 13:48:37 2016
X-Patchwork-Submitter: Ilya Maximets <i.maximets@samsung.com>
X-Patchwork-Id: 652767
To: Daniele Di Proietto, "dev@openvswitch.org", Ben Pfaff
From: Ilya Maximets <i.maximets@samsung.com>
Date: Tue, 26 Jul 2016 16:48:37 +0300
Message-id: <57976A35.3020705@samsung.com>
In-reply-to: <2F62F19A-6E9A-4520-8FCB-81B37B51EBA2@vmware.com>
References: <1468583694-12296-1-git-send-email-i.maximets@samsung.com> <1468583694-12296-4-git-send-email-i.maximets@samsung.com> <2F62F19A-6E9A-4520-8FCB-81B37B51EBA2@vmware.com>
Cc: Dyasly Sergey, Flavio Leitner, Kevin Traynor
Subject: Re: [ovs-dev] [PATCH v3 3/3] dpif-netdev: Introduce pmd-rxq-affinity.

On 26.07.2016 04:46, Daniele Di Proietto wrote:
> Thanks for the patch.
>
> I haven't been able to apply this without the XPS patch.

That was the original idea. Using this patch with the current tx queue
management may lead to performance issues on multiqueue configurations.

> This looks like a perfect chance to add more tests to pmd.at. I can do it
> if you want

Sounds good.

> I started taking a look at this patch and I have a few comments inline.
> I'll keep looking at it tomorrow
>
> Thanks,
>
> Daniele
>
>
> On 15/07/2016 04:54, "Ilya Maximets" <i.maximets@samsung.com> wrote:
>
>> New 'other_config:pmd-rxq-affinity' field for Interface table to
>> perform manual pinning of RX queues to desired cores.
>>
>> This functionality is required to achieve maximum performance because
>> all kinds of ports have different cost of rx/tx operations and
>> only user can know about expected workload on different ports.
>>
>> Example:
>> 	# ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>> 	                  other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>> 	Queue #0 pinned to core 3;
>> 	Queue #1 pinned to core 7;
>> 	Queue #2 not pinned.
>> 	Queue #3 pinned to core 8;
>>
>> It's decided to automatically isolate cores that have rxq explicitly
>> assigned to them because it's useful to keep constant polling rate on
>> some performance critical ports while adding/deleting other ports
>> without explicit pinning of all ports.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>> ---
>>  INSTALL.DPDK.md      |  49 +++++++++++-
>>  NEWS                 |   2 +
>>  lib/dpif-netdev.c    | 218 ++++++++++++++++++++++++++++++++++++++++++---------
>>  tests/pmd.at         |   6 ++
>>  vswitchd/vswitch.xml |  23 ++++++
>>  5 files changed, 257 insertions(+), 41 deletions(-)
>>
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>> index 5407794..7609aa7 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -289,14 +289,57 @@ advanced install guide [INSTALL.DPDK-ADVANCED.md]
>>         # Check current stats
>>         ovs-appctl dpif-netdev/pmd-stats-show
>>
>> +       # Clear previous stats
>> +       ovs-appctl dpif-netdev/pmd-stats-clear
>> +       ```
>> +
>> +  7. Port/rxq assignment to PMD threads
>> +
>> +       ```
>>         # Show port/rxq assignment
>>         ovs-appctl dpif-netdev/pmd-rxq-show
>> +       ```
>>
>> -       # Clear previous stats
>> -       ovs-appctl dpif-netdev/pmd-stats-clear
>> +  To change default rxq assignment to pmd threads rxqs may be manually
>> +  pinned to desired cores using:
>> +
>> +  ```
>> +  ovs-vsctl set Interface <iface> \
>> +            other_config:pmd-rxq-affinity=<rxq-affinity-list>
>>    ```
>> +  where:
>> +
>> +  ```
>> +  <rxq-affinity-list> ::= NULL | <non-empty-list>
>> +  <non-empty-list>    ::= <affinity-pair> |
>> +                          <affinity-pair> , <non-empty-list>
>> +  <affinity-pair>     ::= <queue-id> : <core-id>
>> +  ```
>> +
>> +  Example:
>> +
>> +  ```
>> +  ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>> +    other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>> +
>> +  Queue #0 pinned to core 3;
>> +  Queue #1 pinned to core 7;
>> +  Queue #2 not pinned.
>> +  Queue #3 pinned to core 8;
>> +  ```
>> +
>> +  After that PMD threads on cores where RX queues was pinned will become
>> +  `isolated`. This means that this thread will poll only pinned RX queues.
>> +
>> +  WARNING: If there are no `non-isolated` PMD threads, `non-pinned` RX queues
>> +  will not be polled. Also, if provided `core_id` is not available (ex. this
>> +  `core_id` not in `pmd-cpu-mask`), RX queue will not be polled by any
>> +  PMD thread.
>> +
>> +  Isolation of PMD threads also can be checked using
>> +  `ovs-appctl dpif-netdev/pmd-rxq-show` command.
>>
>> -  7. Stop vswitchd & Delete bridge
>> +  8. Stop vswitchd & Delete bridge
>>
>>    ```
>>    ovs-appctl -t ovs-vswitchd exit
>> diff --git a/NEWS b/NEWS
>> index 6496dc1..9ccc1f5 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -44,6 +44,8 @@ Post-v2.5.0
>>         Old 'other_config:n-dpdk-rxqs' is no longer supported.
>>         Not supported by vHost interfaces. For them number of rx and tx queues
>>         is applied from connected virtio device.
>> +     * New 'other_config:pmd-rxq-affinity' field for PMD interfaces, that
>> +       allows to pin port's rx queues to desired cores.
>>      * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
>>        assignment.
>>      * Type of log messages from PMD threads changed from INFO to DBG.
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 18ce316..e5a8dec 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -63,6 +63,7 @@
>>  #include "random.h"
>>  #include "seq.h"
>>  #include "shash.h"
>> +#include "smap.h"
>>  #include "sset.h"
>>  #include "timeval.h"
>>  #include "tnl-neigh-cache.h"
>> @@ -250,6 +251,12 @@ enum pmd_cycles_counter_type {
>>
>>  #define XPS_TIMEOUT_MS 500LL
>>
>> +/* Contained by struct dp_netdev_port's 'rxqs' member. */
>> +struct dp_netdev_rxq {
>> +    struct netdev_rxq *rxq;
>> +    unsigned core_id;           /* Core to which this queue is pinned. */
>> +};
>> +
>>  /* A port in a netdev-based datapath. */
>>  struct dp_netdev_port {
>>      odp_port_t port_no;
>> @@ -257,10 +264,11 @@ struct dp_netdev_port {
>>      struct hmap_node node;      /* Node in dp_netdev's 'ports'. */
>>      struct netdev_saved_flags *sf;
>>      unsigned n_rxq;             /* Number of elements in 'rxq' */
>> -    struct netdev_rxq **rxq;
>> +    struct dp_netdev_rxq *rxqs;
>>      unsigned *txq_used;         /* Number of threads that uses each tx queue. */
>>      struct ovs_mutex txq_used_mutex;
>>      char *type;                 /* Port type as requested by user. */
>> +    char *rxq_affinity_list;    /* Requested affinity of rx queues. */
>>  };
>>
>>  /* Contained by struct dp_netdev_flow's 'stats' member. */
>> @@ -447,6 +455,7 @@ struct dp_netdev_pmd_thread {
>>      pthread_t thread;
>>      unsigned core_id;           /* CPU core id of this pmd thread. */
>>      int numa_id;                /* numa node id of this pmd thread. */
>> +    bool isolated;
>>
>>      /* Queue id used by this pmd thread to send packets on all netdevs.
>>       * All tx_qid's are unique and less than 'ovs_numa_get_n_cores() + 1'. */
>> @@ -541,6 +550,8 @@ static struct dp_netdev_pmd_thread *
>>  dp_netdev_less_loaded_pmd_on_numa(struct dp_netdev *dp, int numa_id);
>>  static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp)
>>      OVS_REQUIRES(dp->port_mutex);
>> +static void reconfigure_pmd_threads(struct dp_netdev *dp)
>> +    OVS_REQUIRES(dp->port_mutex);
>>  static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
>>  static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd);
>>  static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
>> @@ -731,8 +742,10 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
>>      struct rxq_poll *poll;
>>      const char *prev_name = NULL;
>>
>> -    ds_put_format(reply, "pmd thread numa_id %d core_id %u:\n",
>> -                  pmd->numa_id, pmd->core_id);
>> +    ds_put_format(reply,
>> +                  "pmd thread numa_id %d core_id %u:\nisolated : %s\n",
>
> I think we should put a "\t" before "isolated:"

OK.

>> +                  pmd->numa_id, pmd->core_id, (pmd->isolated)
>> +                                              ? "true" : "false");
>>
>>      ovs_mutex_lock(&pmd->port_mutex);
>>      LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>> @@ -1196,18 +1209,19 @@ port_create(const char *devname, const char *open_type, const char *type,
>>      port->port_no = port_no;
>>      port->netdev = netdev;
>>      port->n_rxq = netdev_n_rxq(netdev);
>> -    port->rxq = xcalloc(port->n_rxq, sizeof *port->rxq);
>> +    port->rxqs = xcalloc(port->n_rxq, sizeof *port->rxqs);
>>      port->txq_used = xcalloc(netdev_n_txq(netdev), sizeof *port->txq_used);
>>      port->type = xstrdup(type);
>>      ovs_mutex_init(&port->txq_used_mutex);
>>
>>      for (i = 0; i < port->n_rxq; i++) {
>> -        error = netdev_rxq_open(netdev, &port->rxq[i], i);
>> +        error = netdev_rxq_open(netdev, &port->rxqs[i].rxq, i);
>>          if (error) {
>>              VLOG_ERR("%s: cannot receive packets on this network device (%s)",
>>                       devname, ovs_strerror(errno));
>>              goto out_rxq_close;
>>          }
>> +        port->rxqs[i].core_id = -1;
>>          n_open_rxqs++;
>>      }
>>
>> @@ -1223,12 +1237,12 @@ port_create(const char *devname, const char *open_type, const char *type,
>>
>>  out_rxq_close:
>>      for (i = 0; i < n_open_rxqs; i++) {
>> -        netdev_rxq_close(port->rxq[i]);
>> +        netdev_rxq_close(port->rxqs[i].rxq);
>>      }
>>      ovs_mutex_destroy(&port->txq_used_mutex);
>>      free(port->type);
>>      free(port->txq_used);
>> -    free(port->rxq);
>> +    free(port->rxqs);
>>      free(port);
>>
>> out:
>> @@ -1365,11 +1379,12 @@ port_destroy(struct dp_netdev_port *port)
>>      netdev_restore_flags(port->sf);
>>
>>      for (unsigned i = 0; i < port->n_rxq; i++) {
>> -        netdev_rxq_close(port->rxq[i]);
>> +        netdev_rxq_close(port->rxqs[i].rxq);
>>      }
>>      ovs_mutex_destroy(&port->txq_used_mutex);
>> +    free(port->rxq_affinity_list);
>>      free(port->txq_used);
>> -    free(port->rxq);
>> +    free(port->rxqs);
>>      free(port->type);
>>      free(port);
>>  }
>> @@ -2539,6 +2554,97 @@ dpif_netdev_pmd_set(struct dpif *dpif, const char *cmask)
>>      return 0;
>>  }
>>
>> +/* Parses affinity list and returns result in 'core_ids'. */
>> +static int
>> +parse_affinity_list(const char *affinity_list, unsigned *core_ids, int n_rxq)
>> +{
>> +    unsigned i;
>> +    char *list, *pt, *saveptr = NULL;
>> +    int error = 0;
>> +
>> +    for (i = 0; i < n_rxq; i++) {
>> +        core_ids[i] = -1;
>> +    }
>> +
>> +    if (!affinity_list) {
>> +        return 0;
>> +    }
>> +
>> +    list = xstrdup(affinity_list);
>> +    for (pt = strtok_r(list, ",:", &saveptr); pt;
>> +         pt = strtok_r(NULL, ",:", &saveptr)) {
>> +        int rxq_id, core_id;
>> +
>> +        rxq_id = strtol(pt, NULL, 10);
>> +        if (rxq_id < 0) {
>> +            error = EINVAL;
>> +            break;
>> +        }
>> +        pt = strtok_r(NULL, ",:", &saveptr);
>> +        if (!pt || (core_id = strtol(pt, NULL, 10)) < 0) {
>> +            error = EINVAL;
>> +            break;
>> +        }
>> +        core_ids[rxq_id] = core_id;
>> +    }
>> +    free(list);
>> +    return error;
>> +}
>> +
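Just to illustrate what the parser above accepts, here is a tiny standalone
program (not part of the patch; the affinity string and the queue count are
made up) that splits an affinity list on the same ",:" separators and prints
the resulting queue-to-core mapping; queues not mentioned in the list stay
unpinned:

/* Standalone illustration only: map "queue:core" pairs to an array. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    const char *affinity_list = "0:3,1:7,3:8";   /* hypothetical input  */
    int n_rxq = 4;                               /* hypothetical n_rxq  */
    int core_ids[4];
    char *list, *pt, *saveptr = NULL;
    int i;

    for (i = 0; i < n_rxq; i++) {
        core_ids[i] = -1;                        /* default: not pinned */
    }

    list = strdup(affinity_list);
    for (pt = strtok_r(list, ",:", &saveptr); pt;
         pt = strtok_r(NULL, ",:", &saveptr)) {
        int rxq_id = atoi(pt);                   /* queue id            */

        pt = strtok_r(NULL, ",:", &saveptr);     /* matching core id    */
        if (!pt || rxq_id < 0 || rxq_id >= n_rxq) {
            break;                               /* malformed pair      */
        }
        core_ids[rxq_id] = atoi(pt);
    }
    free(list);

    for (i = 0; i < n_rxq; i++) {
        printf("Queue #%d -> core %d\n", i, core_ids[i]);
    }
    return 0;
}

For "0:3,1:7,3:8" and four queues this prints cores 3, 7, -1 and 8 for
queues 0-3, i.e. exactly the mapping from the example in the commit message.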
>> +/* Parses 'affinity_list' and applies configuration if it is valid. */
>> +static int
>> +dpif_netdev_port_set_rxq_affinity(struct dp_netdev_port *port,
>> +                                  const char *affinity_list)
>> +{
>> +    unsigned *core_ids, i;
>> +    int error = 0;
>> +
>> +    core_ids = xmalloc(port->n_rxq * sizeof *core_ids);
>> +    if (parse_affinity_list(affinity_list, core_ids, port->n_rxq)) {
>> +        error = EINVAL;
>> +        goto exit;
>> +    }
>> +
>> +    for (i = 0; i < port->n_rxq; i++) {
>> +        port->rxqs[i].core_id = core_ids[i];
>> +    }
>> +
>> +exit:
>> +    free(core_ids);
>> +    return error;
>> +}
>> +
>> +/* Changes the affinity of port's rx queues. The changes are actually applied
>> + * in dpif_netdev_run(). */
>> +static int
>> +dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
>> +                            const struct smap *cfg)
>> +{
>> +    struct dp_netdev *dp = get_dp_netdev(dpif);
>> +    struct dp_netdev_port *port;
>> +    int error = 0;
>> +    const char *affinity_list = smap_get(cfg, "pmd-rxq-affinity");
>> +
>> +    ovs_mutex_lock(&dp->port_mutex);
>> +    error = get_port_by_number(dp, port_no, &port);
>> +    if (error || !netdev_is_pmd(port->netdev)
>> +        || nullable_string_is_equal(affinity_list, port->rxq_affinity_list)) {
>> +        goto unlock;
>> +    }
>> +
>> +    error = dpif_netdev_port_set_rxq_affinity(port, affinity_list);
>> +    if (error) {
>> +        goto unlock;
>> +    }
>> +    free(port->rxq_affinity_list);
>> +    port->rxq_affinity_list = nullable_xstrdup(affinity_list);
>> +
>> +    reconfigure_pmd_threads(dp);
>
> This will reconfigure the threads immediately.
>
> Can't we postpone the changes to dpif_netdev_run(), so that if multiple ports
> are changed we stop the threads only once?

I guess we can. How about implementing two functions:

	* dp_netdev_request_reconfigure()
	* dp_netdev_is_reconf_required()

just like for 'netdev'? Maybe something like the following fixup would fit
(not tested):

-----------------------------------------------------------------------
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 64a4b29..cf93b52 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -224,13 +224,27 @@ struct dp_netdev {
      * 'struct dp_netdev_pmd_thread' in 'per_pmd_key'. */
     ovsthread_key_t per_pmd_key;
 
+    struct seq *reconfigure_seq;
+    uint64_t last_reconfigure_seq;
+
     /* Cpu mask for pin of pmd threads. */
-    char *requested_pmd_cmask;
     char *pmd_cmask;
 
     uint64_t last_tnl_conf_seq;
 };
 
+static void
+dp_netdev_request_reconfigure(struct dp_netdev *dp)
+{
+    seq_change(dp->reconfigure_seq);
+}
+
+static bool
+dp_netdev_is_reconf_required(struct dp_netdev *dp)
+{
+    return seq_read(dp->reconfigure_seq) != dp->last_reconfigure_seq;
+}
+
 static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
                                                     odp_port_t)
     OVS_REQUIRES(dp->port_mutex);
@@ -954,6 +968,9 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
     dp->port_seq = seq_create();
     fat_rwlock_init(&dp->upcall_rwlock);
 
+    dp->reconfigure_seq = seq_create();
+    dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
+
     /* Disable upcalls by default. */
     dp_netdev_disable_upcall(dp);
     dp->upcall_aux = NULL;
@@ -2546,11 +2563,13 @@ dpif_netdev_pmd_set(struct dpif *dpif, const char *cmask)
 {
     struct dp_netdev *dp = get_dp_netdev(dpif);
 
-    if (!nullable_string_is_equal(dp->requested_pmd_cmask, cmask)) {
-        free(dp->requested_pmd_cmask);
-        dp->requested_pmd_cmask = nullable_xstrdup(cmask);
+    if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
+        free(dp->pmd_cmask);
+        dp->pmd_cmask = nullable_xstrdup(cmask);
     }
 
+    dp_netdev_request_reconfigure(dp);
+
     return 0;
 }
 
@@ -2639,7 +2658,7 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
     free(port->rxq_affinity_list);
     port->rxq_affinity_list = nullable_xstrdup(affinity_list);
 
-    reconfigure_pmd_threads(dp);
+    dp_netdev_request_reconfigure(dp);
 unlock:
     ovs_mutex_unlock(&dp->port_mutex);
     return error;
@@ -2796,6 +2815,8 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
 {
     struct dp_netdev_port *port, *next;
 
+    dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
+
     dp_netdev_destroy_all_pmds(dp);
 
     HMAP_FOR_EACH_SAFE (port, next, node, &dp->ports) {
@@ -2809,10 +2830,7 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
         }
     }
     /* Reconfigures the cpu mask. */
-    ovs_numa_set_cpu_mask(dp->requested_pmd_cmask);
-    free(dp->pmd_cmask);
-    dp->pmd_cmask = nullable_xstrdup(dp->requested_pmd_cmask);
-
+    ovs_numa_set_cpu_mask(dp->pmd_cmask);
 
     /* Restores the non-pmd. */
     dp_netdev_set_nonpmd(dp);
 
     /* Restores all pmd threads. */
@@ -2861,8 +2879,7 @@ dpif_netdev_run(struct dpif *dpif)
 
     dp_netdev_pmd_unref(non_pmd);
 
-    if (!nullable_string_is_equal(dp->pmd_cmask, dp->requested_pmd_cmask)
-        || ports_require_restart(dp)) {
+    if (dp_netdev_is_reconf_required(dp) || ports_require_restart(dp)) {
         reconfigure_pmd_threads(dp);
     }
     ovs_mutex_unlock(&dp->port_mutex);
-----------------------------------------------------------------------
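To make the intention of the fixup a bit more explicit: any number of
requests issued between two iterations of the main loop collapse into a
single reconfiguration. A stripped-down standalone sketch of the same
pattern (plain C11 atomics instead of OVS's 'struct seq'; everything here
is illustrative only, not code from the tree):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static atomic_uint_fast64_t reconfigure_seq;   /* bumped by every request   */
static uint64_t last_reconfigure_seq;          /* value seen at last reconf */

static void
request_reconfigure(void)
{
    atomic_fetch_add(&reconfigure_seq, 1);
}

static bool
is_reconf_required(void)
{
    return atomic_load(&reconfigure_seq) != last_reconfigure_seq;
}

static void
reconfigure(void)
{
    last_reconfigure_seq = atomic_load(&reconfigure_seq);
    printf("PMD threads stopped and restarted once\n");
}

int
main(void)
{
    request_reconfigure();      /* e.g. pmd-rxq-affinity changed on port 1 */
    request_reconfigure();      /* e.g. pmd-rxq-affinity changed on port 2 */
    request_reconfigure();      /* e.g. pmd-cpu-mask changed               */

    if (is_reconf_required()) { /* single check in the main loop           */
        reconfigure();          /* runs once, not three times              */
    }
    return 0;
}

So even if several ports change their affinity at the same time,
dpif_netdev_run() should stop and restart the threads only once, which is
what you asked for.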
>> +unlock:
>> +    ovs_mutex_unlock(&dp->port_mutex);
>> +    return error;
>> +}
>> +
>>  static int
>>  dpif_netdev_queue_to_priority(const struct dpif *dpif OVS_UNUSED,
>>                                uint32_t queue_id, uint32_t *priority)
>> @@ -2638,7 +2744,7 @@ static int
>>  port_reconfigure(struct dp_netdev_port *port)
>>  {
>>      struct netdev *netdev = port->netdev;
>> -    int i, err;
>> +    int i, err, old_n_rxq;
>>
>>      if (!netdev_is_reconf_required(netdev)) {
>>          return 0;
>> @@ -2646,9 +2752,10 @@ port_reconfigure(struct dp_netdev_port *port)
>>
>>      /* Closes the existing 'rxq's. */
>>      for (i = 0; i < port->n_rxq; i++) {
>> -        netdev_rxq_close(port->rxq[i]);
>> -        port->rxq[i] = NULL;
>> +        netdev_rxq_close(port->rxqs[i].rxq);
>> +        port->rxqs[i].rxq = NULL;
>>      }
>> +    old_n_rxq = port->n_rxq;
>>      port->n_rxq = 0;
>>
>>      /* Allows 'netdev' to apply the pending configuration changes. */
>> @@ -2659,19 +2766,27 @@ port_reconfigure(struct dp_netdev_port *port)
>>          return err;
>>      }
>>      /* If the netdev_reconfigure() above succeeds, reopens the 'rxq's. */
>> -    port->rxq = xrealloc(port->rxq, sizeof *port->rxq * netdev_n_rxq(netdev));
>> +    port->rxqs = xrealloc(port->rxqs,
>> +                          sizeof *port->rxqs * netdev_n_rxq(netdev));
>>      /* Realloc 'used' counters for tx queues. */
>>      free(port->txq_used);
>>      port->txq_used = xcalloc(netdev_n_txq(netdev), sizeof *port->txq_used);
>>
>>      for (i = 0; i < netdev_n_rxq(netdev); i++) {
>> -        err = netdev_rxq_open(netdev, &port->rxq[i], i);
>> +        err = netdev_rxq_open(netdev, &port->rxqs[i].rxq, i);
>>          if (err) {
>>              return err;
>>          }
>> +        /* Initialization for newly allocated memory. */
>> +        if (i >= old_n_rxq) {
>> +            port->rxqs[i].core_id = -1;
>> +        }
>
> The above is not necessary, dpif_netdev_port_set_rxq_affinity() will
> set the appropriate affinity, right?

Yes. You're right. Thanks.

>>          port->n_rxq++;
>>      }
>>
>> +    /* Parse affinity list to apply configuration for new queues. */
>> +    dpif_netdev_port_set_rxq_affinity(port, port->rxq_affinity_list);
>> +
>>      return 0;
>>  }
>>
>> @@ -2737,7 +2852,7 @@ dpif_netdev_run(struct dpif *dpif)
>>              int i;
>>
>>              for (i = 0; i < port->n_rxq; i++) {
>> -                dp_netdev_process_rxq_port(non_pmd, port, port->rxq[i]);
>> +                dp_netdev_process_rxq_port(non_pmd, port, port->rxqs[i].rxq);
>>              }
>>          }
>>      }
>> @@ -2777,7 +2892,7 @@ dpif_netdev_wait(struct dpif *dpif)
>>              int i;
>>
>>              for (i = 0; i < port->n_rxq; i++) {
>> -                netdev_rxq_wait(port->rxq[i]);
>> +                netdev_rxq_wait(port->rxqs[i].rxq);
>>              }
>>          }
>>      }
>> @@ -3256,9 +3371,9 @@ dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
>>  }
>>
>>
>> -/* Returns PMD thread from this numa node with fewer rx queues to poll.
>> - * Returns NULL if there is no PMD threads on this numa node.
>> - * Can be called safely only by main thread. */
>> +/* Returns non-isolated PMD thread from this numa node with fewer
>> + * rx queues to poll. Returns NULL if there is no non-isolated PMD threads
>
> Double space
>
> s/threads/thread/

Thanks. I'll fix this in v4.

Best regards, Ilya Maximets.