Message ID: 20180416143026.24561-3-stephen@that.guru
State: Changes Requested
Delegated to: Ian Stokes
Series: Split up the DPDK how-to
> This continues the breakup of the huge DPDK "howto" into smaller > components. There are a couple of related changes included, such as using > "Rx queue" instead of "rxq" and noting how Tx queues cannot be configured. > > Signed-off-by: Stephen Finucane <stephen@that.guru> > --- > v2: > - Add cross-references from 'pmd' doc to 'vhost-user' and 'phy' docs > - Add 'versionchanged' warning about automatic assignment of Rx queues > - Add a 'todo' to describe Tx queue behavior > --- > Documentation/howto/dpdk.rst | 86 ----------------- > Documentation/topics/dpdk/index.rst | 1 + > Documentation/topics/dpdk/phy.rst | 12 +++ > Documentation/topics/dpdk/pmd.rst | 156 > +++++++++++++++++++++++++++++++ > Documentation/topics/dpdk/vhost-user.rst | 17 ++-- > 5 files changed, 177 insertions(+), 95 deletions(-) create mode 100644 > Documentation/topics/dpdk/pmd.rst > > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst > index 79b626c76..388728363 100644 > --- a/Documentation/howto/dpdk.rst > +++ b/Documentation/howto/dpdk.rst > @@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run:: > $ ovs-appctl -t ovsdb-server exit > $ ovs-vsctl del-br br0 > > -PMD Thread Statistics > ---------------------- > - > -To show current stats:: > - > - $ ovs-appctl dpif-netdev/pmd-stats-show > - > -To clear previous stats:: > - > - $ ovs-appctl dpif-netdev/pmd-stats-clear > - > -Port/RXQ Assigment to PMD Threads > ---------------------------------- > - > -To show port/rxq assignment:: > - > - $ ovs-appctl dpif-netdev/pmd-rxq-show > - > -To change default rxq assignment to pmd threads, rxqs may be manually > pinned to -desired cores using:: > - > - $ ovs-vsctl set Interface <iface> \ > - other_config:pmd-rxq-affinity=<rxq-affinity-list> > - > -where: > - > -- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` > values > - > -For example:: > - > - $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \ > - other_config:pmd-rxq-affinity="0:3,1:7,3:8" > - > 
-This will ensure: > - > -- Queue #0 pinned to core 3 > -- Queue #1 pinned to core 7 > -- Queue #2 not pinned > -- Queue #3 pinned to core 8 > - > -After that PMD threads on cores where RX queues was pinned will become - > ``isolated``. This means that this thread will poll only pinned RX queues. > - > -.. warning:: > - If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues > will > - not be polled. Also, if provided ``core_id`` is not available (ex. this > - ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by > any PMD > - thread. > - > -If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds > (cores) -automatically. The processing cycles that have been stored for > each rxq -will be used where known to assign rxqs to pmd based on a round > robin of the -sorted rxqs. > - > -For example, in the case where here there are 5 rxqs and 3 cores (e.g. > 3,7,8) -available, and the measured usage of core cycles per rxq over the > last -interval is seen to be: > - > -- Queue #0: 30% > -- Queue #1: 80% > -- Queue #3: 60% > -- Queue #4: 70% > -- Queue #5: 10% > - > -The rxqs will be assigned to cores 3,7,8 in the following order: > - > -Core 3: Q1 (80%) | > -Core 7: Q4 (70%) | Q5 (10%) > -core 8: Q3 (60%) | Q0 (30%) > - > -To see the current measured usage history of pmd core cycles for each > rxq:: > - > - $ ovs-appctl dpif-netdev/pmd-rxq-show > - > -.. note:: > - > - A history of one minute is recorded and shown for each rxq to allow for > - traffic pattern spikes. An rxq's pmd core cycles usage changes due to > traffic > - pattern or reconfig changes will take one minute before they are fully > - reflected in the stats. 
> - > -Rxq to pmds assignment takes place whenever there are configuration > changes -or can be triggered by using:: > - > - $ ovs-appctl dpif-netdev/pmd-rxq-rebalance > - > QoS > --- > > diff --git a/Documentation/topics/dpdk/index.rst > b/Documentation/topics/dpdk/index.rst > index 5f836a6e9..dfde88377 100644 > --- a/Documentation/topics/dpdk/index.rst > +++ b/Documentation/topics/dpdk/index.rst > @@ -31,3 +31,4 @@ The DPDK Datapath > phy > vhost-user > ring > + pmd > diff --git a/Documentation/topics/dpdk/phy.rst > b/Documentation/topics/dpdk/phy.rst > index a3f8b475c..ad191dad0 100644 > --- a/Documentation/topics/dpdk/phy.rst > +++ b/Documentation/topics/dpdk/phy.rst > @@ -113,3 +113,15 @@ tool:: > For more information, refer to the `DPDK documentation <dpdk-drivers>`__. > > .. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html > + > +.. _dpdk-phy-multiqueue: > + > +Multiqueue > +---------- > + > +Poll Mode Driver (PMD) threads are the threads that do the heavy > +lifting for the DPDK datapath. Correct configuration of PMD threads and > +the Rx queues they utilize is a requirement in order to deliver the > +high-performance possible with DPDK acceleration. It is possible to > +configure multiple Rx queues for ``dpdk`` ports, thus ensuring this is > +not a bottleneck for performance. For information on configuring PMD > threads, refer to :doc:`pmd`. > diff --git a/Documentation/topics/dpdk/pmd.rst > b/Documentation/topics/dpdk/pmd.rst > new file mode 100644 > index 000000000..1be25ade0 > --- /dev/null > +++ b/Documentation/topics/dpdk/pmd.rst Will cause compilation failure, pmd.rst not listed in Documentation/automake.mk > @@ -0,0 +1,156 @@ > +.. > + Licensed under the Apache License, Version 2.0 (the "License"); you > may > + not use this file except in compliance with the License. 
You may > obtain > + a copy of the License at > + > + http://www.apache.org/licenses/LICENSE-2.0 > + > + Unless required by applicable law or agreed to in writing, software > + distributed under the License is distributed on an "AS IS" BASIS, > WITHOUT > + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > See the > + License for the specific language governing permissions and > limitations > + under the License. > + > + Convention for heading levels in Open vSwitch documentation: > + > + ======= Heading 0 (reserved for the title in a document) > + ------- Heading 1 > + ~~~~~~~ Heading 2 > + +++++++ Heading 3 > + ''''''' Heading 4 > + > + Avoid deeper levels because they do not render well. > + > +=========== > +PMD Threads > +=========== > + > +Poll Mode Driver (PMD) threads are the threads that do the heavy > +lifting for the DPDK datapath and perform tasks such as continuous > +polling of input ports for packets, classifying packets once received, > +and executing actions on the packets once they are classified. > + > +PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly > +known as *rxq*\s and *txq*\s. While Tx queue configuration happens > +automatically, Rx queues can be configured by the user. This can happen > in one of two ways: > + > +- For physical interfaces, configuration is done using the > + :program:`ovs-appctl` utility. > + > +- For virtual interfaces, configuration is done using the > +:program:`ovs-appctl` > + utility, but this configuration must be reflected in the guest > +configuration > + (e.g. QEMU command line arguments). > + > +The :program:`ovs-appctl` utility also provides a number of commands > +for querying PMD threads and their respective queues. This, and all of > +the above, is discussed here. > + > +.. todo:: > + > + Add an overview of Tx queues including numbers created, how they > relate to > + PMD threads, etc. 
> +
> +PMD Thread Statistics
> +---------------------
> +
> +To show current stats::
> +
> +    $ ovs-appctl dpif-netdev/pmd-stats-show
> +
> +To clear previous stats::
> +
> +    $ ovs-appctl dpif-netdev/pmd-stats-clear
> +
> +Port/Rx Queue Assignment to PMD Threads
> +---------------------------------------
> +
> +.. todo::
> +
> +   This needs a more detailed overview of *why* this should be done, along
> +   with the impact on things like NUMA affinity.
> +
> +Correct configuration of PMD threads and the Rx queues they utilize is a
> +requirement in order to achieve maximum performance. This is particularly
> +true for enabling things like multiqueue for
> +:ref:`physical <dpdk-phy-multiqueue>` and :ref:`vhost-user <dpdk-vhost-user>`
> +interfaces.
> +
> +To show port/Rx queue assignment::
> +
> +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> +
> +Rx queues may be manually pinned to cores. This will change the default
> +Rx queue assignment to PMD threads::
> +
> +    $ ovs-vsctl set Interface <iface> \
> +        other_config:pmd-rxq-affinity=<rxq-affinity-list>
> +
> +where:
> +
> +- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
> +
> +For example::
> +
> +    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> +        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> +
> +This will ensure there are *4* Rx queues and that these queues are
> +configured like so:
> +
> +- Queue #0 pinned to core 3
> +- Queue #1 pinned to core 7
> +- Queue #2 not pinned
> +- Queue #3 pinned to core 8
> +
> +PMD threads on cores where Rx queues are *pinned* will become *isolated*.
> +This means that these threads will only poll the *pinned* Rx queues.
> +
> +.. warning::
> +
> +   If there are no *non-isolated* PMD threads, *non-pinned* Rx queues will not
> +   be polled. Also, if the provided ``<core-id>`` is not available (e.g. the
> +   ``<core-id>`` is not in ``pmd-cpu-mask``), the Rx queue will not be polled
> +   by any PMD thread.
> +
> +If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to
> +PMDs (cores) automatically. Where known, the processing cycles that have
> +been stored for each Rx queue will be used to assign Rx queues to PMDs
> +based on a round robin of the sorted Rx queues. Take the following example,
> +where there are five Rx queues and three cores - 3, 7, and 8 - available,
> +and the measured usage of core cycles per Rx queue over the last interval
> +is seen to be:
> +
> +- Queue #0: 30%
> +- Queue #1: 80%
> +- Queue #3: 60%
> +- Queue #4: 70%
> +- Queue #5: 10%
> +
> +The Rx queues will be assigned to the cores in the following order::
> +
> +    Core 3: Q1 (80%) |
> +    Core 7: Q4 (70%) | Q5 (10%)
> +    Core 8: Q3 (60%) | Q0 (30%)
> +
> +To see the current measured usage history of PMD core cycles for each Rx
> +queue::
> +
> +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> +
> +.. note::
> +
> +   A history of one minute is recorded and shown for each Rx queue to allow
> +   for traffic pattern spikes. Any changes in the Rx queue's PMD core cycles
> +   usage, due to traffic pattern or reconfig changes, will take one minute to
> +   be fully reflected in the stats.
> +
> +Rx queue to PMD assignment takes place whenever there are configuration
> +changes or can be triggered by using::
> +
> +    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
> +
> +.. versionchanged:: 2.8.0
> +
> +   Automatic assignment of Rx queues to PMDs and the two related commands,
> +   ``pmd-rxq-show`` and ``pmd-rxq-rebalance``, were added in OVS 2.8.0. Prior
> +   to this, behavior was round-robin and processing cycles were not taken into
> +   consideration. Tracking for stats was not available.

In 2.9 the output was changed to include % usage; this wasn't present in 2.8. Could be worth mentioning.
Ian

> diff --git a/Documentation/topics/dpdk/vhost-user.rst
> b/Documentation/topics/dpdk/vhost-user.rst
> index ca8a3289f..6f794f296 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -130,11 +130,10 @@ an additional set of parameters::
>
>        -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
>        -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
>
> -In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user
> -ports access a virtio-net device's virtual rings and packet buffers mapping the
> -VM's physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
> -memory into their process address space, pass the following parameters to
> -QEMU::
> +In addition, QEMU must allocate the VM's memory on hugetlbfs.
> +vhost-user ports access a virtio-net device's virtual rings and packet
> +buffers mapping the VM's physical memory on hugetlbfs. To enable
> +vhost-user ports to map the VM's memory into their process address space,
> +pass the following parameters to QEMU::
>
>        -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
>        -numa node,memdev=mem -mem-prealloc
>
> @@ -154,18 +153,18 @@ where:
>
>      The number of vectors, which is ``$q`` * 2 + 2
>
>  The vhost-user interface will be automatically reconfigured with required
> -number of rx and tx queues after connection of virtio device. Manual
> +number of Rx and Tx queues after connection of virtio device. Manual
>  configuration of ``n_rxq`` is not supported because OVS will work
>  properly only if ``n_rxq`` will match number of queues configured in
>  QEMU.
>
> -A least 2 PMDs should be configured for the vswitch when using multiqueue.
> +At least two PMDs should be configured for the vswitch when using multiqueue.
>  Using a single PMD will cause traffic to be enqueued to the same vhost
>  queue rather than being distributed among different vhost queues for a
>  vhost-user interface.
> > If traffic destined for a VM configured with multiqueue arrives to the > vswitch -via a physical DPDK port, then the number of rxqs should also be > set to at -least 2 for that physical DPDK port. This is required to > increase the > +via a physical DPDK port, then the number of Rx queues should also be > +set to at least two for that physical DPDK port. This is required to > +increase the > probability that a different PMD will handle the multiqueue transmission > to the guest using a different vhost queue. > > -- > 2.14.3
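The cycle-based assignment described in the patch (sort the Rx queues by measured core-cycle usage, then deal them out across the available cores) can be sketched as follows. This is an illustrative model only, not the OVS implementation: the function name and the back-and-forth (serpentine) walk over the cores are assumptions chosen because they reproduce the worked example in the patch; the real scheduler may break ties differently.

```python
def assign_rxqs(rxq_load, cores):
    """Sketch of cycle-based Rx queue assignment: sort queues by measured
    core-cycle usage (descending), then distribute them to the cores in a
    back-and-forth round robin so the heaviest queues spread out evenly."""
    assignment = {core: [] for core in cores}
    ordered = sorted(rxq_load, key=rxq_load.get, reverse=True)
    forward = True
    while ordered:
        # Walk the cores left-to-right, then right-to-left, alternating.
        walk = cores if forward else cores[::-1]
        for core in walk:
            if not ordered:
                break
            assignment[core].append(ordered.pop(0))
        forward = not forward
    return assignment

# The example from the patch: five Rx queues, PMD threads on cores 3, 7, 8.
load = {"Q0": 30, "Q1": 80, "Q3": 60, "Q4": 70, "Q5": 10}
print(assign_rxqs(load, [3, 7, 8]))
# → {3: ['Q1'], 7: ['Q4', 'Q5'], 8: ['Q3', 'Q0']}
```

This matches the documented outcome (Core 3: Q1; Core 7: Q4, Q5; Core 8: Q3, Q0), which a plain one-directional round robin would not produce.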
On Wed, 2018-04-18 at 15:31 +0000, Stokes, Ian wrote: > > This continues the breakup of the huge DPDK "howto" into smaller > > components. There are a couple of related changes included, such as using > > "Rx queue" instead of "rxq" and noting how Tx queues cannot be configured. > > > > Signed-off-by: Stephen Finucane <stephen@that.guru> > > --- > > v2: > > - Add cross-references from 'pmd' doc to 'vhost-user' and 'phy' docs > > - Add 'versionchanged' warning about automatic assignment of Rx queues > > - Add a 'todo' to describe Tx queue behavior > > --- > > Documentation/howto/dpdk.rst | 86 ----------------- > > Documentation/topics/dpdk/index.rst | 1 + > > Documentation/topics/dpdk/phy.rst | 12 +++ > > Documentation/topics/dpdk/pmd.rst | 156 > > +++++++++++++++++++++++++++++++ > > Documentation/topics/dpdk/vhost-user.rst | 17 ++-- > > 5 files changed, 177 insertions(+), 95 deletions(-) create mode 100644 > > Documentation/topics/dpdk/pmd.rst > > > > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst > > index 79b626c76..388728363 100644 > > --- a/Documentation/howto/dpdk.rst > > +++ b/Documentation/howto/dpdk.rst > > @@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run:: > > $ ovs-appctl -t ovsdb-server exit > > $ ovs-vsctl del-br br0 > > > > -PMD Thread Statistics > > ---------------------- > > - > > -To show current stats:: > > - > > - $ ovs-appctl dpif-netdev/pmd-stats-show > > - > > -To clear previous stats:: > > - > > - $ ovs-appctl dpif-netdev/pmd-stats-clear > > - > > -Port/RXQ Assigment to PMD Threads > > ---------------------------------- > > - > > -To show port/rxq assignment:: > > - > > - $ ovs-appctl dpif-netdev/pmd-rxq-show > > - > > -To change default rxq assignment to pmd threads, rxqs may be manually > > pinned to -desired cores using:: > > - > > - $ ovs-vsctl set Interface <iface> \ > > - other_config:pmd-rxq-affinity=<rxq-affinity-list> > > - > > -where: > > - > > -- ``<rxq-affinity-list>`` is a CSV list of 
``<queue-id>:<core-id>`` > > values > > - > > -For example:: > > - > > - $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \ > > - other_config:pmd-rxq-affinity="0:3,1:7,3:8" > > - > > -This will ensure: > > - > > -- Queue #0 pinned to core 3 > > -- Queue #1 pinned to core 7 > > -- Queue #2 not pinned > > -- Queue #3 pinned to core 8 > > - > > -After that PMD threads on cores where RX queues was pinned will become - > > ``isolated``. This means that this thread will poll only pinned RX queues. > > - > > -.. warning:: > > - If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues > > will > > - not be polled. Also, if provided ``core_id`` is not available (ex. this > > - ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by > > any PMD > > - thread. > > - > > -If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds > > (cores) -automatically. The processing cycles that have been stored for > > each rxq -will be used where known to assign rxqs to pmd based on a round > > robin of the -sorted rxqs. > > - > > -For example, in the case where here there are 5 rxqs and 3 cores (e.g. > > 3,7,8) -available, and the measured usage of core cycles per rxq over the > > last -interval is seen to be: > > - > > -- Queue #0: 30% > > -- Queue #1: 80% > > -- Queue #3: 60% > > -- Queue #4: 70% > > -- Queue #5: 10% > > - > > -The rxqs will be assigned to cores 3,7,8 in the following order: > > - > > -Core 3: Q1 (80%) | > > -Core 7: Q4 (70%) | Q5 (10%) > > -core 8: Q3 (60%) | Q0 (30%) > > - > > -To see the current measured usage history of pmd core cycles for each > > rxq:: > > - > > - $ ovs-appctl dpif-netdev/pmd-rxq-show > > - > > -.. note:: > > - > > - A history of one minute is recorded and shown for each rxq to allow for > > - traffic pattern spikes. An rxq's pmd core cycles usage changes due to > > traffic > > - pattern or reconfig changes will take one minute before they are fully > > - reflected in the stats. 
> > - > > -Rxq to pmds assignment takes place whenever there are configuration > > changes -or can be triggered by using:: > > - > > - $ ovs-appctl dpif-netdev/pmd-rxq-rebalance > > - > > QoS > > --- > > > > diff --git a/Documentation/topics/dpdk/index.rst > > b/Documentation/topics/dpdk/index.rst > > index 5f836a6e9..dfde88377 100644 > > --- a/Documentation/topics/dpdk/index.rst > > +++ b/Documentation/topics/dpdk/index.rst > > @@ -31,3 +31,4 @@ The DPDK Datapath > > phy > > vhost-user > > ring > > + pmd > > diff --git a/Documentation/topics/dpdk/phy.rst > > b/Documentation/topics/dpdk/phy.rst > > index a3f8b475c..ad191dad0 100644 > > --- a/Documentation/topics/dpdk/phy.rst > > +++ b/Documentation/topics/dpdk/phy.rst > > @@ -113,3 +113,15 @@ tool:: > > For more information, refer to the `DPDK documentation <dpdk-drivers>`__. > > > > .. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html > > + > > +.. _dpdk-phy-multiqueue: > > + > > +Multiqueue > > +---------- > > + > > +Poll Mode Driver (PMD) threads are the threads that do the heavy > > +lifting for the DPDK datapath. Correct configuration of PMD threads and > > +the Rx queues they utilize is a requirement in order to deliver the > > +high-performance possible with DPDK acceleration. It is possible to > > +configure multiple Rx queues for ``dpdk`` ports, thus ensuring this is > > +not a bottleneck for performance. For information on configuring PMD > > threads, refer to :doc:`pmd`. > > diff --git a/Documentation/topics/dpdk/pmd.rst > > b/Documentation/topics/dpdk/pmd.rst > > new file mode 100644 > > index 000000000..1be25ade0 > > --- /dev/null > > +++ b/Documentation/topics/dpdk/pmd.rst > > Will cause compilation failure, pmd.rst not listed in Documentation/automake.mk Done. > > @@ -0,0 +1,156 @@ > > +.. > > + Licensed under the Apache License, Version 2.0 (the "License"); you > > may > > + not use this file except in compliance with the License. 
You may > > obtain > > + a copy of the License at > > + > > + http://www.apache.org/licenses/LICENSE-2.0 > > + > > + Unless required by applicable law or agreed to in writing, software > > + distributed under the License is distributed on an "AS IS" BASIS, > > WITHOUT > > + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > See the > > + License for the specific language governing permissions and > > limitations > > + under the License. > > + > > + Convention for heading levels in Open vSwitch documentation: > > + > > + ======= Heading 0 (reserved for the title in a document) > > + ------- Heading 1 > > + ~~~~~~~ Heading 2 > > + +++++++ Heading 3 > > + ''''''' Heading 4 > > + > > + Avoid deeper levels because they do not render well. > > + > > +=========== > > +PMD Threads > > +=========== > > + > > +Poll Mode Driver (PMD) threads are the threads that do the heavy > > +lifting for the DPDK datapath and perform tasks such as continuous > > +polling of input ports for packets, classifying packets once received, > > +and executing actions on the packets once they are classified. > > + > > +PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly > > +known as *rxq*\s and *txq*\s. While Tx queue configuration happens > > +automatically, Rx queues can be configured by the user. This can happen > > in one of two ways: > > + > > +- For physical interfaces, configuration is done using the > > + :program:`ovs-appctl` utility. > > + > > +- For virtual interfaces, configuration is done using the > > +:program:`ovs-appctl` > > + utility, but this configuration must be reflected in the guest > > +configuration > > + (e.g. QEMU command line arguments). > > + > > +The :program:`ovs-appctl` utility also provides a number of commands > > +for querying PMD threads and their respective queues. This, and all of > > +the above, is discussed here. > > + > > +.. 
todo:: > > + > > + Add an overview of Tx queues including numbers created, how they > > relate to > > + PMD threads, etc. > > + > > +PMD Thread Statistics > > +--------------------- > > + > > +To show current stats:: > > + > > + $ ovs-appctl dpif-netdev/pmd-stats-show > > + > > +To clear previous stats:: > > + > > + $ ovs-appctl dpif-netdev/pmd-stats-clear > > + > > +Port/Rx Queue Assigment to PMD Threads > > +-------------------------------------- > > + > > +.. todo:: > > + > > + This needs a more detailed overview of *why* this should be done, > > along with > > + the impact on things like NUMA affinity. > > + > > +Correct configuration of PMD threads and the Rx queues they utilize is > > +a requirement in order to achieve maximum performance. This is > > +particularly true for enabling things like multiqueue for > > +:ref:`physical <dpdk-phy-multiqueue>` and :ref:`vhost-user <dpdk-vhost- > > user>` interfaces. > > + > > +To show port/Rx queue assignment:: > > + > > + $ ovs-appctl dpif-netdev/pmd-rxq-show > > + > > +Rx queues may be manually pinned to cores. This will change the default > > +Rx queue assignment to PMD threads:: > > + > > + $ ovs-vsctl set Interface <iface> \ > > + other_config:pmd-rxq-affinity=<rxq-affinity-list> > > + > > +where: > > + > > +- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` > > +values > > + > > +For example:: > > + > > + $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \ > > + other_config:pmd-rxq-affinity="0:3,1:7,3:8" > > + > > +This will ensure there are *4* Rx queues and that these queues are > > +configured like so: > > + > > +- Queue #0 pinned to core 3 > > +- Queue #1 pinned to core 7 > > +- Queue #2 not pinned > > +- Queue #3 pinned to core 8 > > + > > +PMD threads on cores where Rx queues are *pinned* will become > > +*isolated*. This means that this thread will only poll the *pinned* Rx > > queues. > > + > > +.. 
warning:: > > + > > + If there are no *non-isolated* PMD threads, *non-pinned* RX queues > > will not > > + be polled. Also, if the provided ``<core-id>`` is not available (e.g. > > the > > + ``<core-id>`` is not in ``pmd-cpu-mask``), the RX queue will not be > > polled > > + by any PMD thread. > > + > > +If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned > > +to PMDs > > +(cores) automatically. Where known, the processing cycles that have > > +been stored for each Rx queue will be used to assign Rx queue to PMDs > > +based on a round robin of the sorted Rx queues. For example, take the > > +following example, where there are five Rx queues and three cores - 3, > > +7, and 8 - available and the measured usage of core cycles per Rx queue > > +over the last interval is seen to > > +be: > > + > > +- Queue #0: 30% > > +- Queue #1: 80% > > +- Queue #3: 60% > > +- Queue #4: 70% > > +- Queue #5: 10% > > + > > +The Rx queues will be assigned to the cores in the following order:: > > + > > + Core 3: Q1 (80%) | > > + Core 7: Q4 (70%) | Q5 (10%) > > + Core 8: Q3 (60%) | Q0 (30%) > > + > > +To see the current measured usage history of PMD core cycles for each > > +Rx > > +queue:: > > + > > + $ ovs-appctl dpif-netdev/pmd-rxq-show > > + > > +.. note:: > > + > > + A history of one minute is recorded and shown for each Rx queue to > > allow for > > + traffic pattern spikes. Any changes in the Rx queue's PMD core cycles > > usage, > > + due to traffic pattern or reconfig changes, will take one minute to be > > fully > > + reflected in the stats. > > + > > +Rx queue to PMD assignment takes place whenever there are configuration > > +changes or can be triggered by using:: > > + > > + $ ovs-appctl dpif-netdev/pmd-rxq-rebalance > > + > > +.. versionchanged:: 2.8.0 > > + > > + Automatic assignment of Rx queues to PMDs and the two related > > commands, > > + ``pmd-rxq-show`` and ``pmd-rxq-rebalance``, were added in OVS 2.8.0. 
> > Prior > > + to this, behavior was round-robin and processing cycles were not taken > > into > > + consideration. Tracking for stats was not available. > > In 2.9 the output was changed to include % usage, this wasn't present in 2.8. Could be worth mentioning. I assume you're referring to ``pmd-rxq-show``? If not, feel free to correct what I've done in v3 at merge time :) Stephen > Ian > > > diff --git a/Documentation/topics/dpdk/vhost-user.rst > > b/Documentation/topics/dpdk/vhost-user.rst > > index ca8a3289f..6f794f296 100644 > > --- a/Documentation/topics/dpdk/vhost-user.rst > > +++ b/Documentation/topics/dpdk/vhost-user.rst > > @@ -130,11 +130,10 @@ an additional set of parameters:: > > -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce > > -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2 > > > > -In addition, QEMU must allocate the VM's memory on hugetlbfs. > > vhost-user > > -ports access a virtio-net device's virtual rings and packet buffers > > mapping the -VM's physical memory on hugetlbfs. To enable vhost-user ports > > to map the VM's -memory into their process address space, pass the > > following parameters to > > -QEMU:: > > +In addition, QEMU must allocate the VM's memory on hugetlbfs. > > +vhost-user ports access a virtio-net device's virtual rings and packet > > +buffers mapping the VM's physical memory on hugetlbfs. To enable > > +vhost-user ports to map the VM's memory into their process address space, > > pass the following parameters to QEMU:: > > > > -object memory-backend-file,id=mem,size=4096M,mem- > > path=/dev/hugepages,share=on > > -numa node,memdev=mem -mem-prealloc @@ -154,18 +153,18 @@ where: > > The number of vectors, which is ``$q`` * 2 + 2 > > > > The vhost-user interface will be automatically reconfigured with required > > -number of rx and tx queues after connection of virtio device. Manual > > +number of Rx and Tx queues after connection of virtio device. 
Manual > > configuration of ``n_rxq`` is not supported because OVS will work > > properly only if ``n_rxq`` will match number of queues configured in > > QEMU. > > > > -A least 2 PMDs should be configured for the vswitch when using > > multiqueue. > > +A least two PMDs should be configured for the vswitch when using > > multiqueue. > > Using a single PMD will cause traffic to be enqueued to the same vhost > > queue rather than being distributed among different vhost queues for a > > vhost-user interface. > > > > If traffic destined for a VM configured with multiqueue arrives to the > > vswitch -via a physical DPDK port, then the number of rxqs should also be > > set to at -least 2 for that physical DPDK port. This is required to > > increase the > > +via a physical DPDK port, then the number of Rx queues should also be > > +set to at least two for that physical DPDK port. This is required to > > +increase the > > probability that a different PMD will handle the multiqueue transmission > > to the guest using a different vhost queue. > > > > -- > > 2.14.3 > >
diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 79b626c76..388728363 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run::
     $ ovs-appctl -t ovsdb-server exit
     $ ovs-vsctl del-br br0
 
-PMD Thread Statistics
----------------------
-
-To show current stats::
-
-    $ ovs-appctl dpif-netdev/pmd-stats-show
-
-To clear previous stats::
-
-    $ ovs-appctl dpif-netdev/pmd-stats-clear
-
-Port/RXQ Assigment to PMD Threads
----------------------------------
-
-To show port/rxq assignment::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-show
-
-To change default rxq assignment to pmd threads, rxqs may be manually pinned to
-desired cores using::
-
-    $ ovs-vsctl set Interface <iface> \
-        other_config:pmd-rxq-affinity=<rxq-affinity-list>
-
-where:
-
-- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
-
-For example::
-
-    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
-        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
-
-This will ensure:
-
-- Queue #0 pinned to core 3
-- Queue #1 pinned to core 7
-- Queue #2 not pinned
-- Queue #3 pinned to core 8
-
-After that PMD threads on cores where RX queues was pinned will become
-``isolated``. This means that this thread will poll only pinned RX queues.
-
-.. warning::
-  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues will
-  not be polled. Also, if provided ``core_id`` is not available (ex. this
-  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by any PMD
-  thread.
-
-If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
-automatically. The processing cycles that have been stored for each rxq
-will be used where known to assign rxqs to pmd based on a round robin of the
-sorted rxqs.
-
-For example, in the case where here there are 5 rxqs and 3 cores (e.g. 3,7,8)
-available, and the measured usage of core cycles per rxq over the last
-interval is seen to be:
-
-- Queue #0: 30%
-- Queue #1: 80%
-- Queue #3: 60%
-- Queue #4: 70%
-- Queue #5: 10%
-
-The rxqs will be assigned to cores 3,7,8 in the following order:
-
-Core 3: Q1 (80%) |
-Core 7: Q4 (70%) | Q5 (10%)
-core 8: Q3 (60%) | Q0 (30%)
-
-To see the current measured usage history of pmd core cycles for each rxq::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-show
-
-.. note::
-
-  A history of one minute is recorded and shown for each rxq to allow for
-  traffic pattern spikes. An rxq's pmd core cycles usage changes due to traffic
-  pattern or reconfig changes will take one minute before they are fully
-  reflected in the stats.
-
-Rxq to pmds assignment takes place whenever there are configuration changes
-or can be triggered by using::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
-
 QoS
 ---
 
diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
index 5f836a6e9..dfde88377 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -31,3 +31,4 @@ The DPDK Datapath
    phy
    vhost-user
    ring
+   pmd
diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
index a3f8b475c..ad191dad0 100644
--- a/Documentation/topics/dpdk/phy.rst
+++ b/Documentation/topics/dpdk/phy.rst
@@ -113,3 +113,15 @@ tool::
 For more information, refer to the `DPDK documentation <dpdk-drivers>`__.
 
 .. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html
+
+.. _dpdk-phy-multiqueue:
+
+Multiqueue
+----------
+
+Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
+the DPDK datapath. Correct configuration of PMD threads and the Rx queues they
+utilize is a requirement in order to deliver the high performance possible with
+DPDK acceleration. It is possible to configure multiple Rx queues for ``dpdk``
+ports, thus ensuring this is not a bottleneck for performance. For information
+on configuring PMD threads, refer to :doc:`pmd`.
diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
new file mode 100644
index 000000000..1be25ade0
--- /dev/null
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -0,0 +1,156 @@
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+===========
+PMD Threads
+===========
+
+Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
+the DPDK datapath and perform tasks such as continuous polling of input ports
+for packets, classifying packets once received, and executing actions on the
+packets once they are classified.
+
+PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly known as
+*rxq*\s and *txq*\s. While Tx queue configuration happens automatically, Rx
+queues can be configured by the user. This can happen in one of two ways:
+
+- For physical interfaces, configuration is done using the
+  :program:`ovs-vsctl` utility.
+
+- For virtual interfaces, configuration is done using the :program:`ovs-vsctl`
+  utility, but this configuration must be reflected in the guest configuration
+  (e.g. QEMU command line arguments).
+
+The :program:`ovs-appctl` utility also provides a number of commands for
+querying PMD threads and their respective queues. This, and all of the above,
+is discussed here.
+
+.. todo::
+
+   Add an overview of Tx queues including numbers created, how they relate to
+   PMD threads, etc.
+
+PMD Thread Statistics
+---------------------
+
+To show current stats::
+
+    $ ovs-appctl dpif-netdev/pmd-stats-show
+
+To clear previous stats::
+
+    $ ovs-appctl dpif-netdev/pmd-stats-clear
+
+Port/Rx Queue Assignment to PMD Threads
+---------------------------------------
+
+.. todo::
+
+   This needs a more detailed overview of *why* this should be done, along with
+   the impact on things like NUMA affinity.
+
+Correct configuration of PMD threads and the Rx queues they utilize is a
+requirement in order to achieve maximum performance. This is particularly true
+for enabling things like multiqueue for :ref:`physical <dpdk-phy-multiqueue>`
+and :ref:`vhost-user <dpdk-vhost-user>` interfaces.
+
+To show port/Rx queue assignment::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-show
+
+Rx queues may be manually pinned to cores. This will change the default Rx
+queue assignment to PMD threads::
+
+    $ ovs-vsctl set Interface <iface> \
+        other_config:pmd-rxq-affinity=<rxq-affinity-list>
+
+where:
+
+- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
+
+For example::
+
+    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
+        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
+
+This will ensure there are *4* Rx queues and that these queues are configured
+like so:
+
+- Queue #0 pinned to core 3
+- Queue #1 pinned to core 7
+- Queue #2 not pinned
+- Queue #3 pinned to core 8
+
+PMD threads on cores where Rx queues are *pinned* will become *isolated*. This
+means that these threads will only poll the *pinned* Rx queues.
+
+.. warning::
+
+   If there are no *non-isolated* PMD threads, *non-pinned* Rx queues will not
+   be polled. Also, if the provided ``<core-id>`` is not available (e.g. the
+   ``<core-id>`` is not in ``pmd-cpu-mask``), the Rx queue will not be polled
+   by any PMD thread.
+
+If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to PMDs
+(cores) automatically. Where known, the processing cycles that have been stored
+for each Rx queue will be used to assign Rx queues to PMDs based on a round
+robin of the sorted Rx queues. Take the following example, where there are five
+Rx queues and three cores - 3, 7, and 8 - available, and the measured usage of
+core cycles per Rx queue over the last interval is seen to be:
+
+- Queue #0: 30%
+- Queue #1: 80%
+- Queue #3: 60%
+- Queue #4: 70%
+- Queue #5: 10%
+
+The Rx queues will be assigned to the cores in the following order::
+
+    Core 3: Q1 (80%) |
+    Core 7: Q4 (70%) | Q5 (10%)
+    Core 8: Q3 (60%) | Q0 (30%)
+
+To see the current measured usage history of PMD core cycles for each Rx
+queue::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-show
+
+.. note::
+
+   A history of one minute is recorded and shown for each Rx queue to allow for
+   traffic pattern spikes. Any changes in the Rx queue's PMD core cycles usage,
+   due to traffic pattern or reconfig changes, will take one minute to be fully
+   reflected in the stats.
+
+Rx queue to PMD assignment takes place whenever there are configuration
+changes, or can be triggered manually using::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
+
+.. versionchanged:: 2.8.0
+
+   Automatic assignment of Rx queues to PMDs and the two related commands,
+   ``pmd-rxq-show`` and ``pmd-rxq-rebalance``, were added in OVS 2.8.0. Prior
+   to this, behavior was round-robin and processing cycles were not taken into
+   consideration. Tracking for stats was not available.
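The cycle-based assignment described above can be sketched in a few lines of
Python. This is an illustrative sketch only, not OVS code: it assumes that,
after sorting the Rx queues by measured cycles in descending order, each queue
is handed to the core with the least load assigned so far, which reproduces the
example table.

```python
# Illustrative sketch (not OVS code) of cycle-based Rx queue assignment.
# Assumption: queues sorted by measured cycles (descending) are each given
# to the core with the least total load assigned so far.

def assign_rxqs(rxq_cycles, cores):
    """Map each core to the list of Rx queues it will poll."""
    assignment = {core: [] for core in cores}
    load = {core: 0 for core in cores}
    for queue, cycles in sorted(rxq_cycles.items(),
                                key=lambda item: item[1], reverse=True):
        # Pick the least-loaded core; ties broken by core id for determinism.
        target = min(cores, key=lambda c: (load[c], c))
        assignment[target].append(queue)
        load[target] += cycles
    return assignment

# Measured usage from the example: five Rx queues, cores 3, 7 and 8.
usage = {0: 30, 1: 80, 3: 60, 4: 70, 5: 10}
print(assign_rxqs(usage, [3, 7, 8]))
# {3: [1], 7: [4, 5], 8: [3, 0]}
```

This matches the table above: core 3 gets Q1, core 7 gets Q4 and Q5, and core 8
gets Q3 and Q0.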
diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index ca8a3289f..6f794f296 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -130,11 +130,10 @@ an additional set of parameters::
     -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
     -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
 
-In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user
-ports access a virtio-net device's virtual rings and packet buffers mapping the
-VM's physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
-memory into their process address space, pass the following parameters to
-QEMU::
+In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports
+access a virtio-net device's virtual rings and packet buffers mapping the VM's
+physical memory on hugetlbfs. To enable vhost-user ports to map the VM's memory
+into their process address space, pass the following parameters to QEMU::
 
     -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
     -numa node,memdev=mem -mem-prealloc
@@ -154,18 +153,18 @@ where:
     The number of vectors, which is ``$q`` * 2 + 2
 
 The vhost-user interface will be automatically reconfigured with required
-number of rx and tx queues after connection of virtio device. Manual
+number of Rx and Tx queues after connection of virtio device. Manual
 configuration of ``n_rxq`` is not supported because OVS will work properly
 only if ``n_rxq`` will match number of queues configured in QEMU.
 
-A least 2 PMDs should be configured for the vswitch when using multiqueue.
+At least two PMDs should be configured for the vswitch when using multiqueue.
 Using a single PMD will cause traffic to be enqueued to the same vhost queue
 rather than being distributed among different vhost queues for a vhost-user
 interface.
 
 If traffic destined for a VM configured with multiqueue arrives to the vswitch
-via a physical DPDK port, then the number of rxqs should also be set to at
-least 2 for that physical DPDK port. This is required to increase the
+via a physical DPDK port, then the number of Rx queues should also be set to at
+least two for that physical DPDK port. This is required to increase the
 probability that a different PMD will handle the multiqueue transmission to the
 guest using a different vhost queue.
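As a footnote to the Rx queue pinning examples earlier in the series: the
``<rxq-affinity-list>`` value is simply a CSV list of
``<queue-id>:<core-id>`` pairs. A tiny parser illustrates the format; the
helper below is hypothetical and not part of OVS.

```python
# Hypothetical helper (not part of OVS): parse a <rxq-affinity-list> string,
# i.e. a CSV list of <queue-id>:<core-id> pairs as used by pmd-rxq-affinity.

def parse_rxq_affinity(affinity_list):
    """Return a {queue-id: core-id} mapping for a string like "0:3,1:7,3:8"."""
    affinity = {}
    for pair in affinity_list.split(","):
        queue_id, core_id = pair.split(":")
        affinity[int(queue_id)] = int(core_id)
    return affinity

print(parse_rxq_affinity("0:3,1:7,3:8"))
# {0: 3, 1: 7, 3: 8}
```

Any queue absent from the mapping (queue 2 in this example) is left unpinned
and polled by non-isolated PMD threads.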