From patchwork Tue Oct 16 11:10:02 2018
X-Patchwork-Submitter: Xiao Guangrong
X-Patchwork-Id: 984693
From: guangrong.xiao@gmail.com
X-Google-Original-From: xiaoguangrong@tencent.com
To: pbonzini@redhat.com, mst@redhat.com, mtosatti@redhat.com
Date: Tue, 16 Oct 2018 19:10:02 +0800
Message-Id: <20181016111006.629-1-xiaoguangrong@tencent.com>
X-Mailer: git-send-email 2.14.4
Subject: [Qemu-devel] [PATCH 0/4] migration: improve multithreads
Cc: kvm@vger.kernel.org, quintela@redhat.com, qemu-devel@nongnu.org,
    peterx@redhat.com, dgilbert@redhat.com, wei.w.wang@intel.com,
    jiang.biao2@zte.com.cn

From: Xiao Guangrong <xiaoguangrong@tencent.com>

This is the last part of our previous work:
   https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00526.html

This part finally improves the multithread model used by compression and
decompression, which makes the compression feature really usable in
production.

Compared with the previous version, we:
1) ported ptr_ring from the Linux kernel and use it to replace the
   lockless ring we designed ourselves (Michael, I added myself to the
   list of authors in that file; if you dislike that, I am fine to
   drop it. :))
2) search all threads for one with free room in its local ring to hold
   a request, instead of round-robin, to reduce the busy-ratio

Background
----------
The current implementations of compression and decompression are very
hard to enable in production.
We noticed that too many wait/wake operations go to kernel space, and CPU
usage is very low even when the system is mostly idle. The reasons are:
1) too many locks are used for synchronization: there is a global lock,
   and each single thread has its own lock; the migration thread and the
   worker threads need to go to sleep when these locks are busy
2) the migration thread submits requests to each thread separately;
   however, only one request can be pending per thread, which means a
   thread has to go to sleep after finishing each request

Our Ideas
---------
To make it work better, we introduce a lockless multithread model. The
user (currently the migration thread) submits requests to each thread,
each of which has its own ring with a capacity of 4. The threads put
their results into a global ring, from which the user fetches the results
and performs the remaining operations for each request, e.g. posting the
compressed data out for migration on the source QEMU.

Performance Result
------------------
We tested live migration between two hosts, each with Intel(R) Xeon(R)
Gold 6142 CPU @ 2.60GHz * 64 + 256G memory, migrating a VM with 16 vCPUs
and 120G memory between them; during the migration, multiple threads
repeatedly write the memory in the VM.

We used 16 threads on the destination to decompress the data; on the
source, we tried 4, 8 and 16 threads to compress the data.

1) 4 threads, compress-wait-thread = off
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     66.2           32.4~36.8
   after      56.5           59.4~60.9

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     247371         0.54
   after      138326         0.55

2) 4 threads, compress-wait-thread = on
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     55.1           51.0~63.3
   after      99.9           99.9

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     CAN'T COMPLETE 0
   after      338692         0

3) 8 threads, compress-wait-thread = off
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     43.3           17.5~32.5
   after      54.5           54.5~56.8

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     427384         0.19
   after      125066         0.38

4) 8 threads, compress-wait-thread = on
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     96.3           2.3~46.8
   after      90.6           90.6~91.8

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     CAN'T COMPLETE 0
   after      164426         0

5) 16 threads, compress-wait-thread = off
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     56.2           6.2~56.2
   after      37.8           37.8~40.2

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     2317123        0.02
   after      149006         0.02

6) 16 threads, compress-wait-thread = on
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     48.3           1.7~31.0
   after      43.9           42.1~45.6

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     1792817        0.00
   after      161423         0.00

Xiao Guangrong (4):
  ptr_ring: port ptr_ring from linux kernel to QEMU
  migration: introduce lockless multithreads model
  migration: use lockless Multithread model for compression
  migration: use lockless Multithread model for decompression

 include/qemu/lockless-threads.h |  63 +++++
 include/qemu/ptr_ring.h         | 235 ++++++++++++++++++
 migration/ram.c                 | 535 +++++++++++++++-------------------
 util/Makefile.objs              |   1 +
 util/lockless-threads.c         | 373 ++++++++++++++++++++++++++++
 5 files changed, 865 insertions(+), 342 deletions(-)
 create mode 100644 include/qemu/lockless-threads.h
 create mode 100644 include/qemu/ptr_ring.h
 create mode 100644 util/lockless-threads.c