From patchwork Tue Oct 16 11:10:02 2018
X-Patchwork-Submitter: Xiao Guangrong
X-Patchwork-Id: 984693
From: guangrong.xiao@gmail.com
X-Google-Original-From: xiaoguangrong@tencent.com
To: pbonzini@redhat.com, mst@redhat.com, mtosatti@redhat.com
Date: Tue, 16 Oct 2018 19:10:02 +0800
Message-Id: <20181016111006.629-1-xiaoguangrong@tencent.com>
X-Mailer: git-send-email 2.14.4
Subject: [Qemu-devel] [PATCH 0/4] migration: improve multithreads
Cc: kvm@vger.kernel.org, quintela@redhat.com, qemu-devel@nongnu.org,
    peterx@redhat.com, dgilbert@redhat.com, wei.w.wang@intel.com,
    jiang.biao2@zte.com.cn

From: Xiao Guangrong <xiaoguangrong@tencent.com>

This is the last part of our previous work:
   https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00526.html

This part finally improves the multithread model used by compression and
decompression, which makes the compression feature really usable in
production.

Compared with the previous version, we:
1) ported ptr_ring from the Linux kernel and use it to replace the
   lockless ring we designed ourselves (Michael, I added myself to the
   list of authors in that file; if you dislike that, I am fine to
   drop it. :))
2) search all threads for one with free room in its local ring to hold
   a request, instead of round-robin, to reduce the busy-ratio

Background
----------
The current implementations of compression and decompression are very
hard to enable in production.
We noticed that too many wait/wake operations go to kernel space, and CPU
usage is very low even when the system is mostly idle. The reasons are:
1) too many locks are used for synchronization: there is a global lock,
   and each single thread has its own lock; the migration thread and the
   worker threads need to go to sleep when these locks are busy
2) the migration thread submits requests to each thread separately;
   however, only one request can be pending per thread, which means a
   thread has to go to sleep after finishing each request

Our Ideas
---------
To make it work better, we introduce a lockless multithread model. The
user (currently the migration thread) submits requests to each thread,
each of which has its own ring with a capacity of 4. The threads put
their results into a global ring, from which the user fetches the results
and performs the remaining operations for each request, e.g. posting the
compressed data out for migration on the source QEMU.

Performance Result
------------------
We tested live migration between two hosts, each with Intel(R) Xeon(R)
Gold 6142 CPU @ 2.60GHz * 64 + 256G memory, migrating a VM with 16 vCPUs
and 120G memory between them; during the migration, multiple threads
repeatedly write the memory in the VM.

We used 16 threads on the destination to decompress the data; on the
source, we tried 4, 8 and 16 threads to compress the data.

1) 4 threads, compress-wait-thread = off
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     66.2           32.4~36.8
   after      56.5           59.4~60.9

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     247371         0.54
   after      138326         0.55

2) 4 threads, compress-wait-thread = on
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     55.1           51.0~63.3
   after      99.9           99.9

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     CAN'T COMPLETE 0
   after      338692         0

3) 8 threads, compress-wait-thread = off
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     43.3           17.5~32.5
   after      54.5           54.5~56.8

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     427384         0.19
   after      125066         0.38

4) 8 threads, compress-wait-thread = on
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     96.3           2.3~46.8
   after      90.6           90.6~91.8

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     CAN'T COMPLETE 0
   after      164426         0

5) 16 threads, compress-wait-thread = off
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     56.2           6.2~56.2
   after      37.8           37.8~40.2

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     2317123        0.02
   after      149006         0.02

6) 16 threads, compress-wait-thread = on
   CPU usages
              main thread    compression threads
   -----------------------------------------------
   before     48.3           1.7~31.0
   after      43.9           42.1~45.6

   Migration result
              total time     busy-ratio
   -----------------------------------------------
   before     1792817        0.00
   after      161423         0.00

Xiao Guangrong (4):
  ptr_ring: port ptr_ring from linux kernel to QEMU
  migration: introduce lockless multithreads model
  migration: use lockless Multithread model for compression
  migration: use lockless Multithread model for decompression

 include/qemu/lockless-threads.h |  63 +++++
 include/qemu/ptr_ring.h         | 235 ++++++++++++++++++
 migration/ram.c                 | 535 +++++++++++++++-------------------
 util/Makefile.objs              |   1 +
 util/lockless-threads.c         | 373 ++++++++++++++++++++++++++++
 5 files changed, 865 insertions(+), 342 deletions(-)
 create mode 100644 include/qemu/lockless-threads.h
 create mode 100644 include/qemu/ptr_ring.h
 create mode 100644 util/lockless-threads.c