From patchwork Wed Nov 2 06:17:25 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Li, Liang Z"
X-Patchwork-Id: 690267
From: Liang Li
To: mst@redhat.com, dave.hansen@intel.com
Date: Wed, 2 Nov 2016 14:17:25 +0800
Message-Id: <1478067447-24654-6-git-send-email-liang.z.li@intel.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1478067447-24654-1-git-send-email-liang.z.li@intel.com>
References: <1478067447-24654-1-git-send-email-liang.z.li@intel.com>
Subject: [Qemu-devel] [PATCH kernel v4 5/7] mm: add the related functions to
 get unused page
Cc: virtio-dev@lists.oasis-open.org, cornelia.huck@de.ibm.com,
 kvm@vger.kernel.org, quintela@redhat.com, linux-kernel@vger.kernel.org,
 Liang Li, qemu-devel@nongnu.org, dgilbert@redhat.com, linux-mm@kvack.org,
 amit.shah@redhat.com, pbonzini@redhat.com, Andrew Morton,
 virtualization@lists.linux-foundation.org, mgorman@techsingularity.net

Save the unused page info into a split page bitmap. The virtio balloon
driver will use this new API to get the unused page bitmap and send it
to the hypervisor (QEMU) to speed up live migration. While the bitmap
is being sent, some of the pages may be modified and no longer be free;
this inaccuracy can be corrected by the dirty page logging mechanism.

Signed-off-by: Liang Li
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Michael S. Tsirkin
Cc: Paolo Bonzini
Cc: Cornelia Huck
Cc: Amit Shah
Cc: Dave Hansen
---
 include/linux/mm.h |  2 ++
 mm/page_alloc.c    | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f47862a..7014d8a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1773,6 +1773,8 @@ extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
 extern unsigned long get_max_pfn(void);
+extern int get_unused_pages(unsigned long start_pfn, unsigned long end_pfn,
+	unsigned long *bitmap[], unsigned long bits, unsigned int nr_bmap);
 
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 12cc8ed..72537cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4438,6 +4438,91 @@ unsigned long get_max_pfn(void)
 }
 EXPORT_SYMBOL(get_max_pfn);
 
+static void mark_unused_pages_bitmap(struct zone *zone,
+		unsigned long start_pfn, unsigned long end_pfn,
+		unsigned long *bitmap[], unsigned long bits,
+		unsigned int nr_bmap)
+{
+	unsigned long pfn, flags, nr_pg, pos, *bmap;
+	unsigned int order, i, t, bmap_idx;
+	struct list_head *curr;
+
+	if (zone_is_empty(zone))
+		return;
+
+	end_pfn = min(start_pfn + nr_bmap * bits, end_pfn);
+	spin_lock_irqsave(&zone->lock, flags);
+
+	for_each_migratetype_order(order, t) {
+		list_for_each(curr, &zone->free_area[order].free_list[t]) {
+			pfn = page_to_pfn(list_entry(curr, struct page, lru));
+			if (pfn < start_pfn || pfn >= end_pfn)
+				continue;
+			nr_pg = 1UL << order;
+			if (pfn + nr_pg > end_pfn)
+				nr_pg = end_pfn - pfn;
+			bmap_idx = (pfn - start_pfn) / bits;
+			if (bmap_idx == (pfn + nr_pg - start_pfn) / bits) {
+				bmap = bitmap[bmap_idx];
+				pos = (pfn - start_pfn) % bits;
+				bitmap_set(bmap, pos, nr_pg);
+			} else
+				for (i = 0; i < nr_pg; i++) {
+					pos = pfn - start_pfn + i;
+					bmap_idx = pos / bits;
+					bmap = bitmap[bmap_idx];
+					pos = pos % bits;
+					bitmap_set(bmap, pos, 1);
+				}
+		}
+	}
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
+/*
+ * During live migration, a page is always discardable unless its
+ * content is needed by the system.
+ * get_unused_pages() provides an API to get the unused pages; these
+ * unused pages can be discarded if there is no modification since
+ * the request. Some other mechanism, like dirty page logging,
+ * can be used to track the modification.
+ *
+ * This function scans the free page lists to find the unused pages
+ * whose pfns range from start_pfn to end_pfn, and sets the
+ * corresponding bit in the bitmap for each unused page found.
+ *
+ * Allocating a large bitmap may fail because of fragmentation, so
+ * instead of using a single bitmap, we use a scatter/gather bitmap.
+ * 'bitmap' is the start address of an array which contains
+ * 'nr_bmap' separate small bitmaps; each bitmap contains 'bits' bits.
+ *
+ * return -1 if parameters are invalid
+ * return 0 when end_pfn >= max_pfn
+ * return 1 when end_pfn < max_pfn
+ */
+int get_unused_pages(unsigned long start_pfn, unsigned long end_pfn,
+	unsigned long *bitmap[], unsigned long bits, unsigned int nr_bmap)
+{
+	struct zone *zone;
+	int ret = 0;
+
+	if (bitmap == NULL || *bitmap == NULL || nr_bmap == 0 ||
+	    bits == 0 || start_pfn > end_pfn)
+		return -1;
+	if (end_pfn < max_pfn)
+		ret = 1;
+	if (end_pfn >= max_pfn)
+		ret = 0;
+
+	for_each_populated_zone(zone)
+		mark_unused_pages_bitmap(zone, start_pfn, end_pfn, bitmap,
+				bits, nr_bmap);
+
+	return ret;
+}
+EXPORT_SYMBOL(get_unused_pages);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
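
Note on the split-bitmap indexing: the mapping from a pfn to a (chunk, bit)
pair in mark_unused_pages_bitmap() can be tried out in user space. The
following is a minimal sketch under stated assumptions, not kernel code and
not part of this patch. The helpers chunk_set_bit(), chunk_test_bit(),
split_bitmap_mark() and split_bitmap_test() are hypothetical names introduced
only for illustration. The sketch always sets bits one at a time (the
equivalent of the patch's slow path) and takes no lock.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Set bit 'pos' inside one contiguous chunk (hypothetical helper). */
static void chunk_set_bit(unsigned long *chunk, unsigned long pos)
{
	chunk[pos / BITS_PER_WORD] |= 1UL << (pos % BITS_PER_WORD);
}

/* Return 1 if bit 'pos' is set inside one contiguous chunk. */
static int chunk_test_bit(const unsigned long *chunk, unsigned long pos)
{
	return (int)((chunk[pos / BITS_PER_WORD] >> (pos % BITS_PER_WORD)) & 1UL);
}

/*
 * Mark 'nr_pg' pages starting at 'pfn' in a split bitmap made of
 * 'nr_bmap' chunks of 'bits' bits each. Bit 0 of chunk 0 stands for
 * 'start_pfn'. A block that starts outside the covered window is
 * skipped; a block that runs past the end of the window is clipped,
 * mirroring the checks in the patch.
 */
static void split_bitmap_mark(unsigned long *bitmap[], unsigned long bits,
			      unsigned int nr_bmap, unsigned long start_pfn,
			      unsigned long pfn, unsigned long nr_pg)
{
	unsigned long end_pfn, off, i;

	assert(bitmap != NULL && bits != 0 && nr_bmap != 0);

	end_pfn = start_pfn + (unsigned long)nr_bmap * bits;
	if (pfn < start_pfn || pfn >= end_pfn)
		return;
	if (pfn + nr_pg > end_pfn)
		nr_pg = end_pfn - pfn;

	for (i = 0; i < nr_pg; i++) {
		off = pfn - start_pfn + i;
		/* off / bits selects the chunk, off % bits the bit in it. */
		chunk_set_bit(bitmap[off / bits], off % bits);
	}
}

/* Return 1 if 'pfn' is inside the window and marked, 0 otherwise. */
static int split_bitmap_test(unsigned long *bitmap[], unsigned long bits,
			     unsigned int nr_bmap, unsigned long start_pfn,
			     unsigned long pfn)
{
	unsigned long off;

	if (pfn < start_pfn)
		return 0;
	off = pfn - start_pfn;
	if (off >= (unsigned long)nr_bmap * bits)
		return 0;
	return chunk_test_bit(bitmap[off / bits], off % bits);
}
```

With two 8-bit chunks and start_pfn = 100, marking 4 pages at pfn 106 sets
bits 6 and 7 of chunk 0 and bits 0 and 1 of chunk 1, which is the case that
takes the per-page loop in the patch because the block straddles a chunk
boundary.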