From patchwork Thu Mar 21 06:18:38 2013
X-Patchwork-Submitter: "Michael S. Tsirkin"
X-Patchwork-Id: 229561
Date: Thu, 21 Mar 2013 08:18:38 +0200
From: "Michael S. Tsirkin"
To: "Michael R. Hines"
Cc: Roland Dreier, qemu-devel@nongnu.org, linux-rdma@vger.kernel.org,
    Yishai Hadas, linux-kernel@vger.kernel.org, Hal Rosenstock,
    Sean Hefty, Christoph Lameter
Message-ID: <20130321061838.GA28319@redhat.com>
Subject: [Qemu-devel] [PATCH] rdma: don't make pages writable if not requested

core/umem.c seems to get the arguments to get_user_pages() in the reverse
order: it sets the writable flag, and breaks COW for MAP_SHARED, if and only
if the hardware needs to write the page.

This breaks memory overcommit for users such as KVM: each time we try to
register a page in order to send it to a remote host, COW is broken.  For
applications that only have REMOTE_READ permission, there is no reason to
break COW at all.  Since every broken COW page gets its own private copy,
the user process quickly exceeds its cgroup memory limit.

This makes RDMA mostly useless for virtualization, hence the stable tag.

Reported-by: "Michael R. Hines"
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin
---
Note: compile-tested only, I don't have RDMA hardware at the moment.
Michael, could you please try this patch (also fixing your userspace code
not to request write access) and report?
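For reference, a sketch of the two interfaces involved (not part of the
patch; the prototype and the access check below are paraphrased from a 3.x
tree and may differ in detail):

/*
 * Sketch only, paraphrased -- not part of the patch.
 *
 * mm: the 5th and 6th arguments are the ones the hunk below swaps.
 */
long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
		    unsigned long start, unsigned long nr_pages,
		    int write,	/* map pages for writing; breaks COW on
				   private mappings up front */
		    int force,	/* override VMA protection checks */
		    struct page **pages, struct vm_area_struct **vmas);

/*
 * drivers/infiniband/core/umem.c: writable is derived from the
 * registration's access flags, so a registration that asks only for
 * IB_ACCESS_REMOTE_READ ends up with umem->writable == 0 and should not
 * need COW broken at all:
 *
 *	umem->writable = !!(access & ~IB_ACCESS_REMOTE_READ);
 */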
Note2: grep for get_user_pages in infiniband drivers turns up lots of
users who set write to 1 unconditionally.  These might be bugs too and
should be checked.

 drivers/infiniband/core/umem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index a841123..5929598 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -152,7 +152,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		ret = get_user_pages(current, current->mm, cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
-				     1, !umem->writable, page_list, vma_list);
+				     !umem->writable, 1, page_list, vma_list);
 		if (ret < 0)
 			goto out;
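On the userspace side mentioned in the note above, a registration that is
genuinely read-only from the HCA's point of view would request only remote
read access.  A minimal libibverbs sketch (function name and error handling
are illustrative, not taken from this thread):

#include <stdio.h>
#include <infiniband/verbs.h>

/*
 * Illustrative only: register a buffer the HCA will only read (e.g. an
 * RDMA-read source) without requesting any write access, so that
 * ib_umem_get() in the kernel sees umem->writable == 0.
 */
static struct ibv_mr *reg_read_only(struct ibv_pd *pd, void *buf, size_t len)
{
	struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_REMOTE_READ);

	if (!mr)
		perror("ibv_reg_mr");
	return mr;
}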