Bug: mv643xxx fails with highmem

On Wed, Dec 17, 2014 at 06:18:58PM -0300, Ezequiel Garcia wrote:
> On the other side, I haven't been able to reproduce this on my boards. I
> did try to put a hack to hold most lowmem pages, but it didn't make a
> difference. (In fact, I haven't been able to clearly see how the pages
> for the skbuff are allocated from high memory.)

To be honest, I don't know either.  All that I can do is describe what
happened...

I've been running 3.17 since a week after it came out, and never saw a
problem there.

Then I moved forward to 3.18, and ended up with memory corruption, which
seemed to be the GPU scribbling over kernel text (since the oops revealed
pixel values in the Code: line.)

I thought it was a GPU problem - which seemed a reasonable assumption as
I know that the runtime PM I implemented for the GPU doesn't properly
restore the hardware state yet.  So, I rebooted back into 3.18, this
time with all GPU users disabled, intending to download a kernel with
GPU runtime PM disabled.  I'd also added additional debug to my X DDX
driver which logged the GPU command stream to a file on a NFS mount -
this does open(, O_CREAT|O_WRONLY|O_APPEND), write(), close() each
time it submits a block of commands.

However, while my scripts to download the built kernel to the Cubox
were running, the kernel oopsed in the depths of dma_map_single() - the
kernel was trying to access a struct page for phys address 0x40000000,
which didn't exist.  I decided to go back to 3.17 to get the updated
kernel on it, hoping that would sort it out.

With the updated 3.18 kernel (with GPU runtime PM disabled), I found
that I'd still get oopses in from the network driver while X was starting
up, again from dma_map_single().  So, with all GPU users again disabled,
I set about debugging the this issue.

I added a BUG_ON(!addr) after the page_address(), and that fired.  I
added a BUG_ON(PageHighMem(this_frag->page.p)) and that fired too.
(Each time, I had to boot back to 3.17 in order to download the new
kernel, because very time I tried with 3.18, I'd hit this bug.)

It's then when I reported the issue and asked the questions...

I've since done a simple change, taking advantage that on ARM (or any
asm-generic/dma-mapping-common.h user), dma_unmap_single() and
dma_unmap_page() are the same function:

                desc->l4i_chk = 0;
                desc->byte_cnt = skb_frag_size(this_frag);
-               desc->buf_ptr = dma_map_single(mp->dev->dev.parent, addr,
-                                              desc->byte_cnt, DMA_TO_DEVICE);
+               desc->buf_ptr = skb_frag_dma_map(mp->dev->dev.parent,
+                                                this_frag, 0,
+                                                desc->byte_cnt, DMA_TO_DEVICE);        }
 }

I've been running that for the last five days, and I've yet to see
/any/ issues what so ever, and that includes running with the GPU
logging all that time:

-rw-r--r-- 1 root root 17113616 Dec 17 23:52 /shared/etnaviv.bin

During that time, I've been using the device over the network, running
various git commands, running builds, running the occasional build
via NFS, etc.

So, for me it was trivially easy to reproduce (without my fix in place)
and all problems have gone away when I've fixed the apparent problem.

However, exactly how it occurs, I don't know.  My understanding from
reading the various feature flags was that NETIF_F_HIGHDMA was required
for highmem (see illegal_highdma()) so as this isn't set, we shouldn't
be seeing highmem fragments - which is why I asked the question in my
original email.

If you want me to revert my fix above, and reproduce again, I can
certainly try that - or put a WARN_ON_ONCE(PageHighMem(this_frag->page.p))
in there, but I seem to remember that it wasn't particularly useful as
the backtrace didn't show where the memory actually came from.

Message ID	20141218000357.GX11285@n2100.arm.linux.org.uk
State	RFC, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 5898A140079 for <patchwork-incoming@ozlabs.org>; Thu, 18 Dec 2014 11:04:18 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751257AbaLRAEN (ORCPT <rfc822;patchwork-incoming@ozlabs.org>); Wed, 17 Dec 2014 19:04:13 -0500 Received: from gw-1.arm.linux.org.uk ([78.32.30.217]:37051 "EHLO pandora.arm.linux.org.uk" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751030AbaLRAEM (ORCPT <rfc822;netdev@vger.kernel.org>); Wed, 17 Dec 2014 19:04:12 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=arm.linux.org.uk; s=pandora-2014; h=Sender:In-Reply-To:Content-Type:MIME-Version:References:Message-ID:Subject:Cc:To:From:Date; bh=UJwmm0InONMk6Z1xzOxiPE2QzuHqlVnRuU3hzcQDKsk=; b=NYOuL2wEooUG9NqlcnDujp8thVo64s22kRdEu1keFkMFlegGb5sP9dTZ3SJ0sAcsMKC1NotElhk1kmYsbLTm+D2mErWudVunDnOuiUY8DFwSnfu4knF1nx/WEB49aJw4v36RD4ZXKkDmnh+lLIAXRQtokgPbN/ZWjw+Lhw1DnMs=; Received: from n2100.arm.linux.org.uk ([fd8f:7570:feb6:1:214:fdff:fe10:4f86]:53884) by pandora.arm.linux.org.uk with esmtpsa (TLSv1:DHE-RSA-AES256-SHA:256) (Exim 4.82_1-5b7a7c0-XX) (envelope-from <linux@arm.linux.org.uk>) id 1Y1OZV-00055k-GY; Thu, 18 Dec 2014 00:04:01 +0000 Received: from linux by n2100.arm.linux.org.uk with local (Exim 4.76) (envelope-from <linux@n2100.arm.linux.org.uk>) id 1Y1OZS-0006dz-CT; Thu, 18 Dec 2014 00:03:58 +0000 Date: Thu, 18 Dec 2014 00:03:57 +0000 From: Russell King - ARM Linux <linux@arm.linux.org.uk> To: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Cc: David Miller <davem@davemloft.net>, Nimrod Andy <B38611@freescale.com>, Fabio Estevam <fabio.estevam@freescale.com>, netdev@vger.kernel.org, fugang.duan@freescale.com Subject: Re: Bug: mv643xxx fails with highmem Message-ID: <20141218000357.GX11285@n2100.arm.linux.org.uk> References: <20141211194920.GR11285@n2100.arm.linux.org.uk> <20141211.151055.817876561546126576.davem@davemloft.net> <20141211202507.GS11285@n2100.arm.linux.org.uk> <5491F342.5090301@free-electrons.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5491F342.5090301@free-electrons.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: <netdev.vger.kernel.org> X-Mailing-List: netdev@vger.kernel.org

Bug: mv643xxx fails with highmem

Commit Message

Comments

Patch