From patchwork Sun Sep 6 14:21:53 2015
X-Patchwork-Submitter: Richard Purdie
X-Patchwork-Id: 514922
Message-ID: <1441549313.24871.218.camel@linuxfoundation.org>
From: Richard Purdie
To: Peter Crosthwaite
Cc: Peter Maydell, qemu-devel
Date: Sun, 06 Sep 2015 15:21:53 +0100
References: <1441362357.24871.155.camel@linuxfoundation.org>
 <1441365880.24871.164.camel@linuxfoundation.org>
 <1441370585.24871.166.camel@linuxfoundation.org>
 <1441387258.24871.197.camel@linuxfoundation.org>
Subject: Re: [Qemu-devel] Segfault using qemu-system-arm in smc91c111

On Sat, 2015-09-05 at 13:30 -0700, Peter Crosthwaite wrote:
> On Fri, Sep 4, 2015 at 10:30 AM, Peter Maydell wrote:
> > On 4 September 2015 at 18:20, Richard Purdie wrote:
> >> On Fri, 2015-09-04 at 13:43 +0100, Richard Purdie wrote:
> >>> On Fri, 2015-09-04 at 12:31 +0100, Peter Maydell wrote:
> >>> > On 4 September 2015 at 12:24, Richard Purdie wrote:
> >>> > > So just based on that, yes, seems that the rx_fifo looks to be
> >>> > > overrunning. I can add the asserts but I think it would just confirm
> >>> > > this.
> >>> >
> >>> > Yes, the point of adding assertions is to confirm a hypothesis.
> >>>
> >>> I've now confirmed that it does indeed trigger the assert in
> >>> smc91c111_receive().
> >>
> >> I just tried an experiment where I put:
> >>
> >>     if (s->rx_fifo_len >= NUM_PACKETS)
> >>         return -1;
> >>
> >> into smc91c111_receive() and my reproducer stops reproducing the
> >> problem.
>
> Does it just stop the crash or does it eliminate the problem
> completely with a fully now-working network?

It stops the crash, the network works great.

> >> I also noticed can_receive() could also have a check on buffer
> >> availability. Would one of these changes be the correct fix here?
> >
> > The interesting question is why smc91c111_allocate_packet() doesn't
> > fail in this situation. We only have NUM_PACKETS worth of storage,
> > shared between the tx and rx buffers, so how could we both have
> > already filled the rx_fifo and have a spare packet for the allocate
> > function to return?
>
> Maybe this:
>
>             case 5: /* Release. */
>                 smc91c111_release_packet(s, s->packet_num);
>                 break;
>
> The guest is able to free an allocated packet without the accompanying
> pop of tx/rx fifo. This may suggest some sort of guest error?
>
> The fix depends on the behaviour of the real hardware. If that MMIO op
> is supposed to dequeue the corresponding queue entry then we may need
> to patch that logic to do search the queues and dequeue it. Otherwise
> we need to find out the genuine length of the rx queue, and clamp it
> without something like Richards patch. There are a few other bits and
> pieces that suggest the guest can have independent control of the
> queues and allocated buffers but i'm confused as to how the rx fifo
> length can get up to 10 in any case.

I think I have a handle on what is going on.
smc91c111_release_packet() changes s->allocated but not rx_fifo, and
can_receive() only looks at s->allocated. smc91c111_release_packet()
also calls qemu_flush_queued_packets(), so new network packets can
arrive *before* we change rx_fifo, and this can loop.

The patch below, which explicitly orders the
qemu_flush_queued_packets() call, resolved the test case in which I
was able to reproduce this problem.

So there are three ways to fix this: either can_receive() needs to
check both s->allocated and rx_fifo, or the code is made more explicit
about when qemu_flush_queued_packets() is called (as per my patch
below), or case 4, where smc91c111_release_packet() and then
smc91c111_pop_rx_fifo(s) are called, has those two calls reversed.
I also tested the latter, which also works, albeit with uglier code.

The problem is much more reproducible with the assert, btw: booting a
qemu image with this and hitting the network interface with scp of a
few large files is usually enough.

So which patch would be preferred? :)

Cheers,

Richard

Index: qemu-2.4.0/hw/net/smc91c111.c
===================================================================
--- qemu-2.4.0.orig/hw/net/smc91c111.c
+++ qemu-2.4.0/hw/net/smc91c111.c
@@ -185,7 +185,6 @@ static void smc91c111_release_packet(smc
     s->allocated &= ~(1 << packet);
     if (s->tx_alloc == 0x80)
         smc91c111_tx_alloc(s);
-    qemu_flush_queued_packets(qemu_get_queue(s->nic));
 }
 
 /* Flush the TX FIFO. */
@@ -237,9 +236,11 @@ static void smc91c111_do_tx(smc91c111_st
             }
         }
 #endif
-        if (s->ctr & CTR_AUTO_RELEASE)
+        if (s->ctr & CTR_AUTO_RELEASE) {
             /* Race? */
             smc91c111_release_packet(s, packetnum);
+            qemu_flush_queued_packets(qemu_get_queue(s->nic));
+        }
         else if (s->tx_fifo_done_len < NUM_PACKETS)
             s->tx_fifo_done[s->tx_fifo_done_len++] = packetnum;
         qemu_send_packet(qemu_get_queue(s->nic), p, len);
@@ -379,9 +380,11 @@ static void smc91c111_writeb(void *opaqu
                     smc91c111_release_packet(s, s->rx_fifo[0]);
                 }
                 smc91c111_pop_rx_fifo(s);
+                qemu_flush_queued_packets(qemu_get_queue(s->nic));
                 break;
             case 5: /* Release. */
                 smc91c111_release_packet(s, s->packet_num);
+                qemu_flush_queued_packets(qemu_get_queue(s->nic));
                 break;
             case 6: /* Add to TX FIFO. */
                 smc91c111_queue_tx(s, s->packet_num);
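
For anyone who wants to see the ordering problem in isolation, here is
a rough standalone model of it. This is only a sketch and not QEMU
code: the ToyNic struct, every toy_* function, run_scenario() and the
flush_in_release switch are made up for illustration, and NUM_PACKETS
is set to 4 purely as an assumption for the model. Only the field
names allocated and rx_fifo/rx_fifo_len mirror the device model. The
toy flags an overrun instead of writing past the end of rx_fifo.

```c
#include <assert.h>

#define NUM_PACKETS 4            /* assumed value for this toy model */

/* Hypothetical, heavily reduced stand-in for the device state. */
typedef struct {
    unsigned allocated;          /* bitmask of packet buffers in use */
    int rx_fifo[NUM_PACKETS];
    int rx_fifo_len;
    int pending;                 /* packets queued by the toy net layer */
    int overran;                 /* set when rx_fifo would be overrun */
    int flush_in_release;        /* 1 = old ordering, 0 = patched ordering */
} ToyNic;

/* Like can_receive() in the thread: looks only at the allocation mask. */
static int toy_can_receive(const ToyNic *s)
{
    return s->allocated != (1u << NUM_PACKETS) - 1;
}

/* Allocate the lowest free buffer and queue it on the rx fifo. */
static void toy_receive(ToyNic *s)
{
    int n = 0;

    assert(toy_can_receive(s));
    while (s->allocated & (1u << n))
        n++;
    s->allocated |= 1u << n;
    if (s->rx_fifo_len >= NUM_PACKETS) {
        s->overran = 1;          /* the real code would write out of bounds */
        return;
    }
    s->rx_fifo[s->rx_fifo_len++] = n;
}

/* Stand-in for qemu_flush_queued_packets(): deliver while allowed. */
static void toy_flush(ToyNic *s)
{
    while (s->pending > 0 && toy_can_receive(s)) {
        s->pending--;
        toy_receive(s);
    }
}

static void toy_release_packet(ToyNic *s, int n)
{
    s->allocated &= ~(1u << n);
    if (s->flush_in_release)
        toy_flush(s);            /* old ordering: rx_fifo not popped yet */
}

static void toy_pop_rx_fifo(ToyNic *s)
{
    if (s->rx_fifo_len == 0)
        return;
    s->rx_fifo_len--;
    for (int i = 0; i < s->rx_fifo_len; i++)
        s->rx_fifo[i] = s->rx_fifo[i + 1];
}

/* Models MMU command 4: remove and release the current rx packet. */
static void toy_mmu_remove_and_release(ToyNic *s)
{
    if (s->rx_fifo_len > 0)
        toy_release_packet(s, s->rx_fifo[0]);
    toy_pop_rx_fifo(s);
    if (!s->flush_in_release)
        toy_flush(s);            /* patched ordering: flush after the pop */
}

/* Fill every buffer, leave one packet pending, then issue command 4. */
int run_scenario(int flush_in_release)
{
    ToyNic s = {0};

    s.flush_in_release = flush_in_release;
    s.pending = NUM_PACKETS + 1;
    toy_flush(&s);
    toy_mmu_remove_and_release(&s);
    return s.overran;
}
```

In this model run_scenario(1), with the flush inside the release,
returns 1 because the pending packet is received while rx_fifo is
still full. run_scenario(0), with the flush moved after the pop,
returns 0.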