oslib: make error handling more reasonable

Message ID	1328884453-1067-1-git-send-email-zwu.kernel@gmail.com
State	New
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> Gateway: Authorized Use Only! Violators will be prosecuted for <qemu-devel@nongnu.org> from <zwu.kernel@gmail.com>; Fri, 10 Feb 2012 09:34:28 -0500 Gateway: Authorized Use Only! Violators will be prosecuted; Fri, 10 Feb 2012 09:34:24 -0500 From: Zhi Yong Wu <zwu.kernel@gmail.com> To: qemu-devel@nongnu.org Date: Fri, 10 Feb 2012 22:34:13 +0800 Message-Id: <1328884453-1067-1-git-send-email-zwu.kernel@gmail.com> Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>, stefanha@linux.vnet.ibm.com Subject: [Qemu-devel] [PATCH] oslib: make error handling more reasonable Precedence: list Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

Zhiyong Wu Feb. 10, 2012, 2:34 p.m. UTC

From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 oslib-posix.c |    4 ++--
 oslib-win32.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Daniel P. Berrangé Feb. 10, 2012, 2:41 p.m. UTC | #1

On Fri, Feb 10, 2012 at 10:34:13PM +0800, Zhi Yong Wu wrote:
> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> 
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> ---
>  oslib-posix.c |    4 ++--
>  oslib-win32.c |    4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/oslib-posix.c b/oslib-posix.c
> index b6a3c7f..f978d56 100644
> --- a/oslib-posix.c
> +++ b/oslib-posix.c
> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
>  {
>      if (ptr == NULL) {
>          fprintf(stderr, "Failed to allocate memory: %s\n", strerror(errno));
> -        abort();
> +        exit(EXIT_FAILURE);

exit() will call any atexit()/on_exit() handlers, as well as trying
to flush I/O streams. Any of these actions may require further
memory allocations, which will likely fail, or worse cause this
code to re-enter itself if an atexit() handler calls qemu_malloc

The only option other than abort(), is to use  _Exit() which
doesn't try to run cleanup handlers.

Daniel

Zhiyong Wu Feb. 10, 2012, 3:13 p.m. UTC | #2

On Fri, Feb 10, 2012 at 10:41 PM, Daniel P. Berrange
<berrange@redhat.com> wrote:
> On Fri, Feb 10, 2012 at 10:34:13PM +0800, Zhi Yong Wu wrote:
>> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>
>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>> ---
>>  oslib-posix.c |    4 ++--
>>  oslib-win32.c |    4 ++--
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/oslib-posix.c b/oslib-posix.c
>> index b6a3c7f..f978d56 100644
>> --- a/oslib-posix.c
>> +++ b/oslib-posix.c
>> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
>>  {
>>      if (ptr == NULL) {
>>          fprintf(stderr, "Failed to allocate memory: %s\n", strerror(errno));
>> -        abort();
>> +        exit(EXIT_FAILURE);
>
> exit() will call any atexit()/on_exit() handlers, as well as trying
> to flush I/O streams. Any of these actions may require further
> memory allocations, which will likely fail, or worse cause this
> code to re-enter itself if an atexit() handler calls qemu_malloc
Nice, very reasonable.
>
> The only option other than abort(), is to use  _Exit() which
> doesn't try to run cleanup handlers.
I will try to send out v2
>
> Daniel
> --
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

Stefan Weil Feb. 10, 2012, 3:53 p.m. UTC | #3

Am 10.02.2012 16:13, schrieb Zhi Yong Wu:
> On Fri, Feb 10, 2012 at 10:41 PM, Daniel P. Berrange
> <berrange@redhat.com> wrote:
>> On Fri, Feb 10, 2012 at 10:34:13PM +0800, Zhi Yong Wu wrote:
>>> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>
>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>> ---
>>>  oslib-posix.c |    4 ++--
>>>  oslib-win32.c |    4 ++--
>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/oslib-posix.c b/oslib-posix.c
>>> index b6a3c7f..f978d56 100644
>>> --- a/oslib-posix.c
>>> +++ b/oslib-posix.c
>>> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
>>>  {
>>>      if (ptr == NULL) {
>>>          fprintf(stderr, "Failed to allocate memory: %s\n", 
>>> strerror(errno));
>>> -        abort();
>>> +        exit(EXIT_FAILURE);
>>
>> exit() will call any atexit()/on_exit() handlers, as well as trying
>> to flush I/O streams. Any of these actions may require further
>> memory allocations, which will likely fail, or worse cause this
>> code to re-enter itself if an atexit() handler calls qemu_malloc
> Nice, very reasonable.
>>
>> The only option other than abort(), is to use  _Exit() which
>> doesn't try to run cleanup handlers.
> I will try to send out v2

Could you please explain why calling exit, _Exit or _exit is more
reasonable than calling abort?

abort can create core dumps or start a debugger which is
useful for me and maybe other developers, too.

Eric Blake Feb. 10, 2012, 6:35 p.m. UTC | #4

On 02/10/2012 07:41 AM, Daniel P. Berrange wrote:

>> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
>>  {
>>      if (ptr == NULL) {
>>          fprintf(stderr, "Failed to allocate memory: %s\n", strerror(errno));
>> -        abort();
>> +        exit(EXIT_FAILURE);
> 
> exit() will call any atexit()/on_exit() handlers, as well as trying
> to flush I/O streams. Any of these actions may require further
> memory allocations, which will likely fail, or worse cause this
> code to re-enter itself if an atexit() handler calls qemu_malloc
> 
> The only option other than abort(), is to use  _Exit() which
> doesn't try to run cleanup handlers.

Correct, but in that case, then you need to fflush(stderr) prior to
_Exit(), or else use write() rather than fprintf(), since otherwise your
attempt at a nice oom error message is lost.

Zhiyong Wu Feb. 13, 2012, 2:37 a.m. UTC | #5

On Fri, Feb 10, 2012 at 11:53 PM, Stefan Weil <sw@weilnetz.de> wrote:
> Am 10.02.2012 16:13, schrieb Zhi Yong Wu:
>
>> On Fri, Feb 10, 2012 at 10:41 PM, Daniel P. Berrange
>> <berrange@redhat.com> wrote:
>>>
>>> On Fri, Feb 10, 2012 at 10:34:13PM +0800, Zhi Yong Wu wrote:
>>>>
>>>> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>>
>>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>> ---
>>>>  oslib-posix.c |    4 ++--
>>>>  oslib-win32.c |    4 ++--
>>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/oslib-posix.c b/oslib-posix.c
>>>> index b6a3c7f..f978d56 100644
>>>> --- a/oslib-posix.c
>>>> +++ b/oslib-posix.c
>>>> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
>>>>  {
>>>>     if (ptr == NULL) {
>>>>         fprintf(stderr, "Failed to allocate memory: %s\n",
>>>> strerror(errno));
>>>> -        abort();
>>>> +        exit(EXIT_FAILURE);
>>>
>>>
>>> exit() will call any atexit()/on_exit() handlers, as well as trying
>>> to flush I/O streams. Any of these actions may require further
>>> memory allocations, which will likely fail, or worse cause this
>>> code to re-enter itself if an atexit() handler calls qemu_malloc
>>
>> Nice, very reasonable.
>>>
>>>
>>> The only option other than abort(), is to use  _Exit() which
>>> doesn't try to run cleanup handlers.
>>
>> I will try to send out v2
>
>
> Could you please explain why calling exit, _Exit or _exit is more
> reasonable than calling abort?
>
> abort can create core dumps or start a debugger which is
> useful for me and maybe other developers, too.
pls refer to http://lists.gnu.org/archive/html/qemu-devel/2012-02/msg01270.html.
In the scenario, the user should not see core dump, and he perhaps
think that one bug exists in qemu code.
So we hope to use _Exit() instead of abort() here.

>

Zhiyong Wu Feb. 13, 2012, 2:42 a.m. UTC | #6

On Sat, Feb 11, 2012 at 2:35 AM, Eric Blake <eblake@redhat.com> wrote:
> On 02/10/2012 07:41 AM, Daniel P. Berrange wrote:
>
>>> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
>>>  {
>>>      if (ptr == NULL) {
>>>          fprintf(stderr, "Failed to allocate memory: %s\n", strerror(errno));
>>> -        abort();
>>> +        exit(EXIT_FAILURE);
>>
>> exit() will call any atexit()/on_exit() handlers, as well as trying
>> to flush I/O streams. Any of these actions may require further
>> memory allocations, which will likely fail, or worse cause this
>> code to re-enter itself if an atexit() handler calls qemu_malloc
>>
>> The only option other than abort(), is to use  _Exit() which
>> doesn't try to run cleanup handlers.
>
> Correct, but in that case, then you need to fflush(stderr) prior to
> _Exit(), or else use write() rather than fprintf(), since otherwise your
> attempt at a nice oom error message is lost.
Great, pls see next version.
>
> --
> Eric Blake   eblake@redhat.com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>

Stefan Weil Feb. 13, 2012, 6:29 a.m. UTC | #7

Am 13.02.2012 03:37, schrieb Zhi Yong Wu:
> On Fri, Feb 10, 2012 at 11:53 PM, Stefan Weil <sw@weilnetz.de> wrote:
>> Am 10.02.2012 16:13, schrieb Zhi Yong Wu:
>>
>>> On Fri, Feb 10, 2012 at 10:41 PM, Daniel P. Berrange
>>> <berrange@redhat.com> wrote:
>>>>
>>>> On Fri, Feb 10, 2012 at 10:34:13PM +0800, Zhi Yong Wu wrote:
>>>>>
>>>>> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>>>
>>>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>>> ---
>>>>>  oslib-posix.c |    4 ++--
>>>>>  oslib-win32.c |    4 ++--
>>>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/oslib-posix.c b/oslib-posix.c
>>>>> index b6a3c7f..f978d56 100644
>>>>> --- a/oslib-posix.c
>>>>> +++ b/oslib-posix.c
>>>>> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
>>>>>  {
>>>>>     if (ptr == NULL) {
>>>>>         fprintf(stderr, "Failed to allocate memory: %s\n",
>>>>> strerror(errno));
>>>>> -        abort();
>>>>> +        exit(EXIT_FAILURE);
>>>>
>>>>
>>>> exit() will call any atexit()/on_exit() handlers, as well as trying
>>>> to flush I/O streams. Any of these actions may require further
>>>> memory allocations, which will likely fail, or worse cause this
>>>> code to re-enter itself if an atexit() handler calls qemu_malloc
>>>
>>> Nice, very reasonable.
>>>>
>>>>
>>>> The only option other than abort(), is to use  _Exit() which
>>>> doesn't try to run cleanup handlers.
>>>
>>> I will try to send out v2
>>
>>
>> Could you please explain why calling exit, _Exit or _exit is more
>> reasonable than calling abort?
>>
>> abort can create core dumps or start a debugger which is
>> useful for me and maybe other developers, too.
> pls refer to 
> http://lists.gnu.org/archive/html/qemu-devel/2012-02/msg01270.html.
> In the scenario, the user should not see core dump, and he perhaps
> think that one bug exists in qemu code.
> So we hope to use _Exit() instead of abort() here.

So you say that you don't want a core dump just because the
user called QEMU with -m 4000 or some other large value.

Allocating RAM for the emulated machine is perhaps the only
scenario where a core dump is indeed not reasonable. In most
other cases, out-of-memory is an indication of a QEMU internal
problem, so a core dump should be written.

I therefore suggest to restrict any modification to the handling
of -m. In that case you could even improve the error message by
telling the user how much memory would be possible.
Simply call the allocating function with decreasing values until
it no longer fails.

Regards,
Stefan Weil

Daniel P. Berrangé Feb. 13, 2012, 9:17 a.m. UTC | #8

On Fri, Feb 10, 2012 at 11:35:11AM -0700, Eric Blake wrote:
> On 02/10/2012 07:41 AM, Daniel P. Berrange wrote:
> 
> >> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
> >>  {
> >>      if (ptr == NULL) {
> >>          fprintf(stderr, "Failed to allocate memory: %s\n", strerror(errno));
> >> -        abort();
> >> +        exit(EXIT_FAILURE);
> > 
> > exit() will call any atexit()/on_exit() handlers, as well as trying
> > to flush I/O streams. Any of these actions may require further
> > memory allocations, which will likely fail, or worse cause this
> > code to re-enter itself if an atexit() handler calls qemu_malloc
> > 
> > The only option other than abort(), is to use  _Exit() which
> > doesn't try to run cleanup handlers.
> 
> Correct, but in that case, then you need to fflush(stderr) prior to
> _Exit(), or else use write() rather than fprintf(), since otherwise your
> attempt at a nice oom error message is lost.

IIRC, stderr is not buffered, so should not need to be flushed.

Daniel

Stefan Hajnoczi Feb. 13, 2012, 11:16 a.m. UTC | #9

On Mon, Feb 13, 2012 at 6:29 AM, Stefan Weil <sw@weilnetz.de> wrote:
> Am 13.02.2012 03:37, schrieb Zhi Yong Wu:
>
>> On Fri, Feb 10, 2012 at 11:53 PM, Stefan Weil <sw@weilnetz.de> wrote:
>>>
>>> Am 10.02.2012 16:13, schrieb Zhi Yong Wu:
>>>
>>>> On Fri, Feb 10, 2012 at 10:41 PM, Daniel P. Berrange
>>>> <berrange@redhat.com> wrote:
>>>>>
>>>>>
>>>>> On Fri, Feb 10, 2012 at 10:34:13PM +0800, Zhi Yong Wu wrote:
>>>>>>
>>>>>>
>>>>>> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>>>>
>>>>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>  oslib-posix.c |    4 ++--
>>>>>>  oslib-win32.c |    4 ++--
>>>>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/oslib-posix.c b/oslib-posix.c
>>>>>> index b6a3c7f..f978d56 100644
>>>>>> --- a/oslib-posix.c
>>>>>> +++ b/oslib-posix.c
>>>>>> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
>>>>>>  {
>>>>>>    if (ptr == NULL) {
>>>>>>        fprintf(stderr, "Failed to allocate memory: %s\n",
>>>>>> strerror(errno));
>>>>>> -        abort();
>>>>>> +        exit(EXIT_FAILURE);
>>>>>
>>>>>
>>>>>
>>>>> exit() will call any atexit()/on_exit() handlers, as well as trying
>>>>> to flush I/O streams. Any of these actions may require further
>>>>> memory allocations, which will likely fail, or worse cause this
>>>>> code to re-enter itself if an atexit() handler calls qemu_malloc
>>>>
>>>>
>>>> Nice, very reasonable.
>>>>>
>>>>>
>>>>>
>>>>> The only option other than abort(), is to use  _Exit() which
>>>>> doesn't try to run cleanup handlers.
>>>>
>>>>
>>>> I will try to send out v2
>>>
>>>
>>>
>>> Could you please explain why calling exit, _Exit or _exit is more
>>> reasonable than calling abort?
>>>
>>> abort can create core dumps or start a debugger which is
>>> useful for me and maybe other developers, too.
>>
>> pls refer to
>> http://lists.gnu.org/archive/html/qemu-devel/2012-02/msg01270.html.
>> In the scenario, the user should not see core dump, and he perhaps
>> think that one bug exists in qemu code.
>> So we hope to use _Exit() instead of abort() here.
>
>
> So you say that you don't want a core dump just because the
> user called QEMU with -m 4000 or some other large value.
>
> Allocating RAM for the emulated machine is perhaps the only
> scenario where a core dump is indeed not reasonable. In most
> other cases, out-of-memory is an indication of a QEMU internal
> problem, so a core dump should be written.

Allocating guest memory could fail and we should give a reasonable
error and exit with a failure.  I think this might be the one case
where we *do* want to handle memory allocation NULL return.  In other
words, perhaps we should call memory allocating functions directly
here instead of using the typical QEMU abort-on-failure wrappers.

Stefan

Markus Armbruster Feb. 13, 2012, 2:04 p.m. UTC | #10

Stefan Weil <sw@weilnetz.de> writes:

> Am 10.02.2012 16:13, schrieb Zhi Yong Wu:
>> On Fri, Feb 10, 2012 at 10:41 PM, Daniel P. Berrange
>> <berrange@redhat.com> wrote:
>>> On Fri, Feb 10, 2012 at 10:34:13PM +0800, Zhi Yong Wu wrote:
>>>> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>>
>>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>> ---
>>>>  oslib-posix.c |    4 ++--
>>>>  oslib-win32.c |    4 ++--
>>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/oslib-posix.c b/oslib-posix.c
>>>> index b6a3c7f..f978d56 100644
>>>> --- a/oslib-posix.c
>>>> +++ b/oslib-posix.c
>>>> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
>>>>  {
>>>>      if (ptr == NULL) {
>>>>          fprintf(stderr, "Failed to allocate memory: %s\n",
>>>> strerror(errno));
>>>> -        abort();
>>>> +        exit(EXIT_FAILURE);
>>>
>>> exit() will call any atexit()/on_exit() handlers, as well as trying
>>> to flush I/O streams. Any of these actions may require further
>>> memory allocations, which will likely fail, or worse cause this
>>> code to re-enter itself if an atexit() handler calls qemu_malloc
>> Nice, very reasonable.
>>>
>>> The only option other than abort(), is to use  _Exit() which
>>> doesn't try to run cleanup handlers.
>> I will try to send out v2
>
> Could you please explain why calling exit, _Exit or _exit is more
> reasonable than calling abort?
>
> abort can create core dumps or start a debugger which is
> useful for me and maybe other developers, too.

I consider abort() on OOM somewhat eccentric.  abort() is for
programming errors.  Resource shortage is an environmental error that is
sometimes (but not always) caused by a programming error.

I'd rather inconvenience programmers (by making it a little bit harder
to debug programming errors that cause OOM) than confuse users with
inappropriate scary "crashes".

Peter Maydell Feb. 13, 2012, 2:30 p.m. UTC | #11

On 13 February 2012 14:04, Markus Armbruster <armbru@redhat.com> wrote:
> I consider abort() on OOM somewhat eccentric.  abort() is for
> programming errors.  Resource shortage is an environmental error that is
> sometimes (but not always) caused by a programming error.
>
> I'd rather inconvenience programmers (by making it a little bit harder
> to debug programming errors that cause OOM) than confuse users with
> inappropriate scary "crashes".

I think the rationale for aborting here is that you're already
accepting "program just dies" behaviour for out-of-memory errors
via the kernel's OOM-killer...

-- PMM

Paul Brook Feb. 14, 2012, 12:42 p.m. UTC | #12

> > abort can create core dumps or start a debugger which is
> > useful for me and maybe other developers, too.
> 
> I consider abort() on OOM somewhat eccentric.  abort() is for
> programming errors.  Resource shortage is an environmental error that is
> sometimes (but not always) caused by a programming error.
> 
> I'd rather inconvenience programmers (by making it a little bit harder
> to debug programming errors that cause OOM) than confuse users with
> inappropriate scary "crashes".

While I agree that abort() is not the most friendly failure method, I don't 
tthink it's worth trying to handle OOM gracefully.  Once we hit OOM I'd say 
we're pretty much beyond hope.  The best thing we can do is exist as quickly 
as possible.  For the vast majority of systems there isn't any reason to 
believe things will somehow get better if we try again later.

Initial guest RAM allocation is maybe a special case worth a polite error.  
OTOH if you're near the limit then there's a fair chance the -m allocation 
will succeed, but some later allocation will not.

The only way to handle this rebustly is to pre-allocate all the memory we're 
ever going to need[1].  I don't see that happening.

Paul

[1] And make sure the kernel isn't lying about how much ram we can have.

Anthony Liguori Feb. 14, 2012, 12:45 p.m. UTC | #13

On 02/13/2012 12:29 AM, Stefan Weil wrote:
> Am 13.02.2012 03:37, schrieb Zhi Yong Wu:
>> On Fri, Feb 10, 2012 at 11:53 PM, Stefan Weil <sw@weilnetz.de> wrote:
>>> Am 10.02.2012 16:13, schrieb Zhi Yong Wu:
>>>
>>>> On Fri, Feb 10, 2012 at 10:41 PM, Daniel P. Berrange
>>>> <berrange@redhat.com> wrote:
>>>>>
>>>>> On Fri, Feb 10, 2012 at 10:34:13PM +0800, Zhi Yong Wu wrote:
>>>>>>
>>>>>> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>>>>
>>>>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>>>> ---
>>>>>> oslib-posix.c | 4 ++--
>>>>>> oslib-win32.c | 4 ++--
>>>>>> 2 files changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/oslib-posix.c b/oslib-posix.c
>>>>>> index b6a3c7f..f978d56 100644
>>>>>> --- a/oslib-posix.c
>>>>>> +++ b/oslib-posix.c
>>>>>> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
>>>>>> {
>>>>>> if (ptr == NULL) {
>>>>>> fprintf(stderr, "Failed to allocate memory: %s\n",
>>>>>> strerror(errno));
>>>>>> - abort();
>>>>>> + exit(EXIT_FAILURE);
>>>>>
>>>>>
>>>>> exit() will call any atexit()/on_exit() handlers, as well as trying
>>>>> to flush I/O streams. Any of these actions may require further
>>>>> memory allocations, which will likely fail, or worse cause this
>>>>> code to re-enter itself if an atexit() handler calls qemu_malloc
>>>>
>>>> Nice, very reasonable.
>>>>>
>>>>>
>>>>> The only option other than abort(), is to use _Exit() which
>>>>> doesn't try to run cleanup handlers.
>>>>
>>>> I will try to send out v2
>>>
>>>
>>> Could you please explain why calling exit, _Exit or _exit is more
>>> reasonable than calling abort?
>>>
>>> abort can create core dumps or start a debugger which is
>>> useful for me and maybe other developers, too.
>> pls refer to http://lists.gnu.org/archive/html/qemu-devel/2012-02/msg01270.html.
>> In the scenario, the user should not see core dump, and he perhaps
>> think that one bug exists in qemu code.
>> So we hope to use _Exit() instead of abort() here.
>
> So you say that you don't want a core dump just because the
> user called QEMU with -m 4000 or some other large value.

Then use g_try_malloc() when allocating ram and give a nice error message.

Normal malloc failures should call abort().

Regards,

Anthony Liguori

>
> Allocating RAM for the emulated machine is perhaps the only
> scenario where a core dump is indeed not reasonable. In most
> other cases, out-of-memory is an indication of a QEMU internal
> problem, so a core dump should be written.
>
> I therefore suggest to restrict any modification to the handling
> of -m. In that case you could even improve the error message by
> telling the user how much memory would be possible.
> Simply call the allocating function with decreasing values until
> it no longer fails.
>
> Regards,
> Stefan Weil
>
>

Anthony Liguori Feb. 14, 2012, 12:46 p.m. UTC | #14

On 02/13/2012 05:16 AM, Stefan Hajnoczi wrote:
> On Mon, Feb 13, 2012 at 6:29 AM, Stefan Weil<sw@weilnetz.de>  wrote:
>> Allocating RAM for the emulated machine is perhaps the only
>> scenario where a core dump is indeed not reasonable. In most
>> other cases, out-of-memory is an indication of a QEMU internal
>> problem, so a core dump should be written.
>
> Allocating guest memory could fail and we should give a reasonable
> error and exit with a failure.  I think this might be the one case
> where we *do* want to handle memory allocation NULL return.  In other
> words, perhaps we should call memory allocating functions directly
> here instead of using the typical QEMU abort-on-failure wrappers.

g_try_malloc

glib already has a suite of functions for this.

Regards,

Anthony Liguori

>
> Stefan
>

Daniel P. Berrangé Feb. 14, 2012, 12:46 p.m. UTC | #15

On Tue, Feb 14, 2012 at 12:42:58PM +0000, Paul Brook wrote:
> > > abort can create core dumps or start a debugger which is
> > > useful for me and maybe other developers, too.
> > 
> > I consider abort() on OOM somewhat eccentric.  abort() is for
> > programming errors.  Resource shortage is an environmental error that is
> > sometimes (but not always) caused by a programming error.
> > 
> > I'd rather inconvenience programmers (by making it a little bit harder
> > to debug programming errors that cause OOM) than confuse users with
> > inappropriate scary "crashes".
> 
> While I agree that abort() is not the most friendly failure method, I don't 
> tthink it's worth trying to handle OOM gracefully.  Once we hit OOM I'd say 
> we're pretty much beyond hope.  The best thing we can do is exist as quickly 
> as possible.  For the vast majority of systems there isn't any reason to 
> believe things will somehow get better if we try again later.
> 
> Initial guest RAM allocation is maybe a special case worth a polite error.  
> OTOH if you're near the limit then there's a fair chance the -m allocation 
> will succeed, but some later allocation will not.
> 
> The only way to handle this rebustly is to pre-allocate all the memory we're 
> ever going to need[1].  I don't see that happening.

FWIW, users can already opt-in to pre-allocation if running KVM enabled QEMU

   -mem-path /dev/shm  -mem-prealloc   (or /dev/hugepages more usefully)

Regards,
Daniel

Anthony Liguori Feb. 14, 2012, 12:47 p.m. UTC | #16

On 02/13/2012 08:04 AM, Markus Armbruster wrote:
> Stefan Weil<sw@weilnetz.de>  writes:
>
>> Am 10.02.2012 16:13, schrieb Zhi Yong Wu:
>>> On Fri, Feb 10, 2012 at 10:41 PM, Daniel P. Berrange
>>> <berrange@redhat.com>  wrote:
>>>> On Fri, Feb 10, 2012 at 10:34:13PM +0800, Zhi Yong Wu wrote:
>>>>> From: Zhi Yong Wu<wuzhy@linux.vnet.ibm.com>
>>>>>
>>>>> Signed-off-by: Zhi Yong Wu<wuzhy@linux.vnet.ibm.com>
>>>>> ---
>>>>>   oslib-posix.c |    4 ++--
>>>>>   oslib-win32.c |    4 ++--
>>>>>   2 files changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/oslib-posix.c b/oslib-posix.c
>>>>> index b6a3c7f..f978d56 100644
>>>>> --- a/oslib-posix.c
>>>>> +++ b/oslib-posix.c
>>>>> @@ -80,7 +80,7 @@ void *qemu_oom_check(void *ptr)
>>>>>   {
>>>>>       if (ptr == NULL) {
>>>>>           fprintf(stderr, "Failed to allocate memory: %s\n",
>>>>> strerror(errno));
>>>>> -        abort();
>>>>> +        exit(EXIT_FAILURE);
>>>>
>>>> exit() will call any atexit()/on_exit() handlers, as well as trying
>>>> to flush I/O streams. Any of these actions may require further
>>>> memory allocations, which will likely fail, or worse cause this
>>>> code to re-enter itself if an atexit() handler calls qemu_malloc
>>> Nice, very reasonable.
>>>>
>>>> The only option other than abort(), is to use  _Exit() which
>>>> doesn't try to run cleanup handlers.
>>> I will try to send out v2
>>
>> Could you please explain why calling exit, _Exit or _exit is more
>> reasonable than calling abort?
>>
>> abort can create core dumps or start a debugger which is
>> useful for me and maybe other developers, too.
>
> I consider abort() on OOM somewhat eccentric.  abort() is for
> programming errors.  Resource shortage is an environmental error that is
> sometimes (but not always) caused by a programming error.
>
> I'd rather inconvenience programmers (by making it a little bit harder
> to debug programming errors that cause OOM) than confuse users with
> inappropriate scary "crashes".

OOM is a going to 99% of the time be a bug in QEMU.

For the rare exceptions (like a bad -m argument), we should handle those as 
special cases.

Regards,

Anthony Liguori

>

Paul Brook Feb. 14, 2012, 1:07 p.m. UTC | #17

> > The only way to handle this rebustly is to pre-allocate all the memory
> > we're ever going to need[1].  I don't see that happening.
> 
> FWIW, users can already opt-in to pre-allocation if running KVM enabled
> QEMU
> 
>    -mem-path /dev/shm  -mem-prealloc   (or /dev/hugepages more usefully)

No, that's something different.  -mem-prealloc causes MAP_POPULATE to be 
passed when allocating guest ram, working around the fact that most modern 
implementations of mmap lie.  It has no effect on how all the other memory 
qemu uses is allocated.

Paul

oslib: make error handling more reasonable

Commit Message

Comments

Patch