diff mbox series

chardev/char-pty: Avoid losing bytes when the other side just (re-)connected

Message ID 20230816210743.1319018-1-thuth@redhat.com
State New
Headers show
Series chardev/char-pty: Avoid losing bytes when the other side just (re-)connected | expand

Commit Message

Thomas Huth Aug. 16, 2023, 9:07 p.m. UTC
When starting a guest via libvirt with "virsh start --console ...",
the first second of the console output is missing. This is especially
annoying on s390x that only has a text console by default and no graphical
output - if the bios fails to boot here, the information about what went
wrong is completely lost.

One part of the problem (there is also some things to be done on the
libvirt side) is that QEMU only checks with a 1 second timer whether
the other side of the pty is already connected, so the first second of
the console output is always lost.

This likely used to work better in the past, since the code once checked
for a re-connection during write, but this has been removed in commit
f8278c7d74 ("char-pty: remove the check for connection on write") to avoid
some locking.

To ease the situation here at least a little bit, let's check with g_poll()
whether we could send out the data anyway, even if the connection has not
been marked as "connected" yet. The file descriptor is marked as non-blocking
anyway since commit fac6688a18 ("Do not hang on full PTY"), so this should
not cause any trouble if the other side is not ready for receiving yet.

With this patch applied, I can now successfully see the bios output of
a s390x guest when running it with "virsh start --console" (with a patched
version of virsh that fixes the remaining issues there, too).

Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 chardev/char-pty.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

Comments

Daniel P. Berrangé Aug. 17, 2023, 10:32 a.m. UTC | #1
On Wed, Aug 16, 2023 at 11:07:43PM +0200, Thomas Huth wrote:
> When starting a guest via libvirt with "virsh start --console ...",
> the first second of the console output is missing. This is especially
> annoying on s390x that only has a text console by default and no graphical
> output - if the bios fails to boot here, the information about what went
> wrong is completely lost.
> 
> One part of the problem (there is also some things to be done on the
> libvirt side) is that QEMU only checks with a 1 second timer whether
> the other side of the pty is already connected, so the first second of
> the console output is always lost.
> 
> This likely used to work better in the past, since the code once checked
> for a re-connection during write, but this has been removed in commit
> f8278c7d74 ("char-pty: remove the check for connection on write") to avoid
> some locking.
> 
> To ease the situation here at least a little bit, let's check with g_poll()
> whether we could send out the data anyway, even if the connection has not
> been marked as "connected" yet. The file descriptor is marked as non-blocking
> anyway since commit fac6688a18 ("Do not hang on full PTY"), so this should
> not cause any trouble if the other side is not ready for receiving yet.
> 
> With this patch applied, I can now successfully see the bios output of
> a s390x guest when running it with "virsh start --console" (with a patched
> version of virsh that fixes the remaining issues there, too).
> 
> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  chardev/char-pty.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/chardev/char-pty.c b/chardev/char-pty.c
> index 4e5deac18a..fad12dfef3 100644
> --- a/chardev/char-pty.c
> +++ b/chardev/char-pty.c
> @@ -106,11 +106,27 @@ static void pty_chr_update_read_handler(Chardev *chr)
>  static int char_pty_chr_write(Chardev *chr, const uint8_t *buf, int len)
>  {
>      PtyChardev *s = PTY_CHARDEV(chr);
> +    GPollFD pfd;
> +    int rc;
>  
> -    if (!s->connected) {
> -        return len;
> +    if (s->connected) {
> +        return io_channel_send(s->ioc, buf, len);
>      }
> -    return io_channel_send(s->ioc, buf, len);
> +
> +    /*
> +     * The other side might already be re-connected, but the timer might
> +     * not have fired yet. So let's check here whether we can write again:
> +     */
> +    pfd.fd = QIO_CHANNEL_FILE(s->ioc)->fd;
> +    pfd.events = G_IO_OUT;
> +    pfd.revents = 0;
> +    rc = RETRY_ON_EINTR(g_poll(&pfd, 1, 0));
> +    g_assert(rc >= 0);
> +    if (!(pfd.revents & G_IO_HUP) && (pfd.revents & G_IO_OUT)) {

Should (can?) we call

   pty_chr_state(chr, 1);

here ?


> +        io_channel_send(s->ioc, buf, len);

As it feels a little dirty to be sending data before setting the
'connected == 1' and thus issuing the 'CHR_EVENT_OPENED' event 

> +    }
> +
> +    return len;
>  }
>  
>  static GSource *pty_chr_add_watch(Chardev *chr, GIOCondition cond)

With regards,
Daniel
Thomas Huth Aug. 17, 2023, noon UTC | #2
On 17/08/2023 12.32, Daniel P. Berrangé wrote:
> On Wed, Aug 16, 2023 at 11:07:43PM +0200, Thomas Huth wrote:
>> When starting a guest via libvirt with "virsh start --console ...",
>> the first second of the console output is missing. This is especially
>> annoying on s390x that only has a text console by default and no graphical
>> output - if the bios fails to boot here, the information about what went
>> wrong is completely lost.
>>
>> One part of the problem (there is also some things to be done on the
>> libvirt side) is that QEMU only checks with a 1 second timer whether
>> the other side of the pty is already connected, so the first second of
>> the console output is always lost.
>>
>> This likely used to work better in the past, since the code once checked
>> for a re-connection during write, but this has been removed in commit
>> f8278c7d74 ("char-pty: remove the check for connection on write") to avoid
>> some locking.
>>
>> To ease the situation here at least a little bit, let's check with g_poll()
>> whether we could send out the data anyway, even if the connection has not
>> been marked as "connected" yet. The file descriptor is marked as non-blocking
>> anyway since commit fac6688a18 ("Do not hang on full PTY"), so this should
>> not cause any trouble if the other side is not ready for receiving yet.
>>
>> With this patch applied, I can now successfully see the bios output of
>> a s390x guest when running it with "virsh start --console" (with a patched
>> version of virsh that fixes the remaining issues there, too).
>>
>> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> ---
>>   chardev/char-pty.c | 22 +++++++++++++++++++---
>>   1 file changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/chardev/char-pty.c b/chardev/char-pty.c
>> index 4e5deac18a..fad12dfef3 100644
>> --- a/chardev/char-pty.c
>> +++ b/chardev/char-pty.c
>> @@ -106,11 +106,27 @@ static void pty_chr_update_read_handler(Chardev *chr)
>>   static int char_pty_chr_write(Chardev *chr, const uint8_t *buf, int len)
>>   {
>>       PtyChardev *s = PTY_CHARDEV(chr);
>> +    GPollFD pfd;
>> +    int rc;
>>   
>> -    if (!s->connected) {
>> -        return len;
>> +    if (s->connected) {
>> +        return io_channel_send(s->ioc, buf, len);
>>       }
>> -    return io_channel_send(s->ioc, buf, len);
>> +
>> +    /*
>> +     * The other side might already be re-connected, but the timer might
>> +     * not have fired yet. So let's check here whether we can write again:
>> +     */
>> +    pfd.fd = QIO_CHANNEL_FILE(s->ioc)->fd;
>> +    pfd.events = G_IO_OUT;
>> +    pfd.revents = 0;
>> +    rc = RETRY_ON_EINTR(g_poll(&pfd, 1, 0));
>> +    g_assert(rc >= 0);
>> +    if (!(pfd.revents & G_IO_HUP) && (pfd.revents & G_IO_OUT)) {
> 
> Should (can?) we call
> 
>     pty_chr_state(chr, 1);
> 
> here ?

As far as I understood commit f8278c7d74c6 and f7ea2038bea04628, this is not 
possible anymore since the lock has been removed.

>> +        io_channel_send(s->ioc, buf, len);
> 
> As it feels a little dirty to be sending data before setting the
> 'connected == 1' and thus issuing the 'CHR_EVENT_OPENED' event

I didn't find a really better solution so far. We could maybe introduce a 
buffer in the char-pty code and store the last second of guest output, but 
IMHO that's way more complex and thus somewhat ugly, too?

  Thomas
Daniel P. Berrangé Aug. 17, 2023, 1:06 p.m. UTC | #3
On Thu, Aug 17, 2023 at 02:00:26PM +0200, Thomas Huth wrote:
> On 17/08/2023 12.32, Daniel P. Berrangé wrote:
> > On Wed, Aug 16, 2023 at 11:07:43PM +0200, Thomas Huth wrote:
> > > When starting a guest via libvirt with "virsh start --console ...",
> > > the first second of the console output is missing. This is especially
> > > annoying on s390x that only has a text console by default and no graphical
> > > output - if the bios fails to boot here, the information about what went
> > > wrong is completely lost.
> > > 
> > > One part of the problem (there is also some things to be done on the
> > > libvirt side) is that QEMU only checks with a 1 second timer whether
> > > the other side of the pty is already connected, so the first second of
> > > the console output is always lost.
> > > 
> > > This likely used to work better in the past, since the code once checked
> > > for a re-connection during write, but this has been removed in commit
> > > f8278c7d74 ("char-pty: remove the check for connection on write") to avoid
> > > some locking.
> > > 
> > > To ease the situation here at least a little bit, let's check with g_poll()
> > > whether we could send out the data anyway, even if the connection has not
> > > been marked as "connected" yet. The file descriptor is marked as non-blocking
> > > anyway since commit fac6688a18 ("Do not hang on full PTY"), so this should
> > > not cause any trouble if the other side is not ready for receiving yet.
> > > 
> > > With this patch applied, I can now successfully see the bios output of
> > > a s390x guest when running it with "virsh start --console" (with a patched
> > > version of virsh that fixes the remaining issues there, too).
> > > 
> > > Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
> > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > ---
> > >   chardev/char-pty.c | 22 +++++++++++++++++++---
> > >   1 file changed, 19 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/chardev/char-pty.c b/chardev/char-pty.c
> > > index 4e5deac18a..fad12dfef3 100644
> > > --- a/chardev/char-pty.c
> > > +++ b/chardev/char-pty.c
> > > @@ -106,11 +106,27 @@ static void pty_chr_update_read_handler(Chardev *chr)
> > >   static int char_pty_chr_write(Chardev *chr, const uint8_t *buf, int len)
> > >   {
> > >       PtyChardev *s = PTY_CHARDEV(chr);
> > > +    GPollFD pfd;
> > > +    int rc;
> > > -    if (!s->connected) {
> > > -        return len;
> > > +    if (s->connected) {
> > > +        return io_channel_send(s->ioc, buf, len);
> > >       }
> > > -    return io_channel_send(s->ioc, buf, len);
> > > +
> > > +    /*
> > > +     * The other side might already be re-connected, but the timer might
> > > +     * not have fired yet. So let's check here whether we can write again:
> > > +     */
> > > +    pfd.fd = QIO_CHANNEL_FILE(s->ioc)->fd;
> > > +    pfd.events = G_IO_OUT;
> > > +    pfd.revents = 0;
> > > +    rc = RETRY_ON_EINTR(g_poll(&pfd, 1, 0));
> > > +    g_assert(rc >= 0);
> > > +    if (!(pfd.revents & G_IO_HUP) && (pfd.revents & G_IO_OUT)) {
> > 
> > Should (can?) we call
> > 
> >     pty_chr_state(chr, 1);
> > 
> > here ?
> 
> As far as I understood commit f8278c7d74c6 and f7ea2038bea04628, this is not
> possible anymore since the lock has been removed.
> 
> > > +        io_channel_send(s->ioc, buf, len);
> > 
> > As it feels a little dirty to be sending data before setting the
> > 'connected == 1' and thus issuing the 'CHR_EVENT_OPENED' event
> 
> I didn't find a really better solution so far. We could maybe introduce a
> buffer in the char-pty code and store the last second of guest output, but
> IMHO that's way more complex and thus somewhat ugly, too?

The orignal commit f8278c7d74c6 said

[quote]
    char-pty: remove the check for connection on write
    
    This doesn't help much compared to the 1 second poll PTY
    timer. I can't think of a use case where this would help.
[/quote]

We've now identified a use case where it is actually important.

IOW, there's a justification to revert both f7ea2038bea04628 and
f8278c7d74c6, re-adding the locking and write update logic.

With regards,
Daniel
Marc-André Lureau Aug. 17, 2023, 1:47 p.m. UTC | #4
Hi

On Thu, Aug 17, 2023 at 5:06 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Aug 17, 2023 at 02:00:26PM +0200, Thomas Huth wrote:
> > On 17/08/2023 12.32, Daniel P. Berrangé wrote:
> > > On Wed, Aug 16, 2023 at 11:07:43PM +0200, Thomas Huth wrote:
> > > > When starting a guest via libvirt with "virsh start --console ...",
> > > > the first second of the console output is missing. This is especially
> > > > annoying on s390x that only has a text console by default and no graphical
> > > > output - if the bios fails to boot here, the information about what went
> > > > wrong is completely lost.
> > > >
> > > > One part of the problem (there is also some things to be done on the
> > > > libvirt side) is that QEMU only checks with a 1 second timer whether
> > > > the other side of the pty is already connected, so the first second of
> > > > the console output is always lost.
> > > >
> > > > This likely used to work better in the past, since the code once checked
> > > > for a re-connection during write, but this has been removed in commit
> > > > f8278c7d74 ("char-pty: remove the check for connection on write") to avoid
> > > > some locking.
> > > >
> > > > To ease the situation here at least a little bit, let's check with g_poll()
> > > > whether we could send out the data anyway, even if the connection has not
> > > > been marked as "connected" yet. The file descriptor is marked as non-blocking
> > > > anyway since commit fac6688a18 ("Do not hang on full PTY"), so this should
> > > > not cause any trouble if the other side is not ready for receiving yet.
> > > >
> > > > With this patch applied, I can now successfully see the bios output of
> > > > a s390x guest when running it with "virsh start --console" (with a patched
> > > > version of virsh that fixes the remaining issues there, too).
> > > >
> > > > Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
> > > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > > ---
> > > >   chardev/char-pty.c | 22 +++++++++++++++++++---
> > > >   1 file changed, 19 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/chardev/char-pty.c b/chardev/char-pty.c
> > > > index 4e5deac18a..fad12dfef3 100644
> > > > --- a/chardev/char-pty.c
> > > > +++ b/chardev/char-pty.c
> > > > @@ -106,11 +106,27 @@ static void pty_chr_update_read_handler(Chardev *chr)
> > > >   static int char_pty_chr_write(Chardev *chr, const uint8_t *buf, int len)
> > > >   {
> > > >       PtyChardev *s = PTY_CHARDEV(chr);
> > > > +    GPollFD pfd;
> > > > +    int rc;
> > > > -    if (!s->connected) {
> > > > -        return len;
> > > > +    if (s->connected) {
> > > > +        return io_channel_send(s->ioc, buf, len);
> > > >       }
> > > > -    return io_channel_send(s->ioc, buf, len);
> > > > +
> > > > +    /*
> > > > +     * The other side might already be re-connected, but the timer might
> > > > +     * not have fired yet. So let's check here whether we can write again:
> > > > +     */
> > > > +    pfd.fd = QIO_CHANNEL_FILE(s->ioc)->fd;
> > > > +    pfd.events = G_IO_OUT;
> > > > +    pfd.revents = 0;
> > > > +    rc = RETRY_ON_EINTR(g_poll(&pfd, 1, 0));
> > > > +    g_assert(rc >= 0);
> > > > +    if (!(pfd.revents & G_IO_HUP) && (pfd.revents & G_IO_OUT)) {
> > >
> > > Should (can?) we call
> > >
> > >     pty_chr_state(chr, 1);
> > >
> > > here ?
> >
> > As far as I understood commit f8278c7d74c6 and f7ea2038bea04628, this is not
> > possible anymore since the lock has been removed.
> >
> > > > +        io_channel_send(s->ioc, buf, len);
> > >
> > > As it feels a little dirty to be sending data before setting the
> > > 'connected == 1' and thus issuing the 'CHR_EVENT_OPENED' event
> >
> > I didn't find a really better solution so far. We could maybe introduce a
> > buffer in the char-pty code and store the last second of guest output, but
> > IMHO that's way more complex and thus somewhat ugly, too?
>
> The orignal commit f8278c7d74c6 said
>
> [quote]
>     char-pty: remove the check for connection on write
>
>     This doesn't help much compared to the 1 second poll PTY
>     timer. I can't think of a use case where this would help.
> [/quote]
>
> We've now identified a use case where it is actually important.
>
> IOW, there's a justification to revert both f7ea2038bea04628 and
> f8278c7d74c6, re-adding the locking and write update logic.

Indeed. But isn't it possible to watch for IO_OUT and get rid of the timer?

Other thing I question is whether the serial shouldn't return BUSY if
the chardev is disconnected..
Thomas Huth Aug. 17, 2023, 5:09 p.m. UTC | #5
On 17/08/2023 15.47, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Aug 17, 2023 at 5:06 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>>
>> On Thu, Aug 17, 2023 at 02:00:26PM +0200, Thomas Huth wrote:
>>> On 17/08/2023 12.32, Daniel P. Berrangé wrote:
>>>> On Wed, Aug 16, 2023 at 11:07:43PM +0200, Thomas Huth wrote:
>>>>> When starting a guest via libvirt with "virsh start --console ...",
>>>>> the first second of the console output is missing. This is especially
>>>>> annoying on s390x that only has a text console by default and no graphical
>>>>> output - if the bios fails to boot here, the information about what went
>>>>> wrong is completely lost.
>>>>>
>>>>> One part of the problem (there is also some things to be done on the
>>>>> libvirt side) is that QEMU only checks with a 1 second timer whether
>>>>> the other side of the pty is already connected, so the first second of
>>>>> the console output is always lost.
>>>>>
>>>>> This likely used to work better in the past, since the code once checked
>>>>> for a re-connection during write, but this has been removed in commit
>>>>> f8278c7d74 ("char-pty: remove the check for connection on write") to avoid
>>>>> some locking.
>>>>>
>>>>> To ease the situation here at least a little bit, let's check with g_poll()
>>>>> whether we could send out the data anyway, even if the connection has not
>>>>> been marked as "connected" yet. The file descriptor is marked as non-blocking
>>>>> anyway since commit fac6688a18 ("Do not hang on full PTY"), so this should
>>>>> not cause any trouble if the other side is not ready for receiving yet.
>>>>>
>>>>> With this patch applied, I can now successfully see the bios output of
>>>>> a s390x guest when running it with "virsh start --console" (with a patched
>>>>> version of virsh that fixes the remaining issues there, too).
>>>>>
>>>>> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
>>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>>> ---
>>>>>    chardev/char-pty.c | 22 +++++++++++++++++++---
>>>>>    1 file changed, 19 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/chardev/char-pty.c b/chardev/char-pty.c
>>>>> index 4e5deac18a..fad12dfef3 100644
>>>>> --- a/chardev/char-pty.c
>>>>> +++ b/chardev/char-pty.c
>>>>> @@ -106,11 +106,27 @@ static void pty_chr_update_read_handler(Chardev *chr)
>>>>>    static int char_pty_chr_write(Chardev *chr, const uint8_t *buf, int len)
>>>>>    {
>>>>>        PtyChardev *s = PTY_CHARDEV(chr);
>>>>> +    GPollFD pfd;
>>>>> +    int rc;
>>>>> -    if (!s->connected) {
>>>>> -        return len;
>>>>> +    if (s->connected) {
>>>>> +        return io_channel_send(s->ioc, buf, len);
>>>>>        }
>>>>> -    return io_channel_send(s->ioc, buf, len);
>>>>> +
>>>>> +    /*
>>>>> +     * The other side might already be re-connected, but the timer might
>>>>> +     * not have fired yet. So let's check here whether we can write again:
>>>>> +     */
>>>>> +    pfd.fd = QIO_CHANNEL_FILE(s->ioc)->fd;
>>>>> +    pfd.events = G_IO_OUT;
>>>>> +    pfd.revents = 0;
>>>>> +    rc = RETRY_ON_EINTR(g_poll(&pfd, 1, 0));
>>>>> +    g_assert(rc >= 0);
>>>>> +    if (!(pfd.revents & G_IO_HUP) && (pfd.revents & G_IO_OUT)) {
>>>>
>>>> Should (can?) we call
>>>>
>>>>      pty_chr_state(chr, 1);
>>>>
>>>> here ?
>>>
>>> As far as I understood commit f8278c7d74c6 and f7ea2038bea04628, this is not
>>> possible anymore since the lock has been removed.
>>>
>>>>> +        io_channel_send(s->ioc, buf, len);
>>>>
>>>> As it feels a little dirty to be sending data before setting the
>>>> 'connected == 1' and thus issuing the 'CHR_EVENT_OPENED' event
>>>
>>> I didn't find a really better solution so far. We could maybe introduce a
>>> buffer in the char-pty code and store the last second of guest output, but
>>> IMHO that's way more complex and thus somewhat ugly, too?
>>
>> The orignal commit f8278c7d74c6 said
>>
>> [quote]
>>      char-pty: remove the check for connection on write
>>
>>      This doesn't help much compared to the 1 second poll PTY
>>      timer. I can't think of a use case where this would help.
>> [/quote]
>>
>> We've now identified a use case where it is actually important.
>>
>> IOW, there's a justification to revert both f7ea2038bea04628 and
>> f8278c7d74c6, re-adding the locking and write update logic.
> 
> Indeed. But isn't it possible to watch for IO_OUT and get rid of the timer?

It might be possible - Marc Hartmayer just sent me a draft patch today that 
uses qio_channel_add_watch() and gets rid of the timer ... I'll do some 
experiments with that and send it out if it works reliably.

  Thomas
Amaury Pouly Aug. 23, 2023, 9:44 a.m. UTC | #6
>> When starting a guest via libvirt with "virsh start --console ...",
>> the first second of the console output is missing. This is especially
>> annoying on s390x that only has a text console by default and no graphical
>> output - if the bios fails to boot here, the information about what went
>> wrong is completely lost.

Hi, we recently ran into this problem in a different scenario where we
manually invoke QEMU and connect to the PTY before starting the machine.
With the existing code, we have to add a one second delay before starting just
to make sure that we capture the early boot message. We are now running a 
patched version of QEMU with pretty much the same change as yours and it seems
to work well. We are very interested in seeing this change, or a variant
thereof merged.
Thomas Huth Aug. 28, 2023, 12:23 p.m. UTC | #7
On 17/08/2023 19.09, Thomas Huth wrote:
> On 17/08/2023 15.47, Marc-André Lureau wrote:
>> Hi
>>
>> On Thu, Aug 17, 2023 at 5:06 PM Daniel P. Berrangé <berrange@redhat.com> 
>> wrote:
>>>
>>> On Thu, Aug 17, 2023 at 02:00:26PM +0200, Thomas Huth wrote:
>>>> On 17/08/2023 12.32, Daniel P. Berrangé wrote:
>>>>> On Wed, Aug 16, 2023 at 11:07:43PM +0200, Thomas Huth wrote:
>>>>>> When starting a guest via libvirt with "virsh start --console ...",
>>>>>> the first second of the console output is missing. This is especially
>>>>>> annoying on s390x that only has a text console by default and no 
>>>>>> graphical
>>>>>> output - if the bios fails to boot here, the information about what went
>>>>>> wrong is completely lost.
>>>>>>
>>>>>> One part of the problem (there is also some things to be done on the
>>>>>> libvirt side) is that QEMU only checks with a 1 second timer whether
>>>>>> the other side of the pty is already connected, so the first second of
>>>>>> the console output is always lost.
>>>>>>
>>>>>> This likely used to work better in the past, since the code once checked
>>>>>> for a re-connection during write, but this has been removed in commit
>>>>>> f8278c7d74 ("char-pty: remove the check for connection on write") to 
>>>>>> avoid
>>>>>> some locking.
>>>>>>
>>>>>> To ease the situation here at least a little bit, let's check with 
>>>>>> g_poll()
>>>>>> whether we could send out the data anyway, even if the connection has not
>>>>>> been marked as "connected" yet. The file descriptor is marked as 
>>>>>> non-blocking
>>>>>> anyway since commit fac6688a18 ("Do not hang on full PTY"), so this 
>>>>>> should
>>>>>> not cause any trouble if the other side is not ready for receiving yet.
>>>>>>
>>>>>> With this patch applied, I can now successfully see the bios output of
>>>>>> a s390x guest when running it with "virsh start --console" (with a 
>>>>>> patched
>>>>>> version of virsh that fixes the remaining issues there, too).
>>>>>>
>>>>>> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
>>>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>>>> ---
>>>>>>    chardev/char-pty.c | 22 +++++++++++++++++++---
>>>>>>    1 file changed, 19 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/chardev/char-pty.c b/chardev/char-pty.c
>>>>>> index 4e5deac18a..fad12dfef3 100644
>>>>>> --- a/chardev/char-pty.c
>>>>>> +++ b/chardev/char-pty.c
>>>>>> @@ -106,11 +106,27 @@ static void pty_chr_update_read_handler(Chardev 
>>>>>> *chr)
>>>>>>    static int char_pty_chr_write(Chardev *chr, const uint8_t *buf, int 
>>>>>> len)
>>>>>>    {
>>>>>>        PtyChardev *s = PTY_CHARDEV(chr);
>>>>>> +    GPollFD pfd;
>>>>>> +    int rc;
>>>>>> -    if (!s->connected) {
>>>>>> -        return len;
>>>>>> +    if (s->connected) {
>>>>>> +        return io_channel_send(s->ioc, buf, len);
>>>>>>        }
>>>>>> -    return io_channel_send(s->ioc, buf, len);
>>>>>> +
>>>>>> +    /*
>>>>>> +     * The other side might already be re-connected, but the timer might
>>>>>> +     * not have fired yet. So let's check here whether we can write 
>>>>>> again:
>>>>>> +     */
>>>>>> +    pfd.fd = QIO_CHANNEL_FILE(s->ioc)->fd;
>>>>>> +    pfd.events = G_IO_OUT;
>>>>>> +    pfd.revents = 0;
>>>>>> +    rc = RETRY_ON_EINTR(g_poll(&pfd, 1, 0));
>>>>>> +    g_assert(rc >= 0);
>>>>>> +    if (!(pfd.revents & G_IO_HUP) && (pfd.revents & G_IO_OUT)) {
>>>>>
>>>>> Should (can?) we call
>>>>>
>>>>>      pty_chr_state(chr, 1);
>>>>>
>>>>> here ?
>>>>
>>>> As far as I understood commit f8278c7d74c6 and f7ea2038bea04628, this is 
>>>> not
>>>> possible anymore since the lock has been removed.
>>>>
>>>>>> +        io_channel_send(s->ioc, buf, len);
>>>>>
>>>>> As it feels a little dirty to be sending data before setting the
>>>>> 'connected == 1' and thus issuing the 'CHR_EVENT_OPENED' event
>>>>
>>>> I didn't find a really better solution so far. We could maybe introduce a
>>>> buffer in the char-pty code and store the last second of guest output, but
>>>> IMHO that's way more complex and thus somewhat ugly, too?
>>>
>>> The orignal commit f8278c7d74c6 said
>>>
>>> [quote]
>>>      char-pty: remove the check for connection on write
>>>
>>>      This doesn't help much compared to the 1 second poll PTY
>>>      timer. I can't think of a use case where this would help.
>>> [/quote]
>>>
>>> We've now identified a use case where it is actually important.
>>>
>>> IOW, there's a justification to revert both f7ea2038bea04628 and
>>> f8278c7d74c6, re-adding the locking and write update logic.
>>
>> Indeed. But isn't it possible to watch for IO_OUT and get rid of the timer?
> 
> It might be possible - Marc Hartmayer just sent me a draft patch today that 
> uses qio_channel_add_watch() and gets rid of the timer ... I'll do some 
> experiments with that and send it out if it works reliably.

I did quite a bunch of experiments with the code, but as far as I can see, 
it's not so straight forward: IO_OUT keeps being enabled once IO_HUP occurs 
since the kernel can still buffer some output before it de-asserts IO_OUT. 
So there is no easy way to check for a re-connection without some kind of 
regular polling, as far as I can see...

Well, maybe something like this might be possible: Add a watch for IO_HUP. 
Once HUP occurs, don't signal CHR_EVENT_CLOSED yet, but set a flag variable 
and in case the guest still writes something to the chardev and the flag is 
true, we write the bytes to the pty one by one (to avoid locking in case the 
buffer gets full) and check for IO_OUT via g_poll() after each write. Once 
we see IO_OUT being gone, we can signal CHR_EVENT_CLOSED and add a watch for 
IO_OUT being enabled again. Once that watch triggers, we know that the other 
side re-connected and we can signal CHR_EVENT_OPENED and continue normally. 
... sounds all quite complicated, is it worth the effort? What do you think?

  Thomas
Thomas Huth Sept. 28, 2023, 5:18 a.m. UTC | #8
On 28/08/2023 14.23, Thomas Huth wrote:
> On 17/08/2023 19.09, Thomas Huth wrote:
>> On 17/08/2023 15.47, Marc-André Lureau wrote:
>>> Hi
>>>
>>> On Thu, Aug 17, 2023 at 5:06 PM Daniel P. Berrangé <berrange@redhat.com> 
>>> wrote:
>>>>
>>>> On Thu, Aug 17, 2023 at 02:00:26PM +0200, Thomas Huth wrote:
>>>>> On 17/08/2023 12.32, Daniel P. Berrangé wrote:
>>>>>> On Wed, Aug 16, 2023 at 11:07:43PM +0200, Thomas Huth wrote:
>>>>>>> When starting a guest via libvirt with "virsh start --console ...",
>>>>>>> the first second of the console output is missing. This is especially
>>>>>>> annoying on s390x that only has a text console by default and no 
>>>>>>> graphical
>>>>>>> output - if the bios fails to boot here, the information about what went
>>>>>>> wrong is completely lost.
>>>>>>>
>>>>>>> One part of the problem (there is also some things to be done on the
>>>>>>> libvirt side) is that QEMU only checks with a 1 second timer whether
>>>>>>> the other side of the pty is already connected, so the first second of
>>>>>>> the console output is always lost.
>>>>>>>
>>>>>>> This likely used to work better in the past, since the code once checked
>>>>>>> for a re-connection during write, but this has been removed in commit
>>>>>>> f8278c7d74 ("char-pty: remove the check for connection on write") to 
>>>>>>> avoid
>>>>>>> some locking.
>>>>>>>
>>>>>>> To ease the situation here at least a little bit, let's check with 
>>>>>>> g_poll()
>>>>>>> whether we could send out the data anyway, even if the connection has 
>>>>>>> not
>>>>>>> been marked as "connected" yet. The file descriptor is marked as 
>>>>>>> non-blocking
>>>>>>> anyway since commit fac6688a18 ("Do not hang on full PTY"), so this 
>>>>>>> should
>>>>>>> not cause any trouble if the other side is not ready for receiving yet.
>>>>>>>
>>>>>>> With this patch applied, I can now successfully see the bios output of
>>>>>>> a s390x guest when running it with "virsh start --console" (with a 
>>>>>>> patched
>>>>>>> version of virsh that fixes the remaining issues there, too).
>>>>>>>
>>>>>>> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
>>>>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>>>>> ---
>>>>>>>    chardev/char-pty.c | 22 +++++++++++++++++++---
>>>>>>>    1 file changed, 19 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/chardev/char-pty.c b/chardev/char-pty.c
>>>>>>> index 4e5deac18a..fad12dfef3 100644
>>>>>>> --- a/chardev/char-pty.c
>>>>>>> +++ b/chardev/char-pty.c
>>>>>>> @@ -106,11 +106,27 @@ static void pty_chr_update_read_handler(Chardev 
>>>>>>> *chr)
>>>>>>>    static int char_pty_chr_write(Chardev *chr, const uint8_t *buf, 
>>>>>>> int len)
>>>>>>>    {
>>>>>>>        PtyChardev *s = PTY_CHARDEV(chr);
>>>>>>> +    GPollFD pfd;
>>>>>>> +    int rc;
>>>>>>> -    if (!s->connected) {
>>>>>>> -        return len;
>>>>>>> +    if (s->connected) {
>>>>>>> +        return io_channel_send(s->ioc, buf, len);
>>>>>>>        }
>>>>>>> -    return io_channel_send(s->ioc, buf, len);
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * The other side might already be re-connected, but the timer 
>>>>>>> might
>>>>>>> +     * not have fired yet. So let's check here whether we can write 
>>>>>>> again:
>>>>>>> +     */
>>>>>>> +    pfd.fd = QIO_CHANNEL_FILE(s->ioc)->fd;
>>>>>>> +    pfd.events = G_IO_OUT;
>>>>>>> +    pfd.revents = 0;
>>>>>>> +    rc = RETRY_ON_EINTR(g_poll(&pfd, 1, 0));
>>>>>>> +    g_assert(rc >= 0);
>>>>>>> +    if (!(pfd.revents & G_IO_HUP) && (pfd.revents & G_IO_OUT)) {
>>>>>>
>>>>>> Should (can?) we call
>>>>>>
>>>>>>      pty_chr_state(chr, 1);
>>>>>>
>>>>>> here ?
>>>>>
>>>>> As far as I understood commit f8278c7d74c6 and f7ea2038bea04628, this 
>>>>> is not
>>>>> possible anymore since the lock has been removed.
>>>>>
>>>>>>> +        io_channel_send(s->ioc, buf, len);
>>>>>>
>>>>>> As it feels a little dirty to be sending data before setting the
>>>>>> 'connected == 1' and thus issuing the 'CHR_EVENT_OPENED' event
>>>>>
>>>>> I didn't find a really better solution so far. We could maybe introduce a
>>>>> buffer in the char-pty code and store the last second of guest output, but
>>>>> IMHO that's way more complex and thus somewhat ugly, too?
>>>>
>>>> The orignal commit f8278c7d74c6 said
>>>>
>>>> [quote]
>>>>      char-pty: remove the check for connection on write
>>>>
>>>>      This doesn't help much compared to the 1 second poll PTY
>>>>      timer. I can't think of a use case where this would help.
>>>> [/quote]
>>>>
>>>> We've now identified a use case where it is actually important.
>>>>
>>>> IOW, there's a justification to revert both f7ea2038bea04628 and
>>>> f8278c7d74c6, re-adding the locking and write update logic.
>>>
>>> Indeed. But isn't it possible to watch for IO_OUT and get rid of the timer?
>>
>> It might be possible - Marc Hartmayer just sent me a draft patch today 
>> that uses qio_channel_add_watch() and gets rid of the timer ... I'll do 
>> some experiments with that and send it out if it works reliably.
> 
> I did quite a bunch of experiments with the code, but as far as I can see, 
> it's not so straight forward: IO_OUT keeps being enabled once IO_HUP occurs 
> since the kernel can still buffer some output before it de-asserts IO_OUT. 
> So there is no easy way to check for a re-connection without some kind of 
> regular polling, as far as I can see...
> 
> Well, maybe something like this might be possible: Add a watch for IO_HUP. 
> Once HUP occurs, don't signal CHR_EVENT_CLOSED yet, but set a flag variable 
> and in case the guest still writes something to the chardev and the flag is 
> true, we write the bytes to the pty one by one (to avoid locking in case the 
> buffer gets full) and check for IO_OUT via g_poll() after each write. Once 
> we see IO_OUT being gone, we can signal CHR_EVENT_CLOSED and add a watch for 
> IO_OUT being enabled again. Once that watch triggers, we know that the other 
> side re-connected and we can signal CHR_EVENT_OPENED and continue normally. 
> ... sounds all quite complicated, is it worth the effort? What do you think?

Ping!

Marc-André, Daniel, any thoughts on this? Would such a rewrite be worth the 
effort? Or could we just go ahead with my simple patch here? Or would you 
prefer reverting the locking patches?

  Thomas
Daniel P. Berrangé Sept. 28, 2023, 11:10 a.m. UTC | #9
On Wed, Aug 16, 2023 at 11:07:43PM +0200, Thomas Huth wrote:
> When starting a guest via libvirt with "virsh start --console ...",
> the first second of the console output is missing. This is especially
> annoying on s390x that only has a text console by default and no graphical
> output - if the bios fails to boot here, the information about what went
> wrong is completely lost.
> 
> One part of the problem (there is also some things to be done on the
> libvirt side) is that QEMU only checks with a 1 second timer whether
> the other side of the pty is already connected, so the first second of
> the console output is always lost.
> 
> This likely used to work better in the past, since the code once checked
> for a re-connection during write, but this has been removed in commit
> f8278c7d74 ("char-pty: remove the check for connection on write") to avoid
> some locking.
> 
> To ease the situation here at least a little bit, let's check with g_poll()
> whether we could send out the data anyway, even if the connection has not
> been marked as "connected" yet. The file descriptor is marked as non-blocking
> anyway since commit fac6688a18 ("Do not hang on full PTY"), so this should
> not cause any trouble if the other side is not ready for receiving yet.
> 
> With this patch applied, I can now successfully see the bios output of
> a s390x guest when running it with "virsh start --console" (with a patched
> version of virsh that fixes the remaining issues there, too).
> 
> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  chardev/char-pty.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
diff mbox series

Patch

diff --git a/chardev/char-pty.c b/chardev/char-pty.c
index 4e5deac18a..fad12dfef3 100644
--- a/chardev/char-pty.c
+++ b/chardev/char-pty.c
@@ -106,11 +106,27 @@  static void pty_chr_update_read_handler(Chardev *chr)
 static int char_pty_chr_write(Chardev *chr, const uint8_t *buf, int len)
 {
     PtyChardev *s = PTY_CHARDEV(chr);
+    GPollFD pfd;
+    int rc;
 
-    if (!s->connected) {
-        return len;
+    if (s->connected) {
+        return io_channel_send(s->ioc, buf, len);
     }
-    return io_channel_send(s->ioc, buf, len);
+
+    /*
+     * The other side might already be re-connected, but the timer might
+     * not have fired yet. So let's check here whether we can write again:
+     */
+    pfd.fd = QIO_CHANNEL_FILE(s->ioc)->fd;
+    pfd.events = G_IO_OUT;
+    pfd.revents = 0;
+    rc = RETRY_ON_EINTR(g_poll(&pfd, 1, 0));
+    g_assert(rc >= 0);
+    if (!(pfd.revents & G_IO_HUP) && (pfd.revents & G_IO_OUT)) {
+        io_channel_send(s->ioc, buf, len);
+    }
+
+    return len;
 }
 
 static GSource *pty_chr_add_watch(Chardev *chr, GIOCondition cond)