diff mbox

[tpmdd-devel] tpm: vtpm_proxy: Do not access host's event log

Message ID ef1f954d-fc52-0522-01f7-b0e31ea14c59@linux.vnet.ibm.com
State New
Headers show

Commit Message

Stefan Berger Nov. 17, 2016, 12:35 p.m. UTC
On 11/16/2016 03:07 PM, Jason Gunthorpe wrote:
> On Wed, Nov 16, 2016 at 12:07:23PM -0500, Stefan Berger wrote:
>> The culprit seems to be 'tpm: fix the missing .owner in
>> tpm_bios_measurements_ops'
> That is unlikely, it is probably the patch before which calls read_log
> unconditionally now. That suggests the crashing is a little random..

I ran the vtpm driver test suite (with -j32) a few times at that patch 
and it didn't crash. It crashes severely with later patches applied. 
Here's the current experimental patch that fixes these problems:

iff --git a/drivers/char/tpm/tpm_acpi.c b/drivers/char/tpm/tpm_acpi.c
index 0cb43ef..a73295a 100644
don't want to attempt to read the log there, either. I think the most 
straight-forward way would be to gate this whole function with a flag 
that only the vtpm proxy driver has: TPM_CHIP_FLAG_NO_FIRMWARE_LOG.

      return rc;


    Stefan

>
> tpm_read_log_acpi should check if the chip has a acpi_dev_handle
> before running, but it also shouldn't crash - so there are two bugs
> here I guess.. Please do that instead of the checking the virual flag.
>
> Jarkko, you know acpi better, we switched ppi to search starting from
> the acpi_dev_handle for its data - can we do the same for event log?
>
>> The crash looks like this:
>> [  173.608722]  [<ffffffff8140ca11>] dump_stack+0x63/0x82
>> [  173.608722]  [<ffffffff8106b99f>] iounmap.part.1+0x7f/0x90
>> [  173.608722]  [<ffffffff8106b9dc>] iounmap+0x2c/0x30
>> [  173.608722]  [<ffffffff81496c75>] acpi_os_map_cleanup.part.10+0x31/0x3e
>> [  173.608722]  [<ffffffff8179629c>] acpi_os_unmap_iomem+0xcb/0xd2
>> [  173.608722]  [<ffffffffa00e1b28>] read_log+0xc8/0x19e [tpm]
> This seems really strange ACPI should not crash like this - yes it
> will wrongly read the log for the system into the vtpm, but that
> *should* work.
>
> Are you doing anything special with this test like high parallism or
> something?  Any chance you can look at little more? The tpm code looks
> OK to me, the map and unmap are properly paired - but the bad address
> from the iounmap suggests the refcounting in acpi is not working or
> something along those lines?
>
> Jason
>


------------------------------------------------------------------------------

Comments

Jason Gunthorpe Nov. 17, 2016, 6:10 p.m. UTC | #1
1;2802;0cOn Thu, Nov 17, 2016 at 07:35:05AM -0500, Stefan Berger wrote:
> I ran the vtpm driver test suite (with -j32) a few times at that patch and
> it didn't crash. It crashes severely with later patches applied. Here's the
> current experimental patch that fixes these problems:

I can't see how setting owner has any bearing on this.. I also don't
see why it should ever fail at all... It would be great to get a root
cause here - could it be memory corruption????

Getting a really bad feeling from this  :(

> iff --git a/drivers/char/tpm/tpm_acpi.c b/drivers/char/tpm/tpm_acpi.c
> index 0cb43ef..a73295a 100644
> +++ b/drivers/char/tpm/tpm_acpi.c
> @@ -56,6 +56,9 @@ int tpm_read_log_acpi(struct tpm_chip *chip)
> 
>      log = &chip->log;
> 
> +    if (!chip->acpi_dev_handle)
> +        return 0;
> +
> 
> // So ACPI is not supported on this device, but ACPI support is compiled in.
> I am returning 0 here, assuming it's not an OF device and the corresponding
> OF function need not be called (see below).

Return -ENODEV

> +    if (!(chip->flags & TPM_CHIP_FLAG_VIRTUAL))
> +        rc = tpm_read_log_of(chip);
> 
> // I am not sure how to handle this case, in case we get here, which would
> only be on an OF device (following 'return 0;' above), but we don't want to
> attempt to read the log there, either. I think the most straight-forward way
> would be to gate this whole function with a flag that only the vtpm proxy
> driver has: TPM_CHIP_FLAG_NO_FIRMWARE_LOG.

OF is already fine, it checks chip->dev.parent->of_node so it will
exit properly for vtpm, no need for this.

Jason

------------------------------------------------------------------------------
Stefan Berger Nov. 17, 2016, 6:25 p.m. UTC | #2
On 11/17/2016 01:10 PM, Jason Gunthorpe wrote:
> 1;2802;0cOn Thu, Nov 17, 2016 at 07:35:05AM -0500, Stefan Berger wrote:
>> I ran the vtpm driver test suite (with -j32) a few times at that patch and
>> it didn't crash. It crashes severely with later patches applied. Here's the
>> current experimental patch that fixes these problems:
> I can't see how setting owner has any bearing on this.. I also don't
> see why it should ever fail at all... It would be great to get a root
> cause here - could it be memory corruption????
>
> Getting a really bad feeling from this  :(
>
>> iff --git a/drivers/char/tpm/tpm_acpi.c b/drivers/char/tpm/tpm_acpi.c
>> index 0cb43ef..a73295a 100644
>> +++ b/drivers/char/tpm/tpm_acpi.c
>> @@ -56,6 +56,9 @@ int tpm_read_log_acpi(struct tpm_chip *chip)
>>
>>       log = &chip->log;
>>
>> +    if (!chip->acpi_dev_handle)
>> +        return 0;
>> +
>>
>> // So ACPI is not supported on this device, but ACPI support is compiled in.
>> I am returning 0 here, assuming it's not an OF device and the corresponding
>> OF function need not be called (see below).
> Return -ENODEV
>
>> +    if (!(chip->flags & TPM_CHIP_FLAG_VIRTUAL))
>> +        rc = tpm_read_log_of(chip);
>>
>> // I am not sure how to handle this case, in case we get here, which would
>> only be on an OF device (following 'return 0;' above), but we don't want to
>> attempt to read the log there, either. I think the most straight-forward way
>> would be to gate this whole function with a flag that only the vtpm proxy
>> driver has: TPM_CHIP_FLAG_NO_FIRMWARE_LOG.
> OF is already fine, it checks chip->dev.parent->of_node so it will
> exit properly for vtpm, no need for this.

In the case of x86, tpm_read_log_of() is a stub return -ENODEV, which in 
turn fails the whole device:

http://git.infradead.org/users/jjs/linux-tpmdd.git/blob/4d388433e85f8257f5a9344a7acf6f499ba2b29e:/drivers/char/tpm/tpm_eventlog.h#l87


http://git.infradead.org/users/jjs/linux-tpmdd.git/blob/4d388433e85f8257f5a9344a7acf6f499ba2b29e:/drivers/char/tpm/tpm-chip.c#l348

So we must not run into that or handle -ENODEV differently or return a 
different error code in the stub function.

I think the OF log reading code will also need to check for 
chip->dev.parent being NULL.

Further, I had the impression that the error unwinding following -ENODEV 
has an issue related to sysfs.

     Stefan
>
> Jason
>


------------------------------------------------------------------------------
Jason Gunthorpe Nov. 17, 2016, 6:33 p.m. UTC | #3
On Thu, Nov 17, 2016 at 01:25:54PM -0500, Stefan Berger wrote:
> 
> In the case of x86, tpm_read_log_of() is a stub return -ENODEV, which in
> turn fails the whole device:

Somehow this got screwed up during the lengthy review. ENODEV is the
right return from the leaf routines but the tests in tpm_eventlog di
not get fixed:

> http://git.infradead.org/users/jjs/linux-tpmdd.git/blob/4d388433e85f8257f5a9344a7acf6f499ba2b29e:/drivers/char/tpm/tpm_eventlog.h#l87

Is wrong, should be:

if (rc != -ENODEV)
   return rc;

And the one in tpm_bios_log_setup should be

if (rc != 0 && rc != -ENODEV)
    return rc;
    
> I think the OF log reading code will also need to check for chip->dev.parent
> being NULL.

Currect! Lets get that fixed too. :(

> Further, I had the impression that the error unwinding following -ENODEV has
> an issue related to sysfs.

I don't follow this comment..

Jason

------------------------------------------------------------------------------
Jarkko Sakkinen Nov. 17, 2016, 8:37 p.m. UTC | #4
On Thu, Nov 17, 2016 at 07:35:05AM -0500, Stefan Berger wrote:
> On 11/16/2016 03:07 PM, Jason Gunthorpe wrote:
> > On Wed, Nov 16, 2016 at 12:07:23PM -0500, Stefan Berger wrote:
> > > The culprit seems to be 'tpm: fix the missing .owner in
> > > tpm_bios_measurements_ops'
> > That is unlikely, it is probably the patch before which calls read_log
> > unconditionally now. That suggests the crashing is a little random..
> 
> I ran the vtpm driver test suite (with -j32) a few times at that patch and
> it didn't crash. It crashes severely with later patches applied. Here's the
> current experimental patch that fixes these problems:
> 
> iff --git a/drivers/char/tpm/tpm_acpi.c b/drivers/char/tpm/tpm_acpi.c
> index 0cb43ef..a73295a 100644
> --- a/drivers/char/tpm/tpm_acpi.c
> +++ b/drivers/char/tpm/tpm_acpi.c
> @@ -56,6 +56,9 @@ int tpm_read_log_acpi(struct tpm_chip *chip)
> 
>      log = &chip->log;
> 
> +    if (!chip->acpi_dev_handle)
> +        return 0;
> +

If there is a problem in the TPM driver, this does not fix the
problem. It will mask the problem. Maybe there's an ACPI regression
in the rc tree?

This is a funky situation because those lines need to be there but
I do not want them before it is root caused that it is not a TPM
bug.

/Jarkko

------------------------------------------------------------------------------
Stefan Berger Nov. 17, 2016, 11:15 p.m. UTC | #5
On 11/17/2016 01:33 PM, Jason Gunthorpe wrote:
> On Thu, Nov 17, 2016 at 01:25:54PM -0500, Stefan Berger wrote:
>> In the case of x86, tpm_read_log_of() is a stub return -ENODEV, which in
>> turn fails the whole device:
> Somehow this got screwed up during the lengthy review. ENODEV is the
> right return from the leaf routines but the tests in tpm_eventlog di
> not get fixed:
>
>> http://git.infradead.org/users/jjs/linux-tpmdd.git/blob/4d388433e85f8257f5a9344a7acf6f499ba2b29e:/drivers/char/tpm/tpm_eventlog.h#l87
> Is wrong, should be:
>
> if (rc != -ENODEV)
>     return rc;
>
> And the one in tpm_bios_log_setup should be
>
> if (rc != 0 && rc != -ENODEV)
>      return rc;


Can you show a patch that shows where to place these two?

>      
>> I think the OF log reading code will also need to check for chip->dev.parent
>> being NULL.
> Currect! Lets get that fixed too. :(
>
>> Further, I had the impression that the error unwinding following -ENODEV has
>> an issue related to sysfs.
> I don't follow this comment..

I have encountered this error here, which gets masked when applying the 
previously shown patch.

[   58.270643] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000088
[   58.271017] IP: [<ffffffff812e9b7f>] kernfs_find_ns+0x1f/0x140
[   58.271017] PGD 0
[   58.271017] Oops: 0000 [#1] SMP
[   58.271017] Modules linked in: tpm_vtpm_proxy nf_conntrack_netbios_ns 
nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 
xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter 
ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 
ip6table_mangle ip6table_security ip6table_raw ip6table_filter 
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw 
crc32c_intel tpm_tis joydev virtio_balloon i2c_piix4 pcspkr i2c_core 
tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc 8139too 
virtio_pci 8139cp virtio_ring ata_generic mii serio_raw pata_acpi virtio 
floppy
[   58.271017] CPU: 10 PID: 1420 Comm: vtpmctrl Not tainted 4.9.0-rc5+ #607
[   58.271017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS rel-1.9.0-156-g3560877 04/01/2014
[   58.271017] task: ffff8802abb58000 task.stack: ffffc90002604000
[   58.271017] RIP: 0010:[<ffffffff812e9b7f>] [<ffffffff812e9b7f>] 
kernfs_find_ns+0x1f/0x140
[   58.271017] RSP: 0018:ffffc90002607af8  EFLAGS: 00010292
[   58.271017] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
ffff8802abb58820
[   58.271017] RDX: 0000000000000000 RSI: ffffffff81885960 RDI: 
0000000000000000
[   58.271017] RBP: ffffc90002607b28 R08: 0000000000000000 R09: 
b39d2cf200000000
[   58.271017] R10: 0000000000000000 R11: 0000000000000000 R12: 
ffffffff81885960
[   58.271017] R13: 0000000000000000 R14: ffffffff81885960 R15: 
ffff8802acaf4360
[   58.271017] FS:  0000000000000000(0000) GS:ffff8802b2480000(0000) 
knlGS:0000000000000000
[   58.271017] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   58.271017] CR2: 0000000000000088 CR3: 0000000001c0a000 CR4: 
00000000000006e0
[   58.271017] Stack:
[   58.271017]  000000001efbca09 0000000000000000 ffffffff81885960 
0000000000000000
[   58.271017]  ffff8802b0168fe0 ffff8802acaf4360 ffffc90002607b50 
ffffffff812e9cd3
[   58.271017]  ffffffff81cf3dc0 ffff8802a8654000 0000000000000000 
ffffc90002607b70
[   58.271017] Call Trace:
[   58.271017]  [<ffffffff812e9cd3>] kernfs_find_and_get_ns+0x33/0x60
[   58.271017]  [<ffffffff812ed5dd>] sysfs_unmerge_group+0x1d/0x60
[   58.271017]  [<ffffffff8155bd32>] dpm_sysfs_remove+0x22/0x60
[   58.271017]  [<ffffffff8154e438>] device_del+0x58/0x280
[   58.271017]  [<ffffffffa024c020>] tpm_chip_unregister+0x40/0xb0 [tpm]
[   58.271017]  [<ffffffffa0292360>] vtpm_proxy_fops_release+0x40/0x60 
[tpm_vtpm_proxy]
[   58.271017]  [<ffffffff812671ff>] __fput+0xdf/0x1f0
[   58.271017]  [<ffffffff8126734e>] ____fput+0xe/0x10
[   58.271017]  [<ffffffff810c8cde>] task_work_run+0x7e/0xa0
[   58.271017]  [<ffffffff810ad5d8>] do_exit+0x2f8/0xb20
[   58.271017]  [<ffffffff810b98a2>] ? get_signal+0xc2/0x6d0
[   58.271017]  [<ffffffff810ade90>] do_group_exit+0x50/0xd0
[   58.271017]  [<ffffffff810b9a6f>] get_signal+0x28f/0x6d0
[   58.271017]  [<ffffffff8102f0b7>] do_signal+0x37/0x6a0
[   58.271017]  [<ffffffffa02924bb>] ? vtpm_proxy_fops_read+0x13b/0x1b0 
[tpm_vtpm_proxy]
[   58.271017]  [<ffffffff810f1570>] ? wake_atomic_t_function+0x70/0x70
[   58.271017]  [<ffffffff8137e74b>] ? security_file_permission+0x9b/0xc0
[   58.271017]  [<ffffffff810032b6>] exit_to_usermode_loop+0x76/0xb0
[   58.271017]  [<ffffffff81003c5f>] syscall_return_slowpath+0xaf/0xc0
[   58.271017]  [<ffffffff817a1a49>] entry_SYSCALL_64_fastpath+0xac/0xae
[   58.271017] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 
55 49 89 f8 48 89 e5 41 57 41 56 41 55 41 54 49 89 f6 53 49 89 d5 48 83 
ec 08 <44> 0f b7 a7 88 00 00 00 8b 05 13 af 9e 00 48 8b 5f 68 66 41 83
[   58.271017] RIP  [<ffffffff812e9b7f>] kernfs_find_ns+0x1f/0x140
[   58.271017]  RSP <ffffc90002607af8>
[   58.271017] CR2: 0000000000000088
[   58.271017] ---[ end trace 88bc09bcfa89f874 ]---
[   58.271017] Fixing recursive fault but reboot is needed!





------------------------------------------------------------------------------
Jarkko Sakkinen Nov. 17, 2016, 11:43 p.m. UTC | #6
On Thu, Nov 17, 2016 at 06:15:20PM -0500, Stefan Berger wrote:
> On 11/17/2016 01:33 PM, Jason Gunthorpe wrote:
> > On Thu, Nov 17, 2016 at 01:25:54PM -0500, Stefan Berger wrote:
> > > In the case of x86, tpm_read_log_of() is a stub return -ENODEV, which in
> > > turn fails the whole device:
> > Somehow this got screwed up during the lengthy review. ENODEV is the
> > right return from the leaf routines but the tests in tpm_eventlog di
> > not get fixed:
> > 
> > > http://git.infradead.org/users/jjs/linux-tpmdd.git/blob/4d388433e85f8257f5a9344a7acf6f499ba2b29e:/drivers/char/tpm/tpm_eventlog.h#l87
> > Is wrong, should be:
> > 
> > if (rc != -ENODEV)
> >     return rc;
> > 
> > And the one in tpm_bios_log_setup should be
> > 
> > if (rc != 0 && rc != -ENODEV)
> >      return rc;
> 
> 
> Can you show a patch that shows where to place these two?

The disasterous error handling for cases that you mentioned:

http://git.infradead.org/users/jjs/linux-tpmdd.git/commitdiff/d660a91a1b9dd80f5c2c973e3369400d3c9f9848

I'm sorry I let these slip in the code review.

/Jarkko

------------------------------------------------------------------------------
Nayna Nov. 18, 2016, 12:23 p.m. UTC | #7
On 11/18/2016 12:03 AM, Jason Gunthorpe wrote:
> On Thu, Nov 17, 2016 at 01:25:54PM -0500, Stefan Berger wrote:
>>
>> In the case of x86, tpm_read_log_of() is a stub return -ENODEV, which in
>> turn fails the whole device:
>
> Somehow this got screwed up during the lengthy review. ENODEV is the
> right return from the leaf routines but the tests in tpm_eventlog di
> not get fixed:

Isn't the idea is that if event_log specific properties are not 
supported in ACPI/OF , then probe should continue except for 
ENOMEM/ENODEV error ?

Error conditions from read_log_acpi()/read_log_of():

        - rc =  EIO (Probe continue) -  TCPA is not supported/event log 
area empty/failed acpi_os_map_iomem/event log device tree properties not 
exist. This implies error log is not supported by the platform.
        - rc = ENOMEM (Probe fail) - event log is supported but kmalloc 
failed.
        - rc = ENODEV (Probe fail) -  represents device not supported.

I think because of assumption that if event log is not supported on 
ACPI, it might be OF platform and so do tpm_read_log_of(), the EIO error 
from tpm_read_log_acpi() is getting masked.

Please let me know if I am missing something.

>
>> http://git.infradead.org/users/jjs/linux-tpmdd.git/blob/4d388433e85f8257f5a9344a7acf6f499ba2b29e:/drivers/char/tpm/tpm_eventlog.h#l87
>
> Is wrong, should be:
>
> if (rc != -ENODEV)
>     return rc;

Did you mean this check in tpm_chip_register() ?

Thanks & Regards,
   - Nayna

>
> And the one in tpm_bios_log_setup should be
>
> if (rc != 0 && rc != -ENODEV)
>      return rc;
>
>> I think the OF log reading code will also need to check for chip->dev.parent
>> being NULL.
>
> Currect! Lets get that fixed too. :(
>
>> Further, I had the impression that the error unwinding following -ENODEV has
>> an issue related to sysfs.
>
> I don't follow this comment..
>
> Jason
>


------------------------------------------------------------------------------
Stefan Berger Nov. 18, 2016, 2:11 p.m. UTC | #8
On 11/17/2016 03:37 PM, Jarkko Sakkinen wrote:
> On Thu, Nov 17, 2016 at 07:35:05AM -0500, Stefan Berger wrote:
>> On 11/16/2016 03:07 PM, Jason Gunthorpe wrote:
>>> On Wed, Nov 16, 2016 at 12:07:23PM -0500, Stefan Berger wrote:
>>>> The culprit seems to be 'tpm: fix the missing .owner in
>>>> tpm_bios_measurements_ops'
>>> That is unlikely, it is probably the patch before which calls read_log
>>> unconditionally now. That suggests the crashing is a little random..
>> I ran the vtpm driver test suite (with -j32) a few times at that patch and
>> it didn't crash. It crashes severely with later patches applied. Here's the
>> current experimental patch that fixes these problems:
>>
>> iff --git a/drivers/char/tpm/tpm_acpi.c b/drivers/char/tpm/tpm_acpi.c
>> index 0cb43ef..a73295a 100644
>> --- a/drivers/char/tpm/tpm_acpi.c
>> +++ b/drivers/char/tpm/tpm_acpi.c
>> @@ -56,6 +56,9 @@ int tpm_read_log_acpi(struct tpm_chip *chip)
>>
>>       log = &chip->log;
>>
>> +    if (!chip->acpi_dev_handle)
>> +        return 0;
>> +
> If there is a problem in the TPM driver, this does not fix the
> problem. It will mask the problem. Maybe there's an ACPI regression
> in the rc tree?

Following the path from here :

http://lxr.free-electrons.com/source/drivers/acpi/acpica/tbxface.c#L282

acpi_get_table_with_size -> acpi_tb_validate_table -> acpi_tb_acquire_table

I see acpi_os_map_memory being called in acpi_tb_acquire_table but not 
the corresponding acpi_os_unmap_memory...

     Stefan


>
> This is a funky situation because those lines need to be there but
> I do not want them before it is root caused that it is not a TPM
> bug.
>
> /Jarkko
>


------------------------------------------------------------------------------
Stefan Berger Nov. 18, 2016, 2:15 p.m. UTC | #9
On 11/18/2016 09:11 AM, Stefan Berger wrote:
> On 11/17/2016 03:37 PM, Jarkko Sakkinen wrote:
>> On Thu, Nov 17, 2016 at 07:35:05AM -0500, Stefan Berger wrote:
>>> On 11/16/2016 03:07 PM, Jason Gunthorpe wrote:
>>>> On Wed, Nov 16, 2016 at 12:07:23PM -0500, Stefan Berger wrote:
>>>>> The culprit seems to be 'tpm: fix the missing .owner in
>>>>> tpm_bios_measurements_ops'
>>>> That is unlikely, it is probably the patch before which calls read_log
>>>> unconditionally now. That suggests the crashing is a little random..
>>> I ran the vtpm driver test suite (with -j32) a few times at that 
>>> patch and
>>> it didn't crash. It crashes severely with later patches applied. 
>>> Here's the
>>> current experimental patch that fixes these problems:
>>>
>>> iff --git a/drivers/char/tpm/tpm_acpi.c b/drivers/char/tpm/tpm_acpi.c
>>> index 0cb43ef..a73295a 100644
>>> --- a/drivers/char/tpm/tpm_acpi.c
>>> +++ b/drivers/char/tpm/tpm_acpi.c
>>> @@ -56,6 +56,9 @@ int tpm_read_log_acpi(struct tpm_chip *chip)
>>>
>>>       log = &chip->log;
>>>
>>> +    if (!chip->acpi_dev_handle)
>>> +        return 0;
>>> +
>> If there is a problem in the TPM driver, this does not fix the
>> problem. It will mask the problem. Maybe there's an ACPI regression
>> in the rc tree?
>
> Following the path from here :
>
> http://lxr.free-electrons.com/source/drivers/acpi/acpica/tbxface.c#L282
>
> acpi_get_table_with_size -> acpi_tb_validate_table -> 
> acpi_tb_acquire_table
>
> I see acpi_os_map_memory being called in acpi_tb_acquire_table but not 
> the corresponding acpi_os_unmap_memory...
>

And to add to this: the call to acpi_get_table() in tpm_acpi.c alone is 
causing the crash. I am fairly sure that it has something to do with the 
memory mapping call above not being matched by an unmapping call.


>     Stefan
>
>
>>
>> This is a funky situation because those lines need to be there but
>> I do not want them before it is root caused that it is not a TPM
>> bug.
>>
>> /Jarkko
>>
>


------------------------------------------------------------------------------
Stefan Berger Nov. 18, 2016, 4:58 p.m. UTC | #10
On 11/18/2016 09:15 AM, Stefan Berger wrote:
> On 11/18/2016 09:11 AM, Stefan Berger wrote:
>> On 11/17/2016 03:37 PM, Jarkko Sakkinen wrote:
>>> On Thu, Nov 17, 2016 at 07:35:05AM -0500, Stefan Berger wrote:
>>>> On 11/16/2016 03:07 PM, Jason Gunthorpe wrote:
>>>>> On Wed, Nov 16, 2016 at 12:07:23PM -0500, Stefan Berger wrote:
>>>>>> The culprit seems to be 'tpm: fix the missing .owner in
>>>>>> tpm_bios_measurements_ops'
>>>>> That is unlikely, it is probably the patch before which calls 
>>>>> read_log
>>>>> unconditionally now. That suggests the crashing is a little random..
>>>> I ran the vtpm driver test suite (with -j32) a few times at that 
>>>> patch and
>>>> it didn't crash. It crashes severely with later patches applied. 
>>>> Here's the
>>>> current experimental patch that fixes these problems:
>>>>
>>>> iff --git a/drivers/char/tpm/tpm_acpi.c b/drivers/char/tpm/tpm_acpi.c
>>>> index 0cb43ef..a73295a 100644
>>>> --- a/drivers/char/tpm/tpm_acpi.c
>>>> +++ b/drivers/char/tpm/tpm_acpi.c
>>>> @@ -56,6 +56,9 @@ int tpm_read_log_acpi(struct tpm_chip *chip)
>>>>
>>>>       log = &chip->log;
>>>>
>>>> +    if (!chip->acpi_dev_handle)
>>>> +        return 0;
>>>> +
>>> If there is a problem in the TPM driver, this does not fix the
>>> problem. It will mask the problem. Maybe there's an ACPI regression
>>> in the rc tree?
>>
>> Following the path from here :
>>
>> http://lxr.free-electrons.com/source/drivers/acpi/acpica/tbxface.c#L282
>>
>> acpi_get_table_with_size -> acpi_tb_validate_table -> 
>> acpi_tb_acquire_table
>>
>> I see acpi_os_map_memory being called in acpi_tb_acquire_table but 
>> not the corresponding acpi_os_unmap_memory...
>>
>
> And to add to this: the call to acpi_get_table() in tpm_acpi.c alone 
> is causing the crash. I am fairly sure that it has something to do 
> with the memory mapping call above not being matched by an unmapping 
> call.

Have to take that all back. This is not the reason. What seems to be the 
reason is just the acpi_os_map_iomem() call. Without calling the 
acpi_os_unmap_iomem() later on, which I assume should be allowed, the 
driver will crash when the device terminates.

     Stefan

>
>
>>     Stefan
>>
>>
>>>
>>> This is a funky situation because those lines need to be there but
>>> I do not want them before it is root caused that it is not a TPM
>>> bug.
>>>
>>> /Jarkko
>>>
>>
>


------------------------------------------------------------------------------
Jason Gunthorpe Nov. 21, 2016, 6:32 p.m. UTC | #11
On Fri, Nov 18, 2016 at 11:58:08AM -0500, Stefan Berger wrote:

> reason is just the acpi_os_map_iomem() call. Without calling the
> acpi_os_unmap_iomem() later on, which I assume should be allowed, the driver
> will crash when the device terminates.

The oops looks like unbalacned map/unmap, can you add some tracing
around the failing address and see if that is possibly true, and who
does it?

Jason

------------------------------------------------------------------------------
diff mbox

Patch

--- a/drivers/char/tpm/tpm_acpi.c
+++ b/drivers/char/tpm/tpm_acpi.c
@@ -56,6 +56,9 @@  int tpm_read_log_acpi(struct tpm_chip *chip)

      log = &chip->log;

+    if (!chip->acpi_dev_handle)
+        return 0;
+

// So ACPI is not supported on this device, but ACPI support is compiled 
in. I am returning 0 here, assuming it's not an OF device and the 
corresponding OF function need not be called (see below).

      /* Find TCPA entry in RSDT (ACPI_LOGICAL_ADDRESSING) */
      status = acpi_get_table(ACPI_SIG_TCPA, 1,
                  (struct acpi_table_header **)&buff);
diff --git a/drivers/char/tpm/tpm_eventlog.c 
b/drivers/char/tpm/tpm_eventlog.c
index fb603a7..12b0356 100644
--- a/drivers/char/tpm/tpm_eventlog.c
+++ b/drivers/char/tpm/tpm_eventlog.c
@@ -380,7 +380,8 @@  static int tpm_read_log(struct tpm_chip *chip)
      if ((rc == 0) || (rc == -ENOMEM))
          return rc;

-    rc = tpm_read_log_of(chip);
+    if (!(chip->flags & TPM_CHIP_FLAG_VIRTUAL))
+        rc = tpm_read_log_of(chip);

// I am not sure how to handle this case, in case we get here, which 
would only be on an OF device (following 'return 0;' above), but we