Message ID | 20240328060009.650859-1-kconsul@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [v4] slof/fs/packages/disk-label.fs: improve checking for DOS boot partitions | expand |
First, sorry I am late into the discussion. Comments below. On Thu, 28 Mar 2024, at 17:00, Kautuk Consul wrote: > While testing with a qcow2 with a DOS boot partition it was found that > when we set the logical_block_size in the guest XML to >512 then the > boot would fail in the following interminable loop: Why would anyone tweak this? And when you do, what happens inside the SLOF, does it keep using 512? > <SNIP> > Trying to load: from: /pci@800000020000000/scsi@3 ... > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > </SNIP> > > Change the "read-sector" Forth subroutine to throw an exception whenever > it fails to read a full block-size length of sector from the disk. Why not throwing an exception from the "beyond end" message point? Or fail to open a device if SLOF does not like the block size? I forgot the internals :( > Also change the "open" method to initiate CATCH exception handling for the calls to > try-partitions/try-files which will also call read-sector which could potentially > now throw this new exception. > > After making the above changes, it fails properly with the correct error > message as follows: > <SNIP> > Trying to load: from: /pci@800000020000000/scsi@3 ... > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > virtioblk_transfer: Access beyond end of device! > > E3404: Not a bootable device! > > E3407: Load failed > > Type 'boot' and press return to continue booting the system. > Type 'reset-all' and press return to reboot the system. > > Ready! > 0 > > </SNIP> > > Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com> > --- > slof/fs/packages/disk-label.fs | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs > index 661c6b0..a6fb231 100644 > --- a/slof/fs/packages/disk-label.fs > +++ b/slof/fs/packages/disk-label.fs > @@ -136,7 +136,8 @@ CONSTANT /gpt-part-entry > : read-sector ( sector-number -- ) > \ block-size is 0x200 on disks, 0x800 on cdrom drives > block-size * 0 seek drop \ seek to sector > - block block-size read drop \ read sector > + block block-size read \ read sector > + block-size < IF throw THEN \ if we read less than the block-size then throw an exception When it fails, is the number of bytes ever non zero? Thanks, > ; > > : (.part-entry) ( part-entry ) > @@ -723,10 +724,15 @@ CREATE GPT-LINUX-PARTITION 10 allot > THEN > > partition IF > - try-partitions > + ['] try-partitions > ELSE > - try-files > + ['] try-files > THEN > + > + \ Catch any exception that might happen due to read-sector failing to read > + \ block-size number of bytes from any sector of the disk. > + CATCH IF false THEN > + > dup 0= IF debug-disk-label? IF ." not found." cr THEN close THEN \ free memory again > ; > > -- > 2.31.1 > >
On 2024-04-04 11:35:43, Alexey Kardashevskiy wrote: > First, sorry I am late into the discussion. Comments below. > > > On Thu, 28 Mar 2024, at 17:00, Kautuk Consul wrote: > > While testing with a qcow2 with a DOS boot partition it was found that > > when we set the logical_block_size in the guest XML to >512 then the > > boot would fail in the following interminable loop: > > Why would anyone tweak this? And when you do, what happens inside the SLOF, does it keep using 512? Well, we had an image with DOS boot partition and we tested it with logical_block_size = 1024 and got this infinite loop. In SLOF the block-size becomes what we configure in the logical_block_size parameter. This same issue doesn't arise with GPT. > > > <SNIP> > > Trying to load: from: /pci@800000020000000/scsi@3 ... > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > </SNIP> > > > > Change the "read-sector" Forth subroutine to throw an exception whenever > > it fails to read a full block-size length of sector from the disk. > > Why not throwing an exception from the "beyond end" message point? > Or fail to open a device if SLOF does not like the block size? I forgot the internals :( This loop is interminable and this "Access beyond end of device!" message continues forever. SLOF doesn't have any option other than to use the block-size that was set in the logical_block_size parameter. It doesn't have any preference as the code is very generic for both DOS as well as GPT. > > > Also change the "open" method to initiate CATCH exception handling for the calls to > > try-partitions/try-files which will also call read-sector which could potentially > > now throw this new exception. > > > > After making the above changes, it fails properly with the correct error > > message as follows: > > <SNIP> > > Trying to load: from: /pci@800000020000000/scsi@3 ... > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > virtioblk_transfer: Access beyond end of device! > > > > E3404: Not a bootable device! > > > > E3407: Load failed > > > > Type 'boot' and press return to continue booting the system. > > Type 'reset-all' and press return to reboot the system. > > > > Ready! > > 0 > > > </SNIP> > > > > Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com> > > --- > > slof/fs/packages/disk-label.fs | 12 +++++++++--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs > > index 661c6b0..a6fb231 100644 > > --- a/slof/fs/packages/disk-label.fs > > +++ b/slof/fs/packages/disk-label.fs > > @@ -136,7 +136,8 @@ CONSTANT /gpt-part-entry > > : read-sector ( sector-number -- ) > > \ block-size is 0x200 on disks, 0x800 on cdrom drives > > block-size * 0 seek drop \ seek to sector > > - block block-size read drop \ read sector > > + block block-size read \ read sector > > + block-size < IF throw THEN \ if we read less than the block-size then throw an exception > > When it fails, is the number of bytes ever non zero? Thanks, No, it doesn't reach 0. It is lesser than the block-size. For example if we set the logcial_block_size to 1024, the block-size is that much. if we are reading the last sector which is physically only 512 bytes long then we read that 512 bytes which is lesser than 1024, which should be regarded as an error. > > > ; > > > > : (.part-entry) ( part-entry ) > > @@ -723,10 +724,15 @@ CREATE GPT-LINUX-PARTITION 10 allot > > THEN > > > > partition IF > > - try-partitions > > + ['] try-partitions > > ELSE > > - try-files > > + ['] try-files > > THEN > > + > > + \ Catch any exception that might happen due to read-sector failing to read > > + \ block-size number of bytes from any sector of the disk. > > + CATCH IF false THEN Segher/Alexey, can we keep this CATCH block or should I remove it ? > > + > > dup 0= IF debug-disk-label? IF ." not found." cr THEN close THEN \ free memory again > > ; > > > > -- > > 2.31.1 > > > >
On Thu, 4 Apr 2024, at 18:18, Kautuk Consul wrote: > On 2024-04-04 11:35:43, Alexey Kardashevskiy wrote: > > First, sorry I am late into the discussion. Comments below. > > > > > > On Thu, 28 Mar 2024, at 17:00, Kautuk Consul wrote: > > > While testing with a qcow2 with a DOS boot partition it was found that > > > when we set the logical_block_size in the guest XML to >512 then the > > > boot would fail in the following interminable loop: > > > > Why would anyone tweak this? And when you do, what happens inside the SLOF, does it keep using 512? > Well, we had an image with DOS boot partition and we tested it with > logical_block_size = 1024 and got this infinite loop. This does not really answer to "why" ;) > In SLOF the block-size becomes what we configure in the > logical_block_size parameter. This same issue doesn't arise with GPT. How is GPT different in this regard? > > > <SNIP> > > > Trying to load: from: /pci@800000020000000/scsi@3 ... > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > </SNIP> > > > > > > Change the "read-sector" Forth subroutine to throw an exception whenever > > > it fails to read a full block-size length of sector from the disk. > > > > Why not throwing an exception from the "beyond end" message point? > > Or fail to open a device if SLOF does not like the block size? I forgot the internals :( > This loop is interminable and this "Access beyond end of device!" > message continues forever. Where is that loop exactly? Put CATCH in there. > SLOF doesn't have any option other than to use the block-size that was > set in the logical_block_size parameter. It doesn't have any preference > as the code is very generic for both DOS as well as GPT. > > > > > Also change the "open" method to initiate CATCH exception handling for the calls to > > > try-partitions/try-files which will also call read-sector which could potentially > > > now throw this new exception. > > > > > > After making the above changes, it fails properly with the correct error > > > message as follows: > > > <SNIP> > > > Trying to load: from: /pci@800000020000000/scsi@3 ... > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > virtioblk_transfer: Access beyond end of device! > > > > > > E3404: Not a bootable device! > > > > > > E3407: Load failed > > > > > > Type 'boot' and press return to continue booting the system. > > > Type 'reset-all' and press return to reboot the system. > > > > > > Ready! > > > 0 > > > > </SNIP> > > > > > > Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com> > > > --- > > > slof/fs/packages/disk-label.fs | 12 +++++++++--- > > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > > > diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs > > > index 661c6b0..a6fb231 100644 > > > --- a/slof/fs/packages/disk-label.fs > > > +++ b/slof/fs/packages/disk-label.fs > > > @@ -136,7 +136,8 @@ CONSTANT /gpt-part-entry > > > : read-sector ( sector-number -- ) > > > \ block-size is 0x200 on disks, 0x800 on cdrom drives > > > block-size * 0 seek drop \ seek to sector > > > - block block-size read drop \ read sector > > > + block block-size read \ read sector > > > + block-size < IF throw THEN \ if we read less than the block-size then throw an exception > > > > When it fails, is the number of bytes ever non zero? Thanks, > No, it doesn't reach 0. It is lesser than the block-size. For example if > we set the logcial_block_size to 1024, the block-size is that much. if > we are reading the last sector which is physically only 512 bytes long > then we read that 512 bytes which is lesser than 1024, which should be > regarded as an error. Ah so it only happens when there is an odd number of 512 sectors so reading the last one with block-size==1024 only reads a half => failure, is that right? > > > > > ; > > > > > > : (.part-entry) ( part-entry ) > > > @@ -723,10 +724,15 @@ CREATE GPT-LINUX-PARTITION 10 allot > > > THEN > > > > > > partition IF > > > - try-partitions > > > + ['] try-partitions > > > ELSE > > > - try-files > > > + ['] try-files > > > THEN > > > + > > > + \ Catch any exception that might happen due to read-sector failing to read > > > + \ block-size number of bytes from any sector of the disk. > > > + CATCH IF false THEN > Segher/Alexey, can we keep this CATCH block or should I remove it ? imho the original bug should be handled more gracefully. Seeing exceptions in the code just triggers exceptions in my small brain :) Thanks, > > > + > > > dup 0= IF debug-disk-label? IF ." not found." cr THEN close THEN \ free memory again > > > ; > > > > > > -- > > > 2.31.1 > > > > > > >
Hi, On 2024-04-05 14:46:14, Alexey Kardashevskiy wrote: > > > On Thu, 4 Apr 2024, at 18:18, Kautuk Consul wrote: > > On 2024-04-04 11:35:43, Alexey Kardashevskiy wrote: > > > First, sorry I am late into the discussion. Comments below. > > > > > > > > > On Thu, 28 Mar 2024, at 17:00, Kautuk Consul wrote: > > > > While testing with a qcow2 with a DOS boot partition it was found that > > > > when we set the logical_block_size in the guest XML to >512 then the > > > > boot would fail in the following interminable loop: > > > > > > Why would anyone tweak this? And when you do, what happens inside the SLOF, does it keep using 512? > > Well, we had an image with DOS boot partition and we tested it with > > logical_block_size = 1024 and got this infinite loop. > > This does not really answer to "why" ;) Well, my point is that it is tweakable in virsh/qemu and maybe we should be handling this error in configuration in SLOF properly ? There shouldn't be the possibility of an interminable loop in the software anywhere, right ? :-) > > > In SLOF the block-size becomes what we configure in the > > logical_block_size parameter. This same issue doesn't arise with GPT. > > How is GPT different in this regard? In GPT the sector number 1 contains the GPT headers, not sector 0 as in the case of DOS boot partition. So if the block-size is incorrect, the GPT magic number itself isn't found and the "E3404: Not a bootable device!" error is immediately thrown. > > > > > <SNIP> > > > > Trying to load: from: /pci@800000020000000/scsi@3 ... > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > </SNIP> > > > > > > > > Change the "read-sector" Forth subroutine to throw an exception whenever > > > > it fails to read a full block-size length of sector from the disk. > > > > > > Why not throwing an exception from the "beyond end" message point? > > > Or fail to open a device if SLOF does not like the block size? I forgot the internals :( > > This loop is interminable and this "Access beyond end of device!" > > message continues forever. > > Where is that loop exactly? Put CATCH in there. That loop is in count-dos-logical-partitions. The reason why I didn't put a CATCH in there is because there is already another CATCH statement in do-load in slof/fs/boot.fs which covers this throw. For the other path that doesn't have a CATCH I have inserted a CATCH in the open subroutine in disk-label.fs. > > > SLOF doesn't have any option other than to use the block-size that was > > set in the logical_block_size parameter. It doesn't have any preference > > as the code is very generic for both DOS as well as GPT. > > > > > > > Also change the "open" method to initiate CATCH exception handling for the calls to > > > > try-partitions/try-files which will also call read-sector which could potentially > > > > now throw this new exception. > > > > > > > > After making the above changes, it fails properly with the correct error > > > > message as follows: > > > > <SNIP> > > > > Trying to load: from: /pci@800000020000000/scsi@3 ... > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > virtioblk_transfer: Access beyond end of device! > > > > > > > > E3404: Not a bootable device! > > > > > > > > E3407: Load failed > > > > > > > > Type 'boot' and press return to continue booting the system. > > > > Type 'reset-all' and press return to reboot the system. > > > > > > > > Ready! > > > > 0 > > > > > </SNIP> > > > > > > > > Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com> > > > > --- > > > > slof/fs/packages/disk-label.fs | 12 +++++++++--- > > > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs > > > > index 661c6b0..a6fb231 100644 > > > > --- a/slof/fs/packages/disk-label.fs > > > > +++ b/slof/fs/packages/disk-label.fs > > > > @@ -136,7 +136,8 @@ CONSTANT /gpt-part-entry > > > > : read-sector ( sector-number -- ) > > > > \ block-size is 0x200 on disks, 0x800 on cdrom drives > > > > block-size * 0 seek drop \ seek to sector > > > > - block block-size read drop \ read sector > > > > + block block-size read \ read sector > > > > + block-size < IF throw THEN \ if we read less than the block-size then throw an exception > > > > > > When it fails, is the number of bytes ever non zero? Thanks, > > No, it doesn't reach 0. It is lesser than the block-size. For example if > > we set the logcial_block_size to 1024, the block-size is that much. if > > we are reading the last sector which is physically only 512 bytes long > > then we read that 512 bytes which is lesser than 1024, which should be > > regarded as an error. > > Ah so it only happens when there is an odd number of 512 sectors so reading the last one with block-size==1024 only reads a half => failure, is that right? Yes. Or, block-size if set to 2048 or 4096 will also show the same problem. > > > > > > > > ; > > > > > > > > : (.part-entry) ( part-entry ) > > > > @@ -723,10 +724,15 @@ CREATE GPT-LINUX-PARTITION 10 allot > > > > THEN > > > > > > > > partition IF > > > > - try-partitions > > > > + ['] try-partitions > > > > ELSE > > > > - try-files > > > > + ['] try-files > > > > THEN > > > > + > > > > + \ Catch any exception that might happen due to read-sector failing to read > > > > + \ block-size number of bytes from any sector of the disk. > > > > + CATCH IF false THEN > > Segher/Alexey, can we keep this CATCH block or should I remove it ? > > imho the original bug should be handled more gracefully. Seeing exceptions in the code just triggers exceptions in my small brain :) Thanks, :-). But this is the only other path that doesn't have a CATCH like the do-load subroutine in slof/fs/boot.fs. According to Segher there shouldn't ever be a problem with throw because if nothing else the outer-most interpreter loop's CATCH will catch the exception. But I thought to cover this throw in read-sector more locally in places near to this functionality. Because the outermost FORTH SLOF interpreter loop is not really so related to reading a sector in disk-label.fs. > > > > > + > > > > dup 0= IF debug-disk-label? IF ." not found." cr THEN close THEN \ free memory again > > > > ; > > > > > > > > -- > > > > 2.31.1 > > > > > > > > > >
Hi, On 2024-04-05 13:46:44, Kautuk Consul wrote: > Hi, > > On 2024-04-05 14:46:14, Alexey Kardashevskiy wrote: > > > > > > On Thu, 4 Apr 2024, at 18:18, Kautuk Consul wrote: > > > On 2024-04-04 11:35:43, Alexey Kardashevskiy wrote: > > > > First, sorry I am late into the discussion. Comments below. > > > > > > > > > > > > On Thu, 28 Mar 2024, at 17:00, Kautuk Consul wrote: > > > > > While testing with a qcow2 with a DOS boot partition it was found that > > > > > when we set the logical_block_size in the guest XML to >512 then the > > > > > boot would fail in the following interminable loop: > > > > > > > > Why would anyone tweak this? And when you do, what happens inside the SLOF, does it keep using 512? > > > Well, we had an image with DOS boot partition and we tested it with > > > logical_block_size = 1024 and got this infinite loop. > > > > This does not really answer to "why" ;) > > Well, my point is that it is tweakable in virsh/qemu and maybe we should be > handling this error in configuration in SLOF properly ? There shouldn't > be the possibility of an interminable loop in the software anywhere, > right ? :-) > > > > > > In SLOF the block-size becomes what we configure in the > > > logical_block_size parameter. This same issue doesn't arise with GPT. > > > > How is GPT different in this regard? > > In GPT the sector number 1 contains the GPT headers, not > sector 0 as in the case of DOS boot partition. So if the block-size is > incorrect, the GPT magic number itself isn't found and the "E3404: Not a > bootable device!" error is immediately thrown. > > > > > > > > <SNIP> > > > > > Trying to load: from: /pci@800000020000000/scsi@3 ... > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > </SNIP> > > > > > > > > > > Change the "read-sector" Forth subroutine to throw an exception whenever > > > > > it fails to read a full block-size length of sector from the disk. > > > > > > > > Why not throwing an exception from the "beyond end" message point? > > > > Or fail to open a device if SLOF does not like the block size? I forgot the internals :( > > > This loop is interminable and this "Access beyond end of device!" > > > message continues forever. > > > > Where is that loop exactly? Put CATCH in there. > > That loop is in count-dos-logical-partitions. The reason why I didn't > put a CATCH in there is because there is already another CATCH statement > in do-load in slof/fs/boot.fs which covers this throw. For the other > path that doesn't have a CATCH I have inserted a CATCH in the open > subroutine in disk-label.fs. > > > > > > SLOF doesn't have any option other than to use the block-size that was > > > set in the logical_block_size parameter. It doesn't have any preference > > > as the code is very generic for both DOS as well as GPT. > > > > > > > > > Also change the "open" method to initiate CATCH exception handling for the calls to > > > > > try-partitions/try-files which will also call read-sector which could potentially > > > > > now throw this new exception. > > > > > > > > > > After making the above changes, it fails properly with the correct error > > > > > message as follows: > > > > > <SNIP> > > > > > Trying to load: from: /pci@800000020000000/scsi@3 ... > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > virtioblk_transfer: Access beyond end of device! > > > > > > > > > > E3404: Not a bootable device! > > > > > > > > > > E3407: Load failed > > > > > > > > > > Type 'boot' and press return to continue booting the system. > > > > > Type 'reset-all' and press return to reboot the system. > > > > > > > > > > Ready! > > > > > 0 > > > > > > </SNIP> > > > > > > > > > > Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com> > > > > > --- > > > > > slof/fs/packages/disk-label.fs | 12 +++++++++--- > > > > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs > > > > > index 661c6b0..a6fb231 100644 > > > > > --- a/slof/fs/packages/disk-label.fs > > > > > +++ b/slof/fs/packages/disk-label.fs > > > > > @@ -136,7 +136,8 @@ CONSTANT /gpt-part-entry > > > > > : read-sector ( sector-number -- ) > > > > > \ block-size is 0x200 on disks, 0x800 on cdrom drives > > > > > block-size * 0 seek drop \ seek to sector > > > > > - block block-size read drop \ read sector > > > > > + block block-size read \ read sector > > > > > + block-size < IF throw THEN \ if we read less than the block-size then throw an exception > > > > > > > > When it fails, is the number of bytes ever non zero? Thanks, > > > No, it doesn't reach 0. It is lesser than the block-size. For example if > > > we set the logcial_block_size to 1024, the block-size is that much. if > > > we are reading the last sector which is physically only 512 bytes long > > > then we read that 512 bytes which is lesser than 1024, which should be > > > regarded as an error. > > > > Ah so it only happens when there is an odd number of 512 sectors so reading the last one with block-size==1024 only reads a half => failure, is that right? > > Yes. Or, block-size if set to 2048 or 4096 will also show the same problem. > > > > > > > > > > > > ; > > > > > > > > > > : (.part-entry) ( part-entry ) > > > > > @@ -723,10 +724,15 @@ CREATE GPT-LINUX-PARTITION 10 allot > > > > > THEN > > > > > > > > > > partition IF > > > > > - try-partitions > > > > > + ['] try-partitions > > > > > ELSE > > > > > - try-files > > > > > + ['] try-files > > > > > THEN > > > > > + > > > > > + \ Catch any exception that might happen due to read-sector failing to read > > > > > + \ block-size number of bytes from any sector of the disk. > > > > > + CATCH IF false THEN > > > Segher/Alexey, can we keep this CATCH block or should I remove it ? > > > > imho the original bug should be handled more gracefully. Seeing exceptions in the code just triggers exceptions in my small brain :) Thanks, > > :-). But this is the only other path that doesn't have a CATCH > like the do-load subroutine in slof/fs/boot.fs. According to Segher > there shouldn't ever be a problem with throw because if nothing else the > outer-most interpreter loop's CATCH will catch the exception. But I > thought to cover this throw in read-sector more locally in places near > to this functionality. Because the outermost FORTH SLOF interpreter loop is not > really so related to reading a sector in disk-label.fs. > Alexey/Segher, so what should be the next steps ? Do you find my explanation above okay or should I simply remove these CATCH blocks ? Putting a CATCH block in count-dos-logical-partitions is out of the question as there is already a CATCH in do-load in slof/fs/boot.fs. This CATCH block in the open subroutine is actually to prevent the exception to be caught at some non-local place in the code.
Hi, On 2024-04-16 10:03:33, Kautuk Consul wrote: > > > Alexey/Segher, so what should be the next steps ? > Do you find my explanation above okay or should I simply remove these > CATCH blocks ? Putting a CATCH block in count-dos-logical-partitions is > out of the question as there is already a CATCH in do-load in > slof/fs/boot.fs. This CATCH block in the open subroutine is actually to > prevent the exception to be caught at some non-local place in the code. Any ideas on how to proceed ? > _______________________________________________ > SLOF mailing list > SLOF@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/slof
Hi Alexey/Segher, > > :-). But this is the only other path that doesn't have a CATCH > > like the do-load subroutine in slof/fs/boot.fs. According to Segher > > there shouldn't ever be a problem with throw because if nothing else the > > outer-most interpreter loop's CATCH will catch the exception. But I > > thought to cover this throw in read-sector more locally in places near > > to this functionality. Because the outermost FORTH SLOF interpreter loop is not > > really so related to reading a sector in disk-label.fs. > > > Alexey/Segher, so what should be the next steps ? > Do you find my explanation above okay or should I simply remove these > CATCH blocks ? Putting a CATCH block in count-dos-logical-partitions is > out of the question as there is already a CATCH in do-load in > slof/fs/boot.fs. This CATCH block in the open subroutine is actually to > prevent the exception to be caught at some non-local place in the code. Any ideas on how we proceed with this now ? > _______________________________________________ > SLOF mailing list > SLOF@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/slof
Hi Alexey/Segher, > > :-). But this is the only other path that doesn't have a CATCH > > like the do-load subroutine in slof/fs/boot.fs. According to Segher > > there shouldn't ever be a problem with throw because if nothing else the > > outer-most interpreter loop's CATCH will catch the exception. But I > > thought to cover this throw in read-sector more locally in places near > > to this functionality. Because the outermost FORTH SLOF interpreter loop is not > > really so related to reading a sector in disk-label.fs. > > > Alexey/Segher, so what should be the next steps ? > Do you find my explanation above okay or should I simply remove these > CATCH blocks ? Putting a CATCH block in count-dos-logical-partitions is > out of the question as there is already a CATCH in do-load in > slof/fs/boot.fs. This CATCH block in the open subroutine is actually to > prevent the exception to be caught at some non-local place in the code. Any ideas on how we proceed with this now ?
On Tue, 14 May 2024, at 19:08, Kautuk Consul wrote: > Hi Alexey/Segher, > > > :-). But this is the only other path that doesn't have a CATCH > > > like the do-load subroutine in slof/fs/boot.fs. According to Segher > > > there shouldn't ever be a problem with throw because if nothing else the > > > outer-most interpreter loop's CATCH will catch the exception. But I > > > thought to cover this throw in read-sector more locally in places near > > > to this functionality. Because the outermost FORTH SLOF interpreter loop is not > > > really so related to reading a sector in disk-label.fs. > > > > > Alexey/Segher, so what should be the next steps ? > > Do you find my explanation above okay or should I simply remove these > > CATCH blocks ? Putting a CATCH block in count-dos-logical-partitions is > > out of the question as there is already a CATCH in do-load in > > slof/fs/boot.fs. This CATCH block in the open subroutine is actually to > > prevent the exception to be caught at some non-local place in the code. > > Any ideas on how we proceed with this now ? Ufff, I dropped the ball again :-/ Sorry but if read-sector cannot read a sector because of misconfiguration (not because some underlying hardware error) - this tells me that this should be handled when we open the block device which we knows the size of in sectors and if it is not an integer - barf there. Or it is not possible? In general, when you tweak libvirt xml like this - there are plenty of ways to break SLOF, this one is not worth of another exception throwing imho. Thanks,
Hi, On 2024-05-20 19:03:25, Alexey Kardashevskiy wrote: > > > On Tue, 14 May 2024, at 19:08, Kautuk Consul wrote: > > Hi Alexey/Segher, > > > > :-). But this is the only other path that doesn't have a CATCH > > > > like the do-load subroutine in slof/fs/boot.fs. According to Segher > > > > there shouldn't ever be a problem with throw because if nothing else the > > > > outer-most interpreter loop's CATCH will catch the exception. But I > > > > thought to cover this throw in read-sector more locally in places near > > > > to this functionality. Because the outermost FORTH SLOF interpreter loop is not > > > > really so related to reading a sector in disk-label.fs. > > > > > > > Alexey/Segher, so what should be the next steps ? > > > Do you find my explanation above okay or should I simply remove these > > > CATCH blocks ? Putting a CATCH block in count-dos-logical-partitions is > > > out of the question as there is already a CATCH in do-load in > > > slof/fs/boot.fs. This CATCH block in the open subroutine is actually to > > > prevent the exception to be caught at some non-local place in the code. > > > > Any ideas on how we proceed with this now ? > > > Ufff, I dropped the ball again :-/ > > Sorry but if read-sector cannot read a sector because of misconfiguration (not because some underlying hardware error) - this tells me that this should be handled when we open the block device which we knows the size of in sectors and if it is not an integer - barf there. Or it is not possible? In general, when you tweak libvirt xml like this - there are plenty of ways to break SLOF, this one is not worth of another exception throwing imho. Thanks, Sure, no issues. Just thought to send in this patch as we encountered this problem. Will abandon this email chain now. Thanks again! :-)
On Wed, 22 May 2024, at 18:55, Kautuk Consul wrote: > Hi, > > On 2024-05-20 19:03:25, Alexey Kardashevskiy wrote: > > > > > > On Tue, 14 May 2024, at 19:08, Kautuk Consul wrote: > > > Hi Alexey/Segher, > > > > > :-). But this is the only other path that doesn't have a CATCH > > > > > like the do-load subroutine in slof/fs/boot.fs. According to Segher > > > > > there shouldn't ever be a problem with throw because if nothing else the > > > > > outer-most interpreter loop's CATCH will catch the exception. But I > > > > > thought to cover this throw in read-sector more locally in places near > > > > > to this functionality. Because the outermost FORTH SLOF interpreter loop is not > > > > > really so related to reading a sector in disk-label.fs. > > > > > > > > > Alexey/Segher, so what should be the next steps ? > > > > Do you find my explanation above okay or should I simply remove these > > > > CATCH blocks ? Putting a CATCH block in count-dos-logical-partitions is > > > > out of the question as there is already a CATCH in do-load in > > > > slof/fs/boot.fs. This CATCH block in the open subroutine is actually to > > > > prevent the exception to be caught at some non-local place in the code. > > > > > > Any ideas on how we proceed with this now ? > > > > > > Ufff, I dropped the ball again :-/ > > > > Sorry but if read-sector cannot read a sector because of misconfiguration (not because some underlying hardware error) - this tells me that this should be handled when we open the block device which we knows the size of in sectors and if it is not an integer - barf there. Or it is not possible? In general, when you tweak libvirt xml like this - there are plenty of ways to break SLOF, this one is not worth of another exception throwing imho. Thanks, > > Sure, no issues. Just thought to send in this patch as we encountered > this problem. Will abandon this email chain now. Thanks again! :-) btw I was wondering what if you passed 513 as sector size, for example, have you tried? ;)
diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs index 661c6b0..a6fb231 100644 --- a/slof/fs/packages/disk-label.fs +++ b/slof/fs/packages/disk-label.fs @@ -136,7 +136,8 @@ CONSTANT /gpt-part-entry : read-sector ( sector-number -- ) \ block-size is 0x200 on disks, 0x800 on cdrom drives block-size * 0 seek drop \ seek to sector - block block-size read drop \ read sector + block block-size read \ read sector + block-size < IF throw THEN \ if we read less than the block-size then throw an exception ; : (.part-entry) ( part-entry ) @@ -723,10 +724,15 @@ CREATE GPT-LINUX-PARTITION 10 allot THEN partition IF - try-partitions + ['] try-partitions ELSE - try-files + ['] try-files THEN + + \ Catch any exception that might happen due to read-sector failing to read + \ block-size number of bytes from any sector of the disk. + CATCH IF false THEN + dup 0= IF debug-disk-label? IF ." not found." cr THEN close THEN \ free memory again ;
While testing with a qcow2 with a DOS boot partition it was found that when we set the logical_block_size in the guest XML to >512 then the boot would fail in the following interminable loop: <SNIP> Trying to load: from: /pci@800000020000000/scsi@3 ... virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! </SNIP> Change the "read-sector" Forth subroutine to throw an exception whenever it fails to read a full block-size length of sector from the disk. Also change the "open" method to initiate CATCH exception handling for the calls to try-partitions/try-files which will also call read-sector which could potentially now throw this new exception. After making the above changes, it fails properly with the correct error message as follows: <SNIP> Trying to load: from: /pci@800000020000000/scsi@3 ... virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! virtioblk_transfer: Access beyond end of device! E3404: Not a bootable device! E3407: Load failed Type 'boot' and press return to continue booting the system. Type 'reset-all' and press return to reboot the system. Ready! 0 > </SNIP> Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com> --- slof/fs/packages/disk-label.fs | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)