diff mbox series

[RFC,5/8] tests_pytest: Implement fetch_asset() method for downloading assets

Message ID 20240711115546.40859-6-thuth@redhat.com
State New
Headers show
Series Convert avocado tests to normal Python unittests | expand

Commit Message

Thomas Huth July 11, 2024, 11:55 a.m. UTC
In the pytests, we cannot use the fetch_asset() function from Avocado
anymore, so we have to provide our own implementation now instead.
Thus add such a function based on the _download_with_cache() function
from tests/vm/basevm.py for this purpose.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 tests/pytest/qemu_pytest/__init__.py | 40 ++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 11 deletions(-)

Comments

Richard Henderson July 11, 2024, 4:45 p.m. UTC | #1
On 7/11/24 04:55, Thomas Huth wrote:
> +    def fetch_asset(self, url, asset_hash):
> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
> +        if not os.path.exists(cache_dir):
> +            os.makedirs(cache_dir)
> +        fname = os.path.join(cache_dir,
> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
> +            return fname
> +        logging.debug("Downloading %s to %s...", url, fname)
> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
> +        os.rename(fname + ".download", fname)
> +        return fname

Download failure via exception?
Check hash on downloaded asset?


r~
Richard Henderson July 11, 2024, 6:49 p.m. UTC | #2
On 7/11/24 09:45, Richard Henderson wrote:
> On 7/11/24 04:55, Thomas Huth wrote:
>> +    def fetch_asset(self, url, asset_hash):
>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>> +        if not os.path.exists(cache_dir):
>> +            os.makedirs(cache_dir)
>> +        fname = os.path.join(cache_dir,
>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>> +            return fname
>> +        logging.debug("Downloading %s to %s...", url, fname)
>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
>> +        os.rename(fname + ".download", fname)
>> +        return fname
> 
> Download failure via exception?
> Check hash on downloaded asset?

I would prefer to see assets, particularly downloading, handled in a separate pass from tests.

(1) Asset download should not count against test timeout.
(2) Running tests while disconnected should skip unavailable assets.

Avocado kinda does this, but still generates errors instead of skips.


r~
Alex Bennée July 11, 2024, 7:23 p.m. UTC | #3
Richard Henderson <richard.henderson@linaro.org> writes:

> On 7/11/24 09:45, Richard Henderson wrote:
>> On 7/11/24 04:55, Thomas Huth wrote:
>>> +    def fetch_asset(self, url, asset_hash):
>>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>>> +        if not os.path.exists(cache_dir):
>>> +            os.makedirs(cache_dir)
>>> +        fname = os.path.join(cache_dir,
>>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>>> +            return fname
>>> +        logging.debug("Downloading %s to %s...", url, fname)
>>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
>>> +        os.rename(fname + ".download", fname)
>>> +        return fname
>> Download failure via exception?
>> Check hash on downloaded asset?
>
> I would prefer to see assets, particularly downloading, handled in a
> separate pass from tests.

And I assume cachable?

>
> (1) Asset download should not count against test timeout.
> (2) Running tests while disconnected should skip unavailable assets.
>
> Avocado kinda does this, but still generates errors instead of skips.
>
>
> r~
Richard Henderson July 11, 2024, 9:35 p.m. UTC | #4
On 7/11/24 12:23, Alex Bennée wrote:
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> On 7/11/24 09:45, Richard Henderson wrote:
>>> On 7/11/24 04:55, Thomas Huth wrote:
>>>> +    def fetch_asset(self, url, asset_hash):
>>>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>>>> +        if not os.path.exists(cache_dir):
>>>> +            os.makedirs(cache_dir)
>>>> +        fname = os.path.join(cache_dir,
>>>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>>>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>>>> +            return fname
>>>> +        logging.debug("Downloading %s to %s...", url, fname)
>>>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
>>>> +        os.rename(fname + ".download", fname)
>>>> +        return fname
>>> Download failure via exception?
>>> Check hash on downloaded asset?
>>
>> I would prefer to see assets, particularly downloading, handled in a
>> separate pass from tests.
> 
> And I assume cachable?

The cache is already handled here.  But downloading after cache miss is non-optional, may 
not fail, and is accounted against the meson test timeout.


r~
Thomas Huth July 12, 2024, 4:18 a.m. UTC | #5
On 11/07/2024 18.45, Richard Henderson wrote:
> On 7/11/24 04:55, Thomas Huth wrote:
>> +    def fetch_asset(self, url, asset_hash):
>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>> +        if not os.path.exists(cache_dir):
>> +            os.makedirs(cache_dir)
>> +        fname = os.path.join(cache_dir,
>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>> +            return fname
>> +        logging.debug("Downloading %s to %s...", url, fname)
>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + 
>> ".download"])
>> +        os.rename(fname + ".download", fname)
>> +        return fname
> 
> Download failure via exception?
> Check hash on downloaded asset?

Yes, you're right, I'll add that. But that means it's also missing from 
_download_with_cache in tests/vm/basevm.py so far...

  Thomas
Thomas Huth July 12, 2024, 4:21 a.m. UTC | #6
On 11/07/2024 20.49, Richard Henderson wrote:
> On 7/11/24 09:45, Richard Henderson wrote:
>> On 7/11/24 04:55, Thomas Huth wrote:
>>> +    def fetch_asset(self, url, asset_hash):
>>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>>> +        if not os.path.exists(cache_dir):
>>> +            os.makedirs(cache_dir)
>>> +        fname = os.path.join(cache_dir,
>>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>>> +            return fname
>>> +        logging.debug("Downloading %s to %s...", url, fname)
>>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + 
>>> ".download"])
>>> +        os.rename(fname + ".download", fname)
>>> +        return fname
>>
>> Download failure via exception?
>> Check hash on downloaded asset?
> 
> I would prefer to see assets, particularly downloading, handled in a 
> separate pass from tests.
> 
> (1) Asset download should not count against test timeout.
> (2) Running tests while disconnected should skip unavailable assets.

Sounds good, but honestly, that's above my primitive python-fu. I guess that 
would need some kind of introspection of the tests to retrieve the 
information what they want to download?

  Thomas
Thomas Huth July 12, 2024, 4:24 a.m. UTC | #7
On 11/07/2024 23.35, Richard Henderson wrote:
> On 7/11/24 12:23, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>>> On 7/11/24 09:45, Richard Henderson wrote:
>>>> On 7/11/24 04:55, Thomas Huth wrote:
>>>>> +    def fetch_asset(self, url, asset_hash):
>>>>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>>>>> +        if not os.path.exists(cache_dir):
>>>>> +            os.makedirs(cache_dir)
>>>>> +        fname = os.path.join(cache_dir,
>>>>> +                             
>>>>> hashlib.sha1(url.encode("utf-8")).hexdigest())
>>>>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>>>>> +            return fname
>>>>> +        logging.debug("Downloading %s to %s...", url, fname)
>>>>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + 
>>>>> ".download"])
>>>>> +        os.rename(fname + ".download", fname)
>>>>> +        return fname
>>>> Download failure via exception?
>>>> Check hash on downloaded asset?
>>>
>>> I would prefer to see assets, particularly downloading, handled in a
>>> separate pass from tests.
>>
>> And I assume cachable?
> 
> The cache is already handled here.  But downloading after cache miss is 
> non-optional, may not fail, and is accounted against the meson test timeout.

Unless someone has a really good idea how to implement that download before 
running the tests, I think we can start by simply giving enough headroom in 
the initial timeout settings. We can then hopefully easily fine-tune later - 
since this time the framework is under our control, so we don't have to sync 
in a cumbersome way with the avocado test runner any more.

  Thomas
Daniel P. Berrangé July 12, 2024, 9:09 a.m. UTC | #8
On Thu, Jul 11, 2024 at 01:55:43PM +0200, Thomas Huth wrote:
> In the pytests, we cannot use the fetch_asset() function from Avocado
> anymore, so we have to provide our own implementation now instead.
> Thus add such a function based on the _download_with_cache() function
> from tests/vm/basevm.py for this purpose.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  tests/pytest/qemu_pytest/__init__.py | 40 ++++++++++++++++++++--------
>  1 file changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/tests/pytest/qemu_pytest/__init__.py b/tests/pytest/qemu_pytest/__init__.py
> index e3ed32e3de..73d80b3828 100644
> --- a/tests/pytest/qemu_pytest/__init__.py
> +++ b/tests/pytest/qemu_pytest/__init__.py
> @@ -11,6 +11,7 @@
>  # This work is licensed under the terms of the GNU GPL, version 2 or
>  # later.  See the COPYING file in the top-level directory.
>  
> +import hashlib
>  import logging
>  import os
>  import shutil
> @@ -201,17 +202,34 @@ def setUp(self, bin_prefix):
>          self.assertIsNotNone(SOURCE_DIR,'PYTEST_SOURCE_ROOT must be set')
>          self.assertIsNotNone(self.qemu_bin, 'PYTEST_QEMU_BINARY must be set')
>  
> -    def fetch_asset(self, name,
> -                    asset_hash, algorithm=None,
> -                    locations=None, expire=None,
> -                    find_only=False, cancel_on_missing=True):
> -        return super().fetch_asset(name,
> -                        asset_hash=asset_hash,
> -                        algorithm=algorithm,
> -                        locations=locations,
> -                        expire=expire,
> -                        find_only=find_only,
> -                        cancel_on_missing=cancel_on_missing)
> +    def check_hash(self, file_name, expected_hash):
> +        if not expected_hash:
> +            return True
> +        if len(expected_hash) == 32:
> +            sum_prog = 'md5sum'
> +        elif len(expected_hash) == 40:
> +            sum_prog = 'sha1sum'
> +        elif len(expected_hash) == 64:
> +            sum_prog = 'sha256sum'
> +        elif len(expected_hash) == 128:
> +            sum_prog = 'sha512sum'
> +        else:
> +            raise Exception("unknown hash type")

Why shouldn't we just standardize on sha256 as we convert each test
to pytest ? sha512 is overkill, and md5/sha1 shouldn't really be used
anymore.

> +        checksum = subprocess.check_output([sum_prog, file_name]).split()[0]
> +        return expected_hash == checksum.decode("utf-8")
> +
> +    def fetch_asset(self, url, asset_hash):
> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
> +        if not os.path.exists(cache_dir):
> +            os.makedirs(cache_dir)
> +        fname = os.path.join(cache_dir,
> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
> +            return fname
> +        logging.debug("Downloading %s to %s...", url, fname)
> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
> +        os.rename(fname + ".download", fname)
> +        return fname

To avoid a dep on an external command that may not be installed,
I think we could replace wget with native python code:

 import urllib
 from shutil import copyfileobj
 
 with urllib.request.urlopen(url) as src:
    with open(fname + ".download", "w+") as dst
       copyfileobj(src, dst)


With regards,
Daniel
Thomas Huth July 12, 2024, 9:26 a.m. UTC | #9
On 12/07/2024 11.09, Daniel P. Berrangé wrote:
> On Thu, Jul 11, 2024 at 01:55:43PM +0200, Thomas Huth wrote:
>> In the pytests, we cannot use the fetch_asset() function from Avocado
>> anymore, so we have to provide our own implementation now instead.
>> Thus add such a function based on the _download_with_cache() function
>> from tests/vm/basevm.py for this purpose.
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> ---
>>   tests/pytest/qemu_pytest/__init__.py | 40 ++++++++++++++++++++--------
>>   1 file changed, 29 insertions(+), 11 deletions(-)
>>
>> diff --git a/tests/pytest/qemu_pytest/__init__.py b/tests/pytest/qemu_pytest/__init__.py
>> index e3ed32e3de..73d80b3828 100644
>> --- a/tests/pytest/qemu_pytest/__init__.py
>> +++ b/tests/pytest/qemu_pytest/__init__.py
>> @@ -11,6 +11,7 @@
>>   # This work is licensed under the terms of the GNU GPL, version 2 or
>>   # later.  See the COPYING file in the top-level directory.
>>   
>> +import hashlib
>>   import logging
>>   import os
>>   import shutil
>> @@ -201,17 +202,34 @@ def setUp(self, bin_prefix):
>>           self.assertIsNotNone(SOURCE_DIR,'PYTEST_SOURCE_ROOT must be set')
>>           self.assertIsNotNone(self.qemu_bin, 'PYTEST_QEMU_BINARY must be set')
>>   
>> -    def fetch_asset(self, name,
>> -                    asset_hash, algorithm=None,
>> -                    locations=None, expire=None,
>> -                    find_only=False, cancel_on_missing=True):
>> -        return super().fetch_asset(name,
>> -                        asset_hash=asset_hash,
>> -                        algorithm=algorithm,
>> -                        locations=locations,
>> -                        expire=expire,
>> -                        find_only=find_only,
>> -                        cancel_on_missing=cancel_on_missing)
>> +    def check_hash(self, file_name, expected_hash):
>> +        if not expected_hash:
>> +            return True
>> +        if len(expected_hash) == 32:
>> +            sum_prog = 'md5sum'
>> +        elif len(expected_hash) == 40:
>> +            sum_prog = 'sha1sum'
>> +        elif len(expected_hash) == 64:
>> +            sum_prog = 'sha256sum'
>> +        elif len(expected_hash) == 128:
>> +            sum_prog = 'sha512sum'
>> +        else:
>> +            raise Exception("unknown hash type")
> 
> Why shouldn't we just standardize on sha256 as we convert each test
> to pytest ? sha512 is overkill, and md5/sha1 shouldn't really be used
> anymore.

I mainly added that for minimizing the changes that I need to do on the 
existing tests. Updating all the hashsums there is certainly some additional 
work... If you want to help, feel free to send patches for the existing 
avocado tests to update the md5 and sha1 sums there!

>> +        checksum = subprocess.check_output([sum_prog, file_name]).split()[0]
>> +        return expected_hash == checksum.decode("utf-8")
>> +
>> +    def fetch_asset(self, url, asset_hash):
>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>> +        if not os.path.exists(cache_dir):
>> +            os.makedirs(cache_dir)
>> +        fname = os.path.join(cache_dir,
>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>> +            return fname
>> +        logging.debug("Downloading %s to %s...", url, fname)
>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
>> +        os.rename(fname + ".download", fname)
>> +        return fname
> 
> To avoid a dep on an external command that may not be installed,
> I think we could replace wget with native python code:
> 
>   import urllib
...

Yes, I came across urllib after I sent out the patches, too, sounds like the 
right way to go. We should also update tests/vm/basevm.py accordingly!

  Thomas
diff mbox series

Patch

diff --git a/tests/pytest/qemu_pytest/__init__.py b/tests/pytest/qemu_pytest/__init__.py
index e3ed32e3de..73d80b3828 100644
--- a/tests/pytest/qemu_pytest/__init__.py
+++ b/tests/pytest/qemu_pytest/__init__.py
@@ -11,6 +11,7 @@ 
 # This work is licensed under the terms of the GNU GPL, version 2 or
 # later.  See the COPYING file in the top-level directory.
 
+import hashlib
 import logging
 import os
 import shutil
@@ -201,17 +202,34 @@  def setUp(self, bin_prefix):
         self.assertIsNotNone(SOURCE_DIR,'PYTEST_SOURCE_ROOT must be set')
         self.assertIsNotNone(self.qemu_bin, 'PYTEST_QEMU_BINARY must be set')
 
-    def fetch_asset(self, name,
-                    asset_hash, algorithm=None,
-                    locations=None, expire=None,
-                    find_only=False, cancel_on_missing=True):
-        return super().fetch_asset(name,
-                        asset_hash=asset_hash,
-                        algorithm=algorithm,
-                        locations=locations,
-                        expire=expire,
-                        find_only=find_only,
-                        cancel_on_missing=cancel_on_missing)
+    def check_hash(self, file_name, expected_hash):
+        if not expected_hash:
+            return True
+        if len(expected_hash) == 32:
+            sum_prog = 'md5sum'
+        elif len(expected_hash) == 40:
+            sum_prog = 'sha1sum'
+        elif len(expected_hash) == 64:
+            sum_prog = 'sha256sum'
+        elif len(expected_hash) == 128:
+            sum_prog = 'sha512sum'
+        else:
+            raise Exception("unknown hash type")
+        checksum = subprocess.check_output([sum_prog, file_name]).split()[0]
+        return expected_hash == checksum.decode("utf-8")
+
+    def fetch_asset(self, url, asset_hash):
+        cache_dir = os.path.expanduser("~/.cache/qemu/download")
+        if not os.path.exists(cache_dir):
+            os.makedirs(cache_dir)
+        fname = os.path.join(cache_dir,
+                             hashlib.sha1(url.encode("utf-8")).hexdigest())
+        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
+            return fname
+        logging.debug("Downloading %s to %s...", url, fname)
+        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
+        os.rename(fname + ".download", fname)
+        return fname
 
 
 class QemuSystemTest(QemuBaseTest):