diff mbox series

[2/4] REST: fix patch listing query

Message ID 20200317135916.13691-3-dja@axtens.net
State Accepted
Headers show
Series Fix db-murdering API queries | expand

Commit Message

Daniel Axtens March 17, 2020, 1:59 p.m. UTC
The patch listing query is punishingly slow under even very simple
filters.

The new data model in 3.0 will help _a lot_, so this is a simple fix: I
did try indexes but haven't got really deeply into the weeds of what we can do
with them.

Move a number of things from select_related to prefetch_related: we trade off
one big, inefficient query for a slightly larger number of significantly more
efficient queries.

On my laptop with 2 copies of the canonical kernel team list loaded into the
database, and considering only the API view (the JSON view is always faster)
with warm caches and considering the entire set of SQL queries:

 - /api/patches/?project=1
    ~1.4-1.5s -> <100ms, something like 14x better

 - /api/patches/?project=1&since=2010-11-01T00:00:00
    ~1.7-1.8s -> <560ms, something like 3x better (now dominated by the
                         counting query only invoked on the HTML API view,
                         not the pure JSON API view.)

The things I moved:

 * project: this was generating SQL that looked like:

   INNER JOIN `patchwork_project` T5
    ON (`patchwork_submission`.`project_id` = T5.`id`)

   This is correct but we've already had to join the patchwork_submission
   table and perhaps as a result it seems to be inefficient.

 * series__project: Likewise we've already had to join the series table,
   doing another join is possibly why it is inefficient.

 * delegate: I do not know why this was tanking performance. I think it
   might relate to the strategy mysql was using.

Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 patchwork/api/patch.py            | 9 ++++++---
 patchwork/tests/api/test_patch.py | 4 ++--
 2 files changed, 8 insertions(+), 5 deletions(-)

Comments

Daniel Axtens March 17, 2020, 2:16 p.m. UTC | #1
Daniel Axtens <dja@axtens.net> writes:

> The patch listing query is punishingly slow under even very simple
> filters.
>
> The new data model in 3.0 will help _a lot_, so this is a simple fix: I
> did try indexes but haven't got really deeply into the weeds of what we can do
> with them.
>
> Move a number of things from select_related to prefetch_related: we trade off
> one big, inefficient query for a slightly larger number of significantly more
> efficient queries.
>
> On my laptop with 2 copies of the canonical kernel team list loaded into the
> database, and considering only the API view (the JSON view is always faster)
> with warm caches and considering the entire set of SQL queries:
>
>  - /api/patches/?project=1
>     ~1.4-1.5s -> <100ms, something like 14x better
>
>  - /api/patches/?project=1&since=2010-11-01T00:00:00
>     ~1.7-1.8s -> <560ms, something like 3x better (now dominated by the
>                          counting query only invoked on the HTML API view,
>                          not the pure JSON API view.)

Sadly the performance of both requests drops away as you increase the
OFFSET value (as the page number goes up). However, the difference vs
the unpatched case is still good at ~2x.

Not quite sure how we'd fix this, I'm sure the data model changes will
at least help. Some very quick testing suggests an index over the date
field is also helpful in for the request at all offsets.

Regards,
Daniel
>
> The things I moved:
>
>  * project: this was generating SQL that looked like:
>
>    INNER JOIN `patchwork_project` T5
>     ON (`patchwork_submission`.`project_id` = T5.`id`)
>
>    This is correct but we've already had to join the patchwork_submission
>    table and perhaps as a result it seems to be inefficient.
>
>  * series__project: Likewise we've already had to join the series table,
>    doing another join is possibly why it is inefficient.
>
>  * delegate: I do not know why this was tanking performance. I think it
>    might relate to the strategy mysql was using.
>
> Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  patchwork/api/patch.py            | 9 ++++++---
>  patchwork/tests/api/test_patch.py | 4 ++--
>  2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/patchwork/api/patch.py b/patchwork/api/patch.py
> index efa38e1f0788..000100cec6bd 100644
> --- a/patchwork/api/patch.py
> +++ b/patchwork/api/patch.py
> @@ -288,10 +288,13 @@ class PatchList(ListAPIView):
>      ordering = 'id'
>  
>      def get_queryset(self):
> +        # TODO(dja): we need to revisit this after the patch migration, paying
> +        # particular attention to cases with filtering
>          return Patch.objects.all()\
> -            .prefetch_related('check_set', 'related__patches__project')\
> -            .select_related('project', 'state', 'submitter', 'delegate',
> -                            'series__project')\
> +            .prefetch_related('check_set', 'delegate', 'project',
> +                              'series__project',
> +                              'related__patches__project',)\
> +            .select_related('state', 'submitter', 'series')\
>              .defer('content', 'diff', 'headers')
>  
>  
> diff --git a/patchwork/tests/api/test_patch.py b/patchwork/tests/api/test_patch.py
> index aba92b92082e..b24c5ab28947 100644
> --- a/patchwork/tests/api/test_patch.py
> +++ b/patchwork/tests/api/test_patch.py
> @@ -211,11 +211,11 @@ class TestPatchAPI(utils.APITestCase):
>          self.assertNotIn('web_url', resp.data[0])
>  
>      def test_list_bug_335(self):
> -        """Ensure we retrieve the embedded series project once."""
> +        """Ensure we retrieve the embedded series project in O(1)."""
>          series = create_series()
>          create_patches(5, series=series)
>  
> -        with self.assertNumQueries(4):
> +        with self.assertNumQueries(7):
>              self.client.get(self.api_url())
>  
>      @utils.store_samples('patch-detail')
> -- 
> 2.20.1
diff mbox series

Patch

diff --git a/patchwork/api/patch.py b/patchwork/api/patch.py
index efa38e1f0788..000100cec6bd 100644
--- a/patchwork/api/patch.py
+++ b/patchwork/api/patch.py
@@ -288,10 +288,13 @@  class PatchList(ListAPIView):
     ordering = 'id'
 
     def get_queryset(self):
+        # TODO(dja): we need to revisit this after the patch migration, paying
+        # particular attention to cases with filtering
         return Patch.objects.all()\
-            .prefetch_related('check_set', 'related__patches__project')\
-            .select_related('project', 'state', 'submitter', 'delegate',
-                            'series__project')\
+            .prefetch_related('check_set', 'delegate', 'project',
+                              'series__project',
+                              'related__patches__project',)\
+            .select_related('state', 'submitter', 'series')\
             .defer('content', 'diff', 'headers')
 
 
diff --git a/patchwork/tests/api/test_patch.py b/patchwork/tests/api/test_patch.py
index aba92b92082e..b24c5ab28947 100644
--- a/patchwork/tests/api/test_patch.py
+++ b/patchwork/tests/api/test_patch.py
@@ -211,11 +211,11 @@  class TestPatchAPI(utils.APITestCase):
         self.assertNotIn('web_url', resp.data[0])
 
     def test_list_bug_335(self):
-        """Ensure we retrieve the embedded series project once."""
+        """Ensure we retrieve the embedded series project in O(1)."""
         series = create_series()
         create_patches(5, series=series)
 
-        with self.assertNumQueries(4):
+        with self.assertNumQueries(7):
             self.client.get(self.api_url())
 
     @utils.store_samples('patch-detail')