Message ID | 20200317135916.13691-3-dja@axtens.net |
---|---|
State | Accepted |
Headers | show |
Series | Fix db-murdering API queries | expand |
Daniel Axtens <dja@axtens.net> writes: > The patch listing query is punishingly slow under even very simple > filters. > > The new data model in 3.0 will help _a lot_, so this is a simple fix: I > did try indexes but haven't got really deeply into the weeds of what we can do > with them. > > Move a number of things from select_related to prefetch_related: we trade off > one big, inefficient query for a slightly larger number of significantly more > efficient queries. > > On my laptop with 2 copies of the canonical kernel team list loaded into the > database, and considering only the API view (the JSON view is always faster) > with warm caches and considering the entire set of SQL queries: > > - /api/patches/?project=1 > ~1.4-1.5s -> <100ms, something like 14x better > > - /api/patches/?project=1&since=2010-11-01T00:00:00 > ~1.7-1.8s -> <560ms, something like 3x better (now dominated by the > counting query only invoked on the HTML API view, > not the pure JSON API view.) Sadly the performance of both requests drops away as you increase the OFFSET value (as the page number goes up). However, the difference vs the unpatched case is still good at ~2x. Not quite sure how we'd fix this, I'm sure the data model changes will at least help. Some very quick testing suggests an index over the date field is also helpful in for the request at all offsets. Regards, Daniel > > The things I moved: > > * project: this was generating SQL that looked like: > > INNER JOIN `patchwork_project` T5 > ON (`patchwork_submission`.`project_id` = T5.`id`) > > This is correct but we've already had to join the patchwork_submission > table and perhaps as a result it seems to be inefficient. > > * series__project: Likewise we've already had to join the series table, > doing another join is possibly why it is inefficient. > > * delegate: I do not know why this was tanking performance. I think it > might relate to the strategy mysql was using. > > Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> > Signed-off-by: Daniel Axtens <dja@axtens.net> > --- > patchwork/api/patch.py | 9 ++++++--- > patchwork/tests/api/test_patch.py | 4 ++-- > 2 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/patchwork/api/patch.py b/patchwork/api/patch.py > index efa38e1f0788..000100cec6bd 100644 > --- a/patchwork/api/patch.py > +++ b/patchwork/api/patch.py > @@ -288,10 +288,13 @@ class PatchList(ListAPIView): > ordering = 'id' > > def get_queryset(self): > + # TODO(dja): we need to revisit this after the patch migration, paying > + # particular attention to cases with filtering > return Patch.objects.all()\ > - .prefetch_related('check_set', 'related__patches__project')\ > - .select_related('project', 'state', 'submitter', 'delegate', > - 'series__project')\ > + .prefetch_related('check_set', 'delegate', 'project', > + 'series__project', > + 'related__patches__project',)\ > + .select_related('state', 'submitter', 'series')\ > .defer('content', 'diff', 'headers') > > > diff --git a/patchwork/tests/api/test_patch.py b/patchwork/tests/api/test_patch.py > index aba92b92082e..b24c5ab28947 100644 > --- a/patchwork/tests/api/test_patch.py > +++ b/patchwork/tests/api/test_patch.py > @@ -211,11 +211,11 @@ class TestPatchAPI(utils.APITestCase): > self.assertNotIn('web_url', resp.data[0]) > > def test_list_bug_335(self): > - """Ensure we retrieve the embedded series project once.""" > + """Ensure we retrieve the embedded series project in O(1).""" > series = create_series() > create_patches(5, series=series) > > - with self.assertNumQueries(4): > + with self.assertNumQueries(7): > self.client.get(self.api_url()) > > @utils.store_samples('patch-detail') > -- > 2.20.1
diff --git a/patchwork/api/patch.py b/patchwork/api/patch.py index efa38e1f0788..000100cec6bd 100644 --- a/patchwork/api/patch.py +++ b/patchwork/api/patch.py @@ -288,10 +288,13 @@ class PatchList(ListAPIView): ordering = 'id' def get_queryset(self): + # TODO(dja): we need to revisit this after the patch migration, paying + # particular attention to cases with filtering return Patch.objects.all()\ - .prefetch_related('check_set', 'related__patches__project')\ - .select_related('project', 'state', 'submitter', 'delegate', - 'series__project')\ + .prefetch_related('check_set', 'delegate', 'project', + 'series__project', + 'related__patches__project',)\ + .select_related('state', 'submitter', 'series')\ .defer('content', 'diff', 'headers') diff --git a/patchwork/tests/api/test_patch.py b/patchwork/tests/api/test_patch.py index aba92b92082e..b24c5ab28947 100644 --- a/patchwork/tests/api/test_patch.py +++ b/patchwork/tests/api/test_patch.py @@ -211,11 +211,11 @@ class TestPatchAPI(utils.APITestCase): self.assertNotIn('web_url', resp.data[0]) def test_list_bug_335(self): - """Ensure we retrieve the embedded series project once.""" + """Ensure we retrieve the embedded series project in O(1).""" series = create_series() create_patches(5, series=series) - with self.assertNumQueries(4): + with self.assertNumQueries(7): self.client.get(self.api_url()) @utils.store_samples('patch-detail')
The patch listing query is punishingly slow under even very simple filters. The new data model in 3.0 will help _a lot_, so this is a simple fix: I did try indexes but haven't got really deeply into the weeds of what we can do with them. Move a number of things from select_related to prefetch_related: we trade off one big, inefficient query for a slightly larger number of significantly more efficient queries. On my laptop with 2 copies of the canonical kernel team list loaded into the database, and considering only the API view (the JSON view is always faster) with warm caches and considering the entire set of SQL queries: - /api/patches/?project=1 ~1.4-1.5s -> <100ms, something like 14x better - /api/patches/?project=1&since=2010-11-01T00:00:00 ~1.7-1.8s -> <560ms, something like 3x better (now dominated by the counting query only invoked on the HTML API view, not the pure JSON API view.) The things I moved: * project: this was generating SQL that looked like: INNER JOIN `patchwork_project` T5 ON (`patchwork_submission`.`project_id` = T5.`id`) This is correct but we've already had to join the patchwork_submission table and perhaps as a result it seems to be inefficient. * series__project: Likewise we've already had to join the series table, doing another join is possibly why it is inefficient. * delegate: I do not know why this was tanking performance. I think it might relate to the strategy mysql was using. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Signed-off-by: Daniel Axtens <dja@axtens.net> --- patchwork/api/patch.py | 9 ++++++--- patchwork/tests/api/test_patch.py | 4 ++-- 2 files changed, 8 insertions(+), 5 deletions(-)