Message ID | 20110617163838.GB27800@thinkpad-t410 |
---|---|
State | New |
Headers | show |
On 06/17/2011 10:38 AM, Seth Forshee wrote: > This is a rather large series of non-trivial backports to fix a Xorg > hang caused by the i915 driver. Patches 5 and 6 are the key patches that > implement an LRU-based eviction algorithm; patches 1-4 are > prerequisites, patch 7 fixes some of the clean-up, and patch 8 adds a > periodic flush to retire active buffers. > > This is a pretty severe bug for those affected by it, but it's also a > huge amount of change for a stable kernel, so let me know if it's beyond > hope for an SRU. > > SRU Justification: > > Impact: When the GPU aperture becomes sufficiently fragmented two or > more mmapped buffers in active use can repeatedly push each other out > of the aperture. This results in a sort of livelock situation, with > Xorg appearing to be hung with high CPU utilization. > > Fix: Backport of a series of patches that convert the i915 eviction > algorithm from a best-fit approach to one based on evicting the least > recently used objects and a patch that adds a periodic flush requests > to retire active buffers when no client is active. > > Testcase: Without these patches this scenario can be triggered > ocassionally when visiting certain web pages that utilize Flash or > viewing certain images in Firefox, as described on bugs #599017 and > #330460. Testing the same pages/images with these patches does not > induce the hang. > > > The following changes since commit 9cc333d02cb4620561e91a63ed86a2159ec32638: > > UBUNTU: Ubuntu-2.6.32-33.67 (2011-06-16 11:26:04 -0500) > > are available in the git repository at: > git://kernel.ubuntu.com/sforshee/ubuntu-lucid.git lp599017 > > Chris Wilson (5): > drm/i915: Move the eviction logic to its own file. > drm/i915: Implement fair lru eviction across both rings. (v2) > drm/i915: Maintain LRU order of inactive objects upon access by CPU (v2) > drm/i915/evict: Ensure we completely cleanup on failure > drm/i915: Periodically flush the active lists and requests > > Daniel Vetter (3): > drm_mm: extract check_free_mm_node > drm: implement helper functions for scanning lru list > drm/i915: prepare for fair lru eviction > > drivers/gpu/drm/drm_mm.c | 236 +++++++++++++++++++++++++----- > drivers/gpu/drm/i915/Makefile | 1 + > drivers/gpu/drm/i915/i915_drv.h | 13 ++ > drivers/gpu/drm/i915/i915_gem.c | 228 ++++------------------------- > drivers/gpu/drm/i915/i915_gem_evict.c | 253 +++++++++++++++++++++++++++++++++ > include/drm/drm_mm.h | 15 ++- > 6 files changed, 510 insertions(+), 236 deletions(-) > create mode 100644 drivers/gpu/drm/i915/i915_gem_evict.c > I wouldn't do this for a non-LTS release, but I think this case is warranted since the test results look so good. Now we just have to make sure there are no regressions since some of the backport patches differ significantly from upstream. Acked-by: Tim Gardner <tim.gardner@canonical.com>
On 17.06.2011 18:38, Seth Forshee wrote: > This is a rather large series of non-trivial backports to fix a Xorg > hang caused by the i915 driver. Patches 5 and 6 are the key patches that > implement an LRU-based eviction algorithm; patches 1-4 are > prerequisites, patch 7 fixes some of the clean-up, and patch 8 adds a > periodic flush to retire active buffers. > > This is a pretty severe bug for those affected by it, but it's also a > huge amount of change for a stable kernel, so let me know if it's beyond > hope for an SRU. > > SRU Justification: > > Impact: When the GPU aperture becomes sufficiently fragmented two or > more mmapped buffers in active use can repeatedly push each other out > of the aperture. This results in a sort of livelock situation, with > Xorg appearing to be hung with high CPU utilization. > > Fix: Backport of a series of patches that convert the i915 eviction > algorithm from a best-fit approach to one based on evicting the least > recently used objects and a patch that adds a periodic flush requests > to retire active buffers when no client is active. > > Testcase: Without these patches this scenario can be triggered > ocassionally when visiting certain web pages that utilize Flash or > viewing certain images in Firefox, as described on bugs #599017 and > #330460. Testing the same pages/images with these patches does not > induce the hang. > > > The following changes since commit 9cc333d02cb4620561e91a63ed86a2159ec32638: > > UBUNTU: Ubuntu-2.6.32-33.67 (2011-06-16 11:26:04 -0500) > > are available in the git repository at: > git://kernel.ubuntu.com/sforshee/ubuntu-lucid.git lp599017 > > Chris Wilson (5): > drm/i915: Move the eviction logic to its own file. > drm/i915: Implement fair lru eviction across both rings. (v2) > drm/i915: Maintain LRU order of inactive objects upon access by CPU (v2) > drm/i915/evict: Ensure we completely cleanup on failure > drm/i915: Periodically flush the active lists and requests > > Daniel Vetter (3): > drm_mm: extract check_free_mm_node > drm: implement helper functions for scanning lru list > drm/i915: prepare for fair lru eviction > > drivers/gpu/drm/drm_mm.c | 236 +++++++++++++++++++++++++----- > drivers/gpu/drm/i915/Makefile | 1 + > drivers/gpu/drm/i915/i915_drv.h | 13 ++ > drivers/gpu/drm/i915/i915_gem.c | 228 ++++------------------------- > drivers/gpu/drm/i915/i915_gem_evict.c | 253 +++++++++++++++++++++++++++++++++ > include/drm/drm_mm.h | 15 ++- > 6 files changed, 510 insertions(+), 236 deletions(-) > create mode 100644 drivers/gpu/drm/i915/i915_gem_evict.c > I also tend to say the bug sounds serious enough to make an exception to the rule of limiting the change. I would probably add those to the 2.6.32+drm33 tree. Though there are two details which I think should be resolved before: 1. In the bug report (#599017) the last request for testing points to a place which only shows 6 patches. Does the kernel provided actually contain all of them? 2. It would be good to get some feedback about the test kernel in the bug.
On Wed, Jun 22, 2011 at 04:50:34PM +0200, Stefan Bader wrote: > On 17.06.2011 18:38, Seth Forshee wrote: > > This is a rather large series of non-trivial backports to fix a Xorg > > hang caused by the i915 driver. Patches 5 and 6 are the key patches that > > implement an LRU-based eviction algorithm; patches 1-4 are > > prerequisites, patch 7 fixes some of the clean-up, and patch 8 adds a > > periodic flush to retire active buffers. > > > > This is a pretty severe bug for those affected by it, but it's also a > > huge amount of change for a stable kernel, so let me know if it's beyond > > hope for an SRU. > > > > SRU Justification: > > > > Impact: When the GPU aperture becomes sufficiently fragmented two or > > more mmapped buffers in active use can repeatedly push each other out > > of the aperture. This results in a sort of livelock situation, with > > Xorg appearing to be hung with high CPU utilization. > > > > Fix: Backport of a series of patches that convert the i915 eviction > > algorithm from a best-fit approach to one based on evicting the least > > recently used objects and a patch that adds a periodic flush requests > > to retire active buffers when no client is active. > > > > Testcase: Without these patches this scenario can be triggered > > ocassionally when visiting certain web pages that utilize Flash or > > viewing certain images in Firefox, as described on bugs #599017 and > > #330460. Testing the same pages/images with these patches does not > > induce the hang. > > > > > > The following changes since commit 9cc333d02cb4620561e91a63ed86a2159ec32638: > > > > UBUNTU: Ubuntu-2.6.32-33.67 (2011-06-16 11:26:04 -0500) > > > > are available in the git repository at: > > git://kernel.ubuntu.com/sforshee/ubuntu-lucid.git lp599017 > > > > Chris Wilson (5): > > drm/i915: Move the eviction logic to its own file. > > drm/i915: Implement fair lru eviction across both rings. (v2) > > drm/i915: Maintain LRU order of inactive objects upon access by CPU (v2) > > drm/i915/evict: Ensure we completely cleanup on failure > > drm/i915: Periodically flush the active lists and requests > > > > Daniel Vetter (3): > > drm_mm: extract check_free_mm_node > > drm: implement helper functions for scanning lru list > > drm/i915: prepare for fair lru eviction > > > > drivers/gpu/drm/drm_mm.c | 236 +++++++++++++++++++++++++----- > > drivers/gpu/drm/i915/Makefile | 1 + > > drivers/gpu/drm/i915/i915_drv.h | 13 ++ > > drivers/gpu/drm/i915/i915_gem.c | 228 ++++------------------------- > > drivers/gpu/drm/i915/i915_gem_evict.c | 253 +++++++++++++++++++++++++++++++++ > > include/drm/drm_mm.h | 15 ++- > > 6 files changed, 510 insertions(+), 236 deletions(-) > > create mode 100644 drivers/gpu/drm/i915/i915_gem_evict.c > > > > I also tend to say the bug sounds serious enough to make an exception to the > rule of limiting the change. > > I would probably add those to the 2.6.32+drm33 tree. Though there are two > details which I think should be resolved before: > 1. In the bug report (#599017) the last request for testing points to a place > which only shows 6 patches. Does the kernel provided actually contain all of them? No, I guess I didn't post a link to that build on the bug, although I just did so. I have an affected machine though and have tested the full series myself for several days. I was able to trigger the bug very ocassionally without the last patch but have never triggered it with that patch. > 2. It would be good to get some feedback about the test kernel in the bug. I have received some offline feedback, but I'll ask for the feedback to be posted to the bug.