From patchwork Tue May 5 08:32:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283326 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GXzQ2L3Sz9sSx; Tue, 5 May 2020 18:32:50 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt0N-0003L6-A1; Tue, 05 May 2020 08:32:43 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0L-0003KB-F8 for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:41 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0L-0004sj-4V for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:41 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 01/26] drm/i915/tgl/perf: use the same oa ctx_id format as icl Date: Tue, 5 May 2020 11:32:14 +0300 Message-Id: <20200505083239.2428948-2-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Michel Thierry BugLink: https://bugs.launchpad.net/bugs/1871957 Compared to Icelake, Tigerlake's MAX_CONTEXT_HW_ID is smaller by one, but since we just use the upper 32 bits of the lrc_desc, it's guaranteed OA will use the correct one. Cc: Lionel Landwerlin Signed-off-by: Michel Thierry Signed-off-by: Lucas De Marchi Reviewed-by: Umesh Nerlige Ramappa Link: https://patchwork.freedesktop.org/patch/msgid/20190823082055.5992-19-lucas.demarchi@intel.com (cherry picked from commit 45e9c829ebeaa13b3b999f76963b9920d2147128) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index e42b86827d6b..2c9f46e12622 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1299,7 +1299,8 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) } break; - case 11: { + case 11: + case 12: { stream->specific_ctx_id_mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32) | ((1U << GEN11_ENGINE_INSTANCE_WIDTH) - 1) << (GEN11_ENGINE_INSTANCE_SHIFT - 32) | From patchwork Tue May 5 08:32:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283343 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0Q4DnZz9sSx; Tue, 5 May 2020 18:33:42 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt1D-00040I-9r; Tue, 05 May 2020 08:33:35 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0L-0003KP-TB for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:41 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0L-0004sj-GK for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:41 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 02/26] drm/i915/perf: Assert locking for i915_init_oa_perf_state() Date: Tue, 5 May 2020 11:32:15 +0300 Message-Id: <20200505083239.2428948-3-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Chris Wilson BugLink: https://bugs.launchpad.net/bugs/1871957 We use the context->pin_mutex to serialise updates to the OA config and the registers values written into each new context. Document this relationship and assert we do hold the context->pin_mutex as used by gen8_configure_all_contexts() to serialise updates to the OA config itself. v2: Add a white-lie for when we call intel_gt_resume() from init. v3: Lie while we have the context pinned inside atomic reset. Signed-off-by: Chris Wilson Cc: Lionel Landwerlin Reviewed-by: Lionel Landwerlin #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20190830181929.18663-1-chris@chris-wilson.co.uk (cherry picked from commit dffa8feb308455f9b3ce0eeb55a4eac3afc0786b) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/gt/intel_gt_pm.c | 7 ++++++- drivers/gpu/drm/i915/gt/intel_lrc.c | 10 ++++++++++ drivers/gpu/drm/i915/i915_perf.c | 3 +++ 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c index fac75afed35b..8b1337f24123 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c @@ -6,6 +6,7 @@ #include "i915_drv.h" #include "i915_params.h" +#include "intel_context.h" #include "intel_engine_pm.h" #include "intel_gt.h" #include "intel_gt_pm.h" @@ -150,8 +151,12 @@ int intel_gt_resume(struct intel_gt *gt) intel_engine_pm_get(engine); ce = engine->kernel_context; - if (ce) + if (ce) { + GEM_BUG_ON(!intel_context_is_pinned(ce)); + mutex_acquire(&ce->pin_mutex.dep_map, 0, 0, _THIS_IP_); ce->ops->reset(ce); + mutex_release(&ce->pin_mutex.dep_map, 0, _THIS_IP_); + } engine->serial++; /* kernel context lost */ err = engine->resume(engine); diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 58916adadbea..9b4845ae4bc9 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -2495,6 +2495,11 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) ce = rq->hw_context; GEM_BUG_ON(i915_active_is_idle(&ce->active)); GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); + + /* Proclaim we have exclusive access to the context image! */ + GEM_BUG_ON(!intel_context_is_pinned(ce)); + mutex_acquire(&ce->pin_mutex.dep_map, 2, 0, _THIS_IP_); + rq = active_request(rq); if (!rq) { ce->ring->head = ce->ring->tail; @@ -2554,6 +2559,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) engine->name, ce->ring->head, ce->ring->tail); intel_ring_update_space(ce->ring); __execlists_update_reg_state(ce, engine); + mutex_release(&ce->pin_mutex.dep_map, 0, _THIS_IP_); unwind: /* Push back any incomplete requests for replay after the reset. */ @@ -4014,6 +4020,9 @@ void intel_lr_context_reset(struct intel_engine_cs *engine, u32 head, bool scrub) { + GEM_BUG_ON(!intel_context_is_pinned(ce)); + mutex_acquire(&ce->pin_mutex.dep_map, 2, 0, _THIS_IP_); + /* * We want a simple context + ring to execute the breadcrumb update. * We cannot rely on the context being intact across the GPU hang, @@ -4038,6 +4047,7 @@ void intel_lr_context_reset(struct intel_engine_cs *engine, intel_ring_update_space(ce->ring); __execlists_update_reg_state(ce, engine); + mutex_release(&ce->pin_mutex.dep_map, 0, _THIS_IP_); } #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 2c9f46e12622..c1b764233761 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -2305,6 +2305,9 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine, { struct i915_perf_stream *stream; + /* perf.exclusive_stream serialised by gen8_configure_all_contexts() */ + lockdep_assert_held(&ce->pin_mutex); + if (engine->class != RENDER_CLASS) return; From patchwork Tue May 5 08:32:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283327 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GXzQ4drNz9sT0; Tue, 5 May 2020 18:32:50 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt0O-0003Mc-S2; Tue, 05 May 2020 08:32:44 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0M-0003Kg-Bh for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:42 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0L-0004sj-Se for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:42 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 03/26] drm/i915/perf: move perf types to their own header Date: Tue, 5 May 2020 11:32:16 +0300 Message-Id: <20200505083239.2428948-4-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 Following a pattern used throughout the driver. Signed-off-by: Lionel Landwerlin Reviewed-by: Umesh Nerlige Ramappa Link: https://patchwork.freedesktop.org/patch/msgid/20190909093116.7747-7-lionel.g.landwerlin@intel.com (cherry picked from commit 1d0f2ebf392ef11465cefb29d545ed9e3222b29e) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_drv.h | 300 +---------------------- drivers/gpu/drm/i915/i915_perf.h | 2 + drivers/gpu/drm/i915/i915_perf_types.h | 327 +++++++++++++++++++++++++ 3 files changed, 330 insertions(+), 299 deletions(-) create mode 100644 drivers/gpu/drm/i915/i915_perf_types.h diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 46b90941f81b..1238394997c7 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -92,6 +92,7 @@ #include "i915_gem_fence_reg.h" #include "i915_gem_gtt.h" #include "i915_gpu_error.h" +#include "i915_perf_types.h" #include "i915_request.h" #include "i915_scheduler.h" #include "gt/intel_timeline.h" @@ -975,305 +976,6 @@ struct intel_wm_config { bool sprites_scaled; }; -struct i915_oa_format { - u32 format; - int size; -}; - -struct i915_oa_reg { - i915_reg_t addr; - u32 value; -}; - -struct i915_oa_config { - char uuid[UUID_STRING_LEN + 1]; - int id; - - const struct i915_oa_reg *mux_regs; - u32 mux_regs_len; - const struct i915_oa_reg *b_counter_regs; - u32 b_counter_regs_len; - const struct i915_oa_reg *flex_regs; - u32 flex_regs_len; - - struct attribute_group sysfs_metric; - struct attribute *attrs[2]; - struct device_attribute sysfs_metric_id; - - atomic_t ref_count; -}; - -struct i915_perf_stream; - -/** - * struct i915_perf_stream_ops - the OPs to support a specific stream type - */ -struct i915_perf_stream_ops { - /** - * @enable: Enables the collection of HW samples, either in response to - * `I915_PERF_IOCTL_ENABLE` or implicitly called when stream is opened - * without `I915_PERF_FLAG_DISABLED`. - */ - void (*enable)(struct i915_perf_stream *stream); - - /** - * @disable: Disables the collection of HW samples, either in response - * to `I915_PERF_IOCTL_DISABLE` or implicitly called before destroying - * the stream. - */ - void (*disable)(struct i915_perf_stream *stream); - - /** - * @poll_wait: Call poll_wait, passing a wait queue that will be woken - * once there is something ready to read() for the stream - */ - void (*poll_wait)(struct i915_perf_stream *stream, - struct file *file, - poll_table *wait); - - /** - * @wait_unlocked: For handling a blocking read, wait until there is - * something to ready to read() for the stream. E.g. wait on the same - * wait queue that would be passed to poll_wait(). - */ - int (*wait_unlocked)(struct i915_perf_stream *stream); - - /** - * @read: Copy buffered metrics as records to userspace - * **buf**: the userspace, destination buffer - * **count**: the number of bytes to copy, requested by userspace - * **offset**: zero at the start of the read, updated as the read - * proceeds, it represents how many bytes have been copied so far and - * the buffer offset for copying the next record. - * - * Copy as many buffered i915 perf samples and records for this stream - * to userspace as will fit in the given buffer. - * - * Only write complete records; returning -%ENOSPC if there isn't room - * for a complete record. - * - * Return any error condition that results in a short read such as - * -%ENOSPC or -%EFAULT, even though these may be squashed before - * returning to userspace. - */ - int (*read)(struct i915_perf_stream *stream, - char __user *buf, - size_t count, - size_t *offset); - - /** - * @destroy: Cleanup any stream specific resources. - * - * The stream will always be disabled before this is called. - */ - void (*destroy)(struct i915_perf_stream *stream); -}; - -/** - * struct i915_perf_stream - state for a single open stream FD - */ -struct i915_perf_stream { - /** - * @dev_priv: i915 drm device - */ - struct drm_i915_private *dev_priv; - - /** - * @link: Links the stream into ``&drm_i915_private->streams`` - */ - struct list_head link; - - /** - * @wakeref: As we keep the device awake while the perf stream is - * active, we track our runtime pm reference for later release. - */ - intel_wakeref_t wakeref; - - /** - * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*` - * properties given when opening a stream, representing the contents - * of a single sample as read() by userspace. - */ - u32 sample_flags; - - /** - * @sample_size: Considering the configured contents of a sample - * combined with the required header size, this is the total size - * of a single sample record. - */ - int sample_size; - - /** - * @ctx: %NULL if measuring system-wide across all contexts or a - * specific context that is being monitored. - */ - struct i915_gem_context *ctx; - - /** - * @enabled: Whether the stream is currently enabled, considering - * whether the stream was opened in a disabled state and based - * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls. - */ - bool enabled; - - /** - * @ops: The callbacks providing the implementation of this specific - * type of configured stream. - */ - const struct i915_perf_stream_ops *ops; - - /** - * @oa_config: The OA configuration used by the stream. - */ - struct i915_oa_config *oa_config; - - /** - * The OA context specific information. - */ - struct intel_context *pinned_ctx; - u32 specific_ctx_id; - u32 specific_ctx_id_mask; - - struct hrtimer poll_check_timer; - wait_queue_head_t poll_wq; - bool pollin; - - bool periodic; - int period_exponent; - - /** - * State of the OA buffer. - */ - struct { - struct i915_vma *vma; - u8 *vaddr; - u32 last_ctx_id; - int format; - int format_size; - int size_exponent; - - /** - * Locks reads and writes to all head/tail state - * - * Consider: the head and tail pointer state needs to be read - * consistently from a hrtimer callback (atomic context) and - * read() fop (user context) with tail pointer updates happening - * in atomic context and head updates in user context and the - * (unlikely) possibility of read() errors needing to reset all - * head/tail state. - * - * Note: Contention/performance aren't currently a significant - * concern here considering the relatively low frequency of - * hrtimer callbacks (5ms period) and that reads typically only - * happen in response to a hrtimer event and likely complete - * before the next callback. - * - * Note: This lock is not held *while* reading and copying data - * to userspace so the value of head observed in htrimer - * callbacks won't represent any partial consumption of data. - */ - spinlock_t ptr_lock; - - /** - * One 'aging' tail pointer and one 'aged' tail pointer ready to - * used for reading. - * - * Initial values of 0xffffffff are invalid and imply that an - * update is required (and should be ignored by an attempted - * read) - */ - struct { - u32 offset; - } tails[2]; - - /** - * Index for the aged tail ready to read() data up to. - */ - unsigned int aged_tail_idx; - - /** - * A monotonic timestamp for when the current aging tail pointer - * was read; used to determine when it is old enough to trust. - */ - u64 aging_timestamp; - - /** - * Although we can always read back the head pointer register, - * we prefer to avoid trusting the HW state, just to avoid any - * risk that some hardware condition could * somehow bump the - * head pointer unpredictably and cause us to forward the wrong - * OA buffer data to userspace. - */ - u32 head; - } oa_buffer; -}; - -/** - * struct i915_oa_ops - Gen specific implementation of an OA unit stream - */ -struct i915_oa_ops { - /** - * @is_valid_b_counter_reg: Validates register's address for - * programming boolean counters for a particular platform. - */ - bool (*is_valid_b_counter_reg)(struct drm_i915_private *dev_priv, - u32 addr); - - /** - * @is_valid_mux_reg: Validates register's address for programming mux - * for a particular platform. - */ - bool (*is_valid_mux_reg)(struct drm_i915_private *dev_priv, u32 addr); - - /** - * @is_valid_flex_reg: Validates register's address for programming - * flex EU filtering for a particular platform. - */ - bool (*is_valid_flex_reg)(struct drm_i915_private *dev_priv, u32 addr); - - /** - * @enable_metric_set: Selects and applies any MUX configuration to set - * up the Boolean and Custom (B/C) counters that are part of the - * counter reports being sampled. May apply system constraints such as - * disabling EU clock gating as required. - */ - int (*enable_metric_set)(struct i915_perf_stream *stream); - - /** - * @disable_metric_set: Remove system constraints associated with using - * the OA unit. - */ - void (*disable_metric_set)(struct i915_perf_stream *stream); - - /** - * @oa_enable: Enable periodic sampling - */ - void (*oa_enable)(struct i915_perf_stream *stream); - - /** - * @oa_disable: Disable periodic sampling - */ - void (*oa_disable)(struct i915_perf_stream *stream); - - /** - * @read: Copy data from the circular OA buffer into a given userspace - * buffer. - */ - int (*read)(struct i915_perf_stream *stream, - char __user *buf, - size_t count, - size_t *offset); - - /** - * @oa_hw_tail_read: read the OA tail pointer register - * - * In particular this enables us to share all the fiddly code for - * handling the OA unit tail pointer race that affects multiple - * generations. - */ - u32 (*oa_hw_tail_read)(struct i915_perf_stream *stream); -}; - struct intel_cdclk_state { unsigned int cdclk, vco, ref, bypass; u8 voltage_level; diff --git a/drivers/gpu/drm/i915/i915_perf.h b/drivers/gpu/drm/i915/i915_perf.h index a412b16d9ffc..676f3efceda3 100644 --- a/drivers/gpu/drm/i915/i915_perf.h +++ b/drivers/gpu/drm/i915/i915_perf.h @@ -8,6 +8,8 @@ #include +#include "i915_perf_types.h" + struct drm_device; struct drm_file; struct drm_i915_private; diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h new file mode 100644 index 000000000000..edcab2df74fb --- /dev/null +++ b/drivers/gpu/drm/i915/i915_perf_types.h @@ -0,0 +1,327 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2019 Intel Corporation + */ + +#ifndef _I915_PERF_TYPES_H_ +#define _I915_PERF_TYPES_H_ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "i915_reg.h" +#include "intel_wakeref.h" + +struct drm_i915_private; +struct file; +struct i915_gem_context; +struct i915_vma; +struct intel_context; +struct intel_engine_cs; + +struct i915_oa_format { + u32 format; + int size; +}; + +struct i915_oa_reg { + i915_reg_t addr; + u32 value; +}; + +struct i915_oa_config { + char uuid[UUID_STRING_LEN + 1]; + int id; + + const struct i915_oa_reg *mux_regs; + u32 mux_regs_len; + const struct i915_oa_reg *b_counter_regs; + u32 b_counter_regs_len; + const struct i915_oa_reg *flex_regs; + u32 flex_regs_len; + + struct attribute_group sysfs_metric; + struct attribute *attrs[2]; + struct device_attribute sysfs_metric_id; + + atomic_t ref_count; +}; + +struct i915_perf_stream; + +/** + * struct i915_perf_stream_ops - the OPs to support a specific stream type + */ +struct i915_perf_stream_ops { + /** + * @enable: Enables the collection of HW samples, either in response to + * `I915_PERF_IOCTL_ENABLE` or implicitly called when stream is opened + * without `I915_PERF_FLAG_DISABLED`. + */ + void (*enable)(struct i915_perf_stream *stream); + + /** + * @disable: Disables the collection of HW samples, either in response + * to `I915_PERF_IOCTL_DISABLE` or implicitly called before destroying + * the stream. + */ + void (*disable)(struct i915_perf_stream *stream); + + /** + * @poll_wait: Call poll_wait, passing a wait queue that will be woken + * once there is something ready to read() for the stream + */ + void (*poll_wait)(struct i915_perf_stream *stream, + struct file *file, + poll_table *wait); + + /** + * @wait_unlocked: For handling a blocking read, wait until there is + * something to ready to read() for the stream. E.g. wait on the same + * wait queue that would be passed to poll_wait(). + */ + int (*wait_unlocked)(struct i915_perf_stream *stream); + + /** + * @read: Copy buffered metrics as records to userspace + * **buf**: the userspace, destination buffer + * **count**: the number of bytes to copy, requested by userspace + * **offset**: zero at the start of the read, updated as the read + * proceeds, it represents how many bytes have been copied so far and + * the buffer offset for copying the next record. + * + * Copy as many buffered i915 perf samples and records for this stream + * to userspace as will fit in the given buffer. + * + * Only write complete records; returning -%ENOSPC if there isn't room + * for a complete record. + * + * Return any error condition that results in a short read such as + * -%ENOSPC or -%EFAULT, even though these may be squashed before + * returning to userspace. + */ + int (*read)(struct i915_perf_stream *stream, + char __user *buf, + size_t count, + size_t *offset); + + /** + * @destroy: Cleanup any stream specific resources. + * + * The stream will always be disabled before this is called. + */ + void (*destroy)(struct i915_perf_stream *stream); +}; + +/** + * struct i915_perf_stream - state for a single open stream FD + */ +struct i915_perf_stream { + /** + * @dev_priv: i915 drm device + */ + struct drm_i915_private *dev_priv; + + /** + * @link: Links the stream into ``&drm_i915_private->streams`` + */ + struct list_head link; + + /** + * @wakeref: As we keep the device awake while the perf stream is + * active, we track our runtime pm reference for later release. + */ + intel_wakeref_t wakeref; + + /** + * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*` + * properties given when opening a stream, representing the contents + * of a single sample as read() by userspace. + */ + u32 sample_flags; + + /** + * @sample_size: Considering the configured contents of a sample + * combined with the required header size, this is the total size + * of a single sample record. + */ + int sample_size; + + /** + * @ctx: %NULL if measuring system-wide across all contexts or a + * specific context that is being monitored. + */ + struct i915_gem_context *ctx; + + /** + * @enabled: Whether the stream is currently enabled, considering + * whether the stream was opened in a disabled state and based + * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls. + */ + bool enabled; + + /** + * @ops: The callbacks providing the implementation of this specific + * type of configured stream. + */ + const struct i915_perf_stream_ops *ops; + + /** + * @oa_config: The OA configuration used by the stream. + */ + struct i915_oa_config *oa_config; + + /** + * @pinned_ctx: The OA context specific information. + */ + struct intel_context *pinned_ctx; + u32 specific_ctx_id; + u32 specific_ctx_id_mask; + + struct hrtimer poll_check_timer; + wait_queue_head_t poll_wq; + bool pollin; + + bool periodic; + int period_exponent; + + /** + * @oa_buffer: State of the OA buffer. + */ + struct { + struct i915_vma *vma; + u8 *vaddr; + u32 last_ctx_id; + int format; + int format_size; + int size_exponent; + + /** + * @ptr_lock: Locks reads and writes to all head/tail state + * + * Consider: the head and tail pointer state needs to be read + * consistently from a hrtimer callback (atomic context) and + * read() fop (user context) with tail pointer updates happening + * in atomic context and head updates in user context and the + * (unlikely) possibility of read() errors needing to reset all + * head/tail state. + * + * Note: Contention/performance aren't currently a significant + * concern here considering the relatively low frequency of + * hrtimer callbacks (5ms period) and that reads typically only + * happen in response to a hrtimer event and likely complete + * before the next callback. + * + * Note: This lock is not held *while* reading and copying data + * to userspace so the value of head observed in htrimer + * callbacks won't represent any partial consumption of data. + */ + spinlock_t ptr_lock; + + /** + * @tails: One 'aging' tail pointer and one 'aged' tail pointer ready to + * used for reading. + * + * Initial values of 0xffffffff are invalid and imply that an + * update is required (and should be ignored by an attempted + * read) + */ + struct { + u32 offset; + } tails[2]; + + /** + * @aged_tail_idx: Index for the aged tail ready to read() data up to. + */ + unsigned int aged_tail_idx; + + /** + * @aging_timestamp: A monotonic timestamp for when the current aging tail pointer + * was read; used to determine when it is old enough to trust. + */ + u64 aging_timestamp; + + /** + * @head: Although we can always read back the head pointer register, + * we prefer to avoid trusting the HW state, just to avoid any + * risk that some hardware condition could * somehow bump the + * head pointer unpredictably and cause us to forward the wrong + * OA buffer data to userspace. + */ + u32 head; + } oa_buffer; +}; + +/** + * struct i915_oa_ops - Gen specific implementation of an OA unit stream + */ +struct i915_oa_ops { + /** + * @is_valid_b_counter_reg: Validates register's address for + * programming boolean counters for a particular platform. + */ + bool (*is_valid_b_counter_reg)(struct drm_i915_private *dev_priv, + u32 addr); + + /** + * @is_valid_mux_reg: Validates register's address for programming mux + * for a particular platform. + */ + bool (*is_valid_mux_reg)(struct drm_i915_private *dev_priv, u32 addr); + + /** + * @is_valid_flex_reg: Validates register's address for programming + * flex EU filtering for a particular platform. + */ + bool (*is_valid_flex_reg)(struct drm_i915_private *dev_priv, u32 addr); + + /** + * @enable_metric_set: Selects and applies any MUX configuration to set + * up the Boolean and Custom (B/C) counters that are part of the + * counter reports being sampled. May apply system constraints such as + * disabling EU clock gating as required. + */ + int (*enable_metric_set)(struct i915_perf_stream *stream); + + /** + * @disable_metric_set: Remove system constraints associated with using + * the OA unit. + */ + void (*disable_metric_set)(struct i915_perf_stream *stream); + + /** + * @oa_enable: Enable periodic sampling + */ + void (*oa_enable)(struct i915_perf_stream *stream); + + /** + * @oa_disable: Disable periodic sampling + */ + void (*oa_disable)(struct i915_perf_stream *stream); + + /** + * @read: Copy data from the circular OA buffer into a given userspace + * buffer. + */ + int (*read)(struct i915_perf_stream *stream, + char __user *buf, + size_t count, + size_t *offset); + + /** + * @oa_hw_tail_read: read the OA tail pointer register + * + * In particular this enables us to share all the fiddly code for + * handling the OA unit tail pointer race that affects multiple + * generations. + */ + u32 (*oa_hw_tail_read)(struct i915_perf_stream *stream); +}; + +#endif /* _I915_PERF_TYPES_H_ */ From patchwork Tue May 5 08:32:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283335 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0313gfz9sP7; Tue, 5 May 2020 18:33:23 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt0v-0003kb-Nf; Tue, 05 May 2020 08:33:17 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0M-0003Ku-RI for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:42 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0M-0004sj-9K for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:42 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 04/26] drm/i915/perf: Wean ourselves off dev_priv Date: Tue, 5 May 2020 11:32:17 +0300 Message-Id: <20200505083239.2428948-5-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Chris Wilson BugLink: https://bugs.launchpad.net/bugs/1871957 Use the local uncore accessors for the GT rather than using the [not-so] magic global dev_priv mmio routines. In the process, we also teach the perf stream to use backpointers to the i915_perf rather than digging it out of dev_priv. v2: Rebase onto i915_perf_types.h Signed-off-by: Chris Wilson Cc: Umesh Nerlige Ramappa Cc: Lionel Landwerlin Reviewed-by: Lionel Landwerlin #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20191007140812.10963-1-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20191007210942.18145-1-chris@chris-wilson.co.uk (cherry picked from commit 8f8b1171e1a514c2fdfd388b662a7e5b34e839a8) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_drv.h | 56 +- drivers/gpu/drm/i915/i915_perf.c | 731 ++++++++++++------------- drivers/gpu/drm/i915/i915_perf_types.h | 73 ++- 3 files changed, 427 insertions(+), 433 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 1238394997c7..3604ff7570c6 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1346,61 +1346,7 @@ struct drm_i915_private { struct intel_runtime_pm runtime_pm; - struct { - bool initialized; - - struct kobject *metrics_kobj; - struct ctl_table_header *sysctl_header; - - /* - * Lock associated with adding/modifying/removing OA configs - * in dev_priv->perf.metrics_idr. - */ - struct mutex metrics_lock; - - /* - * List of dynamic configurations, you need to hold - * dev_priv->perf.metrics_lock to access it. - */ - struct idr metrics_idr; - - /* - * Lock associated with anything below within this structure - * except exclusive_stream. - */ - struct mutex lock; - struct list_head streams; - - /* - * The stream currently using the OA unit. If accessed - * outside a syscall associated to its file - * descriptor, you need to hold - * dev_priv->drm.struct_mutex. - */ - struct i915_perf_stream *exclusive_stream; - - /** - * For rate limiting any notifications of spurious - * invalid OA reports - */ - struct ratelimit_state spurious_report_rs; - - struct i915_oa_config test_config; - - u32 gen7_latched_oastatus1; - u32 ctx_oactxctrl_offset; - u32 ctx_flexeu0_offset; - - /** - * The RPT_ID/reason field for Gen8+ includes a bit - * to determine if the CTX ID in the report is valid - * but the specific bit differs between Gen 8 and 9 - */ - u32 gen8_valid_ctx_bit; - - struct i915_oa_ops ops; - const struct i915_oa_format *oa_formats; - } perf; + struct i915_perf perf; /* Abstract the submission mechanism (legacy ringbuffer or execlists) away */ struct intel_gt gt; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index c1b764233761..bfe58cfc3e67 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -367,8 +367,7 @@ struct perf_open_properties { static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer); -static void free_oa_config(struct drm_i915_private *dev_priv, - struct i915_oa_config *oa_config) +static void free_oa_config(struct i915_oa_config *oa_config) { if (!PTR_ERR(oa_config->flex_regs)) kfree(oa_config->flex_regs); @@ -379,53 +378,52 @@ static void free_oa_config(struct drm_i915_private *dev_priv, kfree(oa_config); } -static void put_oa_config(struct drm_i915_private *dev_priv, - struct i915_oa_config *oa_config) +static void put_oa_config(struct i915_oa_config *oa_config) { if (!atomic_dec_and_test(&oa_config->ref_count)) return; - free_oa_config(dev_priv, oa_config); + free_oa_config(oa_config); } -static int get_oa_config(struct drm_i915_private *dev_priv, +static int get_oa_config(struct i915_perf *perf, int metrics_set, struct i915_oa_config **out_config) { int ret; if (metrics_set == 1) { - *out_config = &dev_priv->perf.test_config; - atomic_inc(&dev_priv->perf.test_config.ref_count); + *out_config = &perf->test_config; + atomic_inc(&perf->test_config.ref_count); return 0; } - ret = mutex_lock_interruptible(&dev_priv->perf.metrics_lock); + ret = mutex_lock_interruptible(&perf->metrics_lock); if (ret) return ret; - *out_config = idr_find(&dev_priv->perf.metrics_idr, metrics_set); + *out_config = idr_find(&perf->metrics_idr, metrics_set); if (!*out_config) ret = -EINVAL; else atomic_inc(&(*out_config)->ref_count); - mutex_unlock(&dev_priv->perf.metrics_lock); + mutex_unlock(&perf->metrics_lock); return ret; } static u32 gen8_oa_hw_tail_read(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; - return I915_READ(GEN8_OATAILPTR) & GEN8_OATAILPTR_MASK; + return intel_uncore_read(uncore, GEN8_OATAILPTR) & GEN8_OATAILPTR_MASK; } static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; - u32 oastatus1 = I915_READ(GEN7_OASTATUS1); + struct intel_uncore *uncore = stream->gt->uncore; + u32 oastatus1 = intel_uncore_read(uncore, GEN7_OASTATUS1); return oastatus1 & GEN7_OASTATUS1_TAIL_MASK; } @@ -456,7 +454,6 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream) */ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; int report_size = stream->oa_buffer.format_size; unsigned long flags; unsigned int aged_idx; @@ -479,7 +476,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) aged_tail = stream->oa_buffer.tails[aged_idx].offset; aging_tail = stream->oa_buffer.tails[!aged_idx].offset; - hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream); + hw_tail = stream->perf->ops.oa_hw_tail_read(stream); /* The tail pointer increases in 64 byte increments, * not in report_size steps... @@ -655,7 +652,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, size_t count, size_t *offset) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; int report_size = stream->oa_buffer.format_size; u8 *oa_buf_base = stream->oa_buffer.vaddr; u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); @@ -740,7 +737,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, reason = ((report32[0] >> OAREPORT_REASON_SHIFT) & OAREPORT_REASON_MASK); if (reason == 0) { - if (__ratelimit(&dev_priv->perf.spurious_report_rs)) + if (__ratelimit(&stream->perf->spurious_report_rs)) DRM_NOTE("Skipping spurious, invalid OA report\n"); continue; } @@ -755,7 +752,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, * Note: that we don't clear the valid_ctx_bit so userspace can * understand that the ID has been squashed by the kernel. */ - if (!(report32[0] & dev_priv->perf.gen8_valid_ctx_bit)) + if (!(report32[0] & stream->perf->gen8_valid_ctx_bit)) ctx_id = report32[2] = INVALID_CTX_ID; /* @@ -789,7 +786,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, * switches since it's not-uncommon for periodic samples to * identify a switch before any 'context switch' report. */ - if (!dev_priv->perf.exclusive_stream->ctx || + if (!stream->perf->exclusive_stream->ctx || stream->specific_ctx_id == ctx_id || stream->oa_buffer.last_ctx_id == stream->specific_ctx_id || reason & OAREPORT_REASON_CTX_SWITCH) { @@ -798,7 +795,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, * While filtering for a single context we avoid * leaking the IDs of other contexts. */ - if (dev_priv->perf.exclusive_stream->ctx && + if (stream->perf->exclusive_stream->ctx && stream->specific_ctx_id != ctx_id) { report32[2] = INVALID_CTX_ID; } @@ -830,7 +827,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, */ head += gtt_offset; - I915_WRITE(GEN8_OAHEADPTR, head & GEN8_OAHEADPTR_MASK); + intel_uncore_write(uncore, GEN8_OAHEADPTR, + head & GEN8_OAHEADPTR_MASK); stream->oa_buffer.head = head; spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags); @@ -864,14 +862,14 @@ static int gen8_oa_read(struct i915_perf_stream *stream, size_t count, size_t *offset) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; u32 oastatus; int ret; if (WARN_ON(!stream->oa_buffer.vaddr)) return -EIO; - oastatus = I915_READ(GEN8_OASTATUS); + oastatus = intel_uncore_read(uncore, GEN8_OASTATUS); /* * We treat OABUFFER_OVERFLOW as a significant error: @@ -896,14 +894,14 @@ static int gen8_oa_read(struct i915_perf_stream *stream, DRM_DEBUG("OA buffer overflow (exponent = %d): force restart\n", stream->period_exponent); - dev_priv->perf.ops.oa_disable(stream); - dev_priv->perf.ops.oa_enable(stream); + stream->perf->ops.oa_disable(stream); + stream->perf->ops.oa_enable(stream); /* * Note: .oa_enable() is expected to re-init the oabuffer and * reset GEN8_OASTATUS for us */ - oastatus = I915_READ(GEN8_OASTATUS); + oastatus = intel_uncore_read(uncore, GEN8_OASTATUS); } if (oastatus & GEN8_OASTATUS_REPORT_LOST) { @@ -911,8 +909,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream, DRM_I915_PERF_RECORD_OA_REPORT_LOST); if (ret) return ret; - I915_WRITE(GEN8_OASTATUS, - oastatus & ~GEN8_OASTATUS_REPORT_LOST); + intel_uncore_write(uncore, GEN8_OASTATUS, + oastatus & ~GEN8_OASTATUS_REPORT_LOST); } return gen8_append_oa_reports(stream, buf, count, offset); @@ -943,7 +941,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, size_t count, size_t *offset) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; int report_size = stream->oa_buffer.format_size; u8 *oa_buf_base = stream->oa_buffer.vaddr; u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); @@ -1017,7 +1015,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, * copying it to userspace... */ if (report32[0] == 0) { - if (__ratelimit(&dev_priv->perf.spurious_report_rs)) + if (__ratelimit(&stream->perf->spurious_report_rs)) DRM_NOTE("Skipping spurious, invalid OA report\n"); continue; } @@ -1043,9 +1041,9 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, */ head += gtt_offset; - I915_WRITE(GEN7_OASTATUS2, - ((head & GEN7_OASTATUS2_HEAD_MASK) | - GEN7_OASTATUS2_MEM_SELECT_GGTT)); + intel_uncore_write(uncore, GEN7_OASTATUS2, + (head & GEN7_OASTATUS2_HEAD_MASK) | + GEN7_OASTATUS2_MEM_SELECT_GGTT); stream->oa_buffer.head = head; spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags); @@ -1075,21 +1073,21 @@ static int gen7_oa_read(struct i915_perf_stream *stream, size_t count, size_t *offset) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; u32 oastatus1; int ret; if (WARN_ON(!stream->oa_buffer.vaddr)) return -EIO; - oastatus1 = I915_READ(GEN7_OASTATUS1); + oastatus1 = intel_uncore_read(uncore, GEN7_OASTATUS1); /* XXX: On Haswell we don't have a safe way to clear oastatus1 * bits while the OA unit is enabled (while the tail pointer * may be updated asynchronously) so we ignore status bits * that have already been reported to userspace. */ - oastatus1 &= ~dev_priv->perf.gen7_latched_oastatus1; + oastatus1 &= ~stream->perf->gen7_latched_oastatus1; /* We treat OABUFFER_OVERFLOW as a significant error: * @@ -1120,10 +1118,10 @@ static int gen7_oa_read(struct i915_perf_stream *stream, DRM_DEBUG("OA buffer overflow (exponent = %d): force restart\n", stream->period_exponent); - dev_priv->perf.ops.oa_disable(stream); - dev_priv->perf.ops.oa_enable(stream); + stream->perf->ops.oa_disable(stream); + stream->perf->ops.oa_enable(stream); - oastatus1 = I915_READ(GEN7_OASTATUS1); + oastatus1 = intel_uncore_read(uncore, GEN7_OASTATUS1); } if (unlikely(oastatus1 & GEN7_OASTATUS1_REPORT_LOST)) { @@ -1131,7 +1129,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream, DRM_I915_PERF_RECORD_OA_REPORT_LOST); if (ret) return ret; - dev_priv->perf.gen7_latched_oastatus1 |= + stream->perf->gen7_latched_oastatus1 |= GEN7_OASTATUS1_REPORT_LOST; } @@ -1196,15 +1194,13 @@ static int i915_oa_read(struct i915_perf_stream *stream, size_t count, size_t *offset) { - struct drm_i915_private *dev_priv = stream->dev_priv; - - return dev_priv->perf.ops.read(stream, buf, count, offset); + return stream->perf->ops.read(stream, buf, count, offset); } static struct intel_context *oa_pin_context(struct i915_perf_stream *stream) { struct i915_gem_engines_iter it; - struct drm_i915_private *i915 = stream->dev_priv; + struct drm_i915_private *i915 = stream->perf->i915; struct i915_gem_context *ctx = stream->ctx; struct intel_context *ce; int err; @@ -1248,14 +1244,13 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream) */ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) { - struct drm_i915_private *i915 = stream->dev_priv; struct intel_context *ce; ce = oa_pin_context(stream); if (IS_ERR(ce)) return PTR_ERR(ce); - switch (INTEL_GEN(i915)) { + switch (INTEL_GEN(ce->engine->i915)) { case 7: { /* * On Haswell we don't do any post processing of the reports @@ -1269,7 +1264,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) case 8: case 9: case 10: - if (USES_GUC_SUBMISSION(i915)) { + if (USES_GUC_SUBMISSION(ce->engine->i915)) { /* * When using GuC, the context descriptor we write in * i915 is read by GuC and rewritten before it's @@ -1312,7 +1307,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) } default: - MISSING_CASE(INTEL_GEN(i915)); + MISSING_CASE(INTEL_GEN(ce->engine->i915)); } DRM_DEBUG_DRIVER("filtering on ctx_id=0x%x ctx_id_mask=0x%x\n", @@ -1331,7 +1326,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) */ static void oa_put_render_ctx_id(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct drm_i915_private *i915 = stream->perf->i915; struct intel_context *ce; stream->specific_ctx_id = INVALID_CTX_ID; @@ -1339,16 +1334,16 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream) ce = fetch_and_zero(&stream->pinned_ctx); if (ce) { - mutex_lock(&dev_priv->drm.struct_mutex); + mutex_lock(&i915->drm.struct_mutex); intel_context_unpin(ce); - mutex_unlock(&dev_priv->drm.struct_mutex); + mutex_unlock(&i915->drm.struct_mutex); } } static void free_oa_buffer(struct i915_perf_stream *stream) { - struct drm_i915_private *i915 = stream->dev_priv; + struct drm_i915_private *i915 = stream->perf->i915; mutex_lock(&i915->drm.struct_mutex); @@ -1362,38 +1357,38 @@ free_oa_buffer(struct i915_perf_stream *stream) static void i915_oa_stream_destroy(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct i915_perf *perf = stream->perf; - BUG_ON(stream != dev_priv->perf.exclusive_stream); + BUG_ON(stream != perf->exclusive_stream); /* * Unset exclusive_stream first, it will be checked while disabling * the metric set on gen8+. */ - mutex_lock(&dev_priv->drm.struct_mutex); - dev_priv->perf.exclusive_stream = NULL; - dev_priv->perf.ops.disable_metric_set(stream); - mutex_unlock(&dev_priv->drm.struct_mutex); + mutex_lock(&perf->i915->drm.struct_mutex); + perf->exclusive_stream = NULL; + perf->ops.disable_metric_set(stream); + mutex_unlock(&perf->i915->drm.struct_mutex); free_oa_buffer(stream); - intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL); - intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref); + intel_uncore_forcewake_put(stream->gt->uncore, FORCEWAKE_ALL); + intel_runtime_pm_put(stream->gt->uncore->rpm, stream->wakeref); if (stream->ctx) oa_put_render_ctx_id(stream); - put_oa_config(dev_priv, stream->oa_config); + put_oa_config(stream->oa_config); - if (dev_priv->perf.spurious_report_rs.missed) { + if (perf->spurious_report_rs.missed) { DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n", - dev_priv->perf.spurious_report_rs.missed); + perf->spurious_report_rs.missed); } } static void gen7_init_oa_buffer(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); unsigned long flags; @@ -1402,13 +1397,14 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream) /* Pre-DevBDW: OABUFFER must be set with counters off, * before OASTATUS1, but after OASTATUS2 */ - I915_WRITE(GEN7_OASTATUS2, - gtt_offset | GEN7_OASTATUS2_MEM_SELECT_GGTT); /* head */ + intel_uncore_write(uncore, GEN7_OASTATUS2, /* head */ + gtt_offset | GEN7_OASTATUS2_MEM_SELECT_GGTT); stream->oa_buffer.head = gtt_offset; - I915_WRITE(GEN7_OABUFFER, gtt_offset); + intel_uncore_write(uncore, GEN7_OABUFFER, gtt_offset); - I915_WRITE(GEN7_OASTATUS1, gtt_offset | OABUFFER_SIZE_16M); /* tail */ + intel_uncore_write(uncore, GEN7_OASTATUS1, /* tail */ + gtt_offset | OABUFFER_SIZE_16M); /* Mark that we need updated tail pointers to read from... */ stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR; @@ -1420,7 +1416,7 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream) * already seen since they can't be cleared while periodic * sampling is enabled. */ - dev_priv->perf.gen7_latched_oastatus1 = 0; + stream->perf->gen7_latched_oastatus1 = 0; /* NB: although the OA buffer will initially be allocated * zeroed via shmfs (and so this memset is redundant when @@ -1443,17 +1439,17 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream) static void gen8_init_oa_buffer(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); unsigned long flags; spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags); - I915_WRITE(GEN8_OASTATUS, 0); - I915_WRITE(GEN8_OAHEADPTR, gtt_offset); + intel_uncore_write(uncore, GEN8_OASTATUS, 0); + intel_uncore_write(uncore, GEN8_OAHEADPTR, gtt_offset); stream->oa_buffer.head = gtt_offset; - I915_WRITE(GEN8_OABUFFER_UDW, 0); + intel_uncore_write(uncore, GEN8_OABUFFER_UDW, 0); /* * PRM says: @@ -1463,9 +1459,9 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream) * to enable proper functionality of the overflow * bit." */ - I915_WRITE(GEN8_OABUFFER, gtt_offset | + intel_uncore_write(uncore, GEN8_OABUFFER, gtt_offset | OABUFFER_SIZE_16M | GEN8_OABUFFER_MEM_SELECT_GGTT); - I915_WRITE(GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK); + intel_uncore_write(uncore, GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK); /* Mark that we need updated tail pointers to read from... */ stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR; @@ -1503,22 +1499,22 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream) static int alloc_oa_buffer(struct i915_perf_stream *stream) { + struct drm_i915_private *i915 = stream->perf->i915; struct drm_i915_gem_object *bo; - struct drm_i915_private *dev_priv = stream->dev_priv; struct i915_vma *vma; int ret; if (WARN_ON(stream->oa_buffer.vma)) return -ENODEV; - ret = i915_mutex_lock_interruptible(&dev_priv->drm); + ret = i915_mutex_lock_interruptible(&i915->drm); if (ret) return ret; BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE); BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M); - bo = i915_gem_object_create_shmem(dev_priv, OA_BUFFER_SIZE); + bo = i915_gem_object_create_shmem(stream->perf->i915, OA_BUFFER_SIZE); if (IS_ERR(bo)) { DRM_ERROR("Failed to allocate OA buffer\n"); ret = PTR_ERR(bo); @@ -1558,11 +1554,11 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream) stream->oa_buffer.vma = NULL; unlock: - mutex_unlock(&dev_priv->drm.struct_mutex); + mutex_unlock(&i915->drm.struct_mutex); return ret; } -static void config_oa_regs(struct drm_i915_private *dev_priv, +static void config_oa_regs(struct intel_uncore *uncore, const struct i915_oa_reg *regs, u32 n_regs) { @@ -1571,7 +1567,7 @@ static void config_oa_regs(struct drm_i915_private *dev_priv, for (i = 0; i < n_regs; i++) { const struct i915_oa_reg *reg = regs + i; - I915_WRITE(reg->addr, reg->value); + intel_uncore_write(uncore, reg->addr, reg->value); } } @@ -1604,7 +1600,7 @@ static void delay_after_mux(void) static int hsw_enable_metric_set(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; const struct i915_oa_config *oa_config = stream->oa_config; /* @@ -1617,15 +1613,15 @@ static int hsw_enable_metric_set(struct i915_perf_stream *stream) * count the events from non-render domain. Unit level clock * gating for RCS should also be disabled. */ - I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) & - ~GEN7_DOP_CLOCK_GATE_ENABLE)); - I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) | - GEN6_CSUNIT_CLOCK_GATE_DISABLE)); + intel_uncore_rmw(uncore, GEN7_MISCCPCTL, + GEN7_DOP_CLOCK_GATE_ENABLE, 0); + intel_uncore_rmw(uncore, GEN6_UCGCTL1, + 0, GEN6_CSUNIT_CLOCK_GATE_DISABLE); - config_oa_regs(dev_priv, oa_config->mux_regs, oa_config->mux_regs_len); + config_oa_regs(uncore, oa_config->mux_regs, oa_config->mux_regs_len); delay_after_mux(); - config_oa_regs(dev_priv, oa_config->b_counter_regs, + config_oa_regs(uncore, oa_config->b_counter_regs, oa_config->b_counter_regs_len); return 0; @@ -1633,15 +1629,14 @@ static int hsw_enable_metric_set(struct i915_perf_stream *stream) static void hsw_disable_metric_set(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; - I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) & - ~GEN6_CSUNIT_CLOCK_GATE_DISABLE)); - I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) | - GEN7_DOP_CLOCK_GATE_ENABLE)); + intel_uncore_rmw(uncore, GEN6_UCGCTL1, + GEN6_CSUNIT_CLOCK_GATE_DISABLE, 0); + intel_uncore_rmw(uncore, GEN7_MISCCPCTL, + 0, GEN7_DOP_CLOCK_GATE_ENABLE); - I915_WRITE(GDT_CHICKEN_BITS, (I915_READ(GDT_CHICKEN_BITS) & - ~GT_NOA_ENABLE)); + intel_uncore_rmw(uncore, GDT_CHICKEN_BITS, GT_NOA_ENABLE, 0); } static u32 oa_config_flex_reg(const struct i915_oa_config *oa_config, @@ -1678,9 +1673,8 @@ gen8_update_reg_state_unlocked(struct i915_perf_stream *stream, u32 *reg_state, const struct i915_oa_config *oa_config) { - struct drm_i915_private *i915 = ce->engine->i915; - u32 ctx_oactxctrl = i915->perf.ctx_oactxctrl_offset; - u32 ctx_flexeu0 = i915->perf.ctx_flexeu0_offset; + u32 ctx_oactxctrl = stream->perf->ctx_oactxctrl_offset; + u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset; /* The MMIO offsets for Flex EU registers aren't contiguous */ i915_reg_t flex_regs[] = { EU_PERF_CNTL0, @@ -1705,7 +1699,7 @@ gen8_update_reg_state_unlocked(struct i915_perf_stream *stream, CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, - intel_sseu_make_rpcs(i915, &ce->sseu)); + intel_sseu_make_rpcs(stream->perf->i915, &ce->sseu)); } struct flex { @@ -1860,7 +1854,7 @@ static int gen8_configure_context(struct i915_gem_context *ctx, static int gen8_configure_all_contexts(struct i915_perf_stream *stream, const struct i915_oa_config *oa_config) { - struct drm_i915_private *i915 = stream->dev_priv; + struct drm_i915_private *i915 = stream->perf->i915; /* The MMIO offsets for Flex EU registers aren't contiguous */ const u32 ctx_flexeu0 = i915->perf.ctx_flexeu0_offset; #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N)) @@ -1945,7 +1939,7 @@ static int gen8_configure_all_contexts(struct i915_perf_stream *stream, static int gen8_enable_metric_set(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; const struct i915_oa_config *oa_config = stream->oa_config; int ret; @@ -1972,10 +1966,10 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream) * be read back from automatically triggered reports, as part of the * RPT_ID field. */ - if (IS_GEN_RANGE(dev_priv, 9, 11)) { - I915_WRITE(GEN8_OA_DEBUG, - _MASKED_BIT_ENABLE(GEN9_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS | - GEN9_OA_DEBUG_INCLUDE_CLK_RATIO)); + if (IS_GEN_RANGE(stream->perf->i915, 9, 11)) { + intel_uncore_write(uncore, GEN8_OA_DEBUG, + _MASKED_BIT_ENABLE(GEN9_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS | + GEN9_OA_DEBUG_INCLUDE_CLK_RATIO)); } /* @@ -1987,10 +1981,10 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream) if (ret) return ret; - config_oa_regs(dev_priv, oa_config->mux_regs, oa_config->mux_regs_len); + config_oa_regs(uncore, oa_config->mux_regs, oa_config->mux_regs_len); delay_after_mux(); - config_oa_regs(dev_priv, oa_config->b_counter_regs, + config_oa_regs(uncore, oa_config->b_counter_regs, oa_config->b_counter_regs_len); return 0; @@ -1998,30 +1992,28 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream) static void gen8_disable_metric_set(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; /* Reset all contexts' slices/subslices configurations. */ gen8_configure_all_contexts(stream, NULL); - I915_WRITE(GDT_CHICKEN_BITS, (I915_READ(GDT_CHICKEN_BITS) & - ~GT_NOA_ENABLE)); + intel_uncore_rmw(uncore, GDT_CHICKEN_BITS, GT_NOA_ENABLE, 0); } static void gen10_disable_metric_set(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; /* Reset all contexts' slices/subslices configurations. */ gen8_configure_all_contexts(stream, NULL); /* Make sure we disable noa to save power. */ - I915_WRITE(RPM_CONFIG1, - I915_READ(RPM_CONFIG1) & ~GEN10_GT_NOA_ENABLE); + intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0); } static void gen7_oa_enable(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; struct i915_gem_context *ctx = stream->ctx; u32 ctx_id = stream->specific_ctx_id; bool periodic = stream->periodic; @@ -2039,19 +2031,19 @@ static void gen7_oa_enable(struct i915_perf_stream *stream) */ gen7_init_oa_buffer(stream); - I915_WRITE(GEN7_OACONTROL, - (ctx_id & GEN7_OACONTROL_CTX_MASK) | - (period_exponent << - GEN7_OACONTROL_TIMER_PERIOD_SHIFT) | - (periodic ? GEN7_OACONTROL_TIMER_ENABLE : 0) | - (report_format << GEN7_OACONTROL_FORMAT_SHIFT) | - (ctx ? GEN7_OACONTROL_PER_CTX_ENABLE : 0) | - GEN7_OACONTROL_ENABLE); + intel_uncore_write(uncore, GEN7_OACONTROL, + (ctx_id & GEN7_OACONTROL_CTX_MASK) | + (period_exponent << + GEN7_OACONTROL_TIMER_PERIOD_SHIFT) | + (periodic ? GEN7_OACONTROL_TIMER_ENABLE : 0) | + (report_format << GEN7_OACONTROL_FORMAT_SHIFT) | + (ctx ? GEN7_OACONTROL_PER_CTX_ENABLE : 0) | + GEN7_OACONTROL_ENABLE); } static void gen8_oa_enable(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct intel_uncore *uncore = stream->gt->uncore; u32 report_format = stream->oa_buffer.format; /* @@ -2070,9 +2062,9 @@ static void gen8_oa_enable(struct i915_perf_stream *stream) * filtering and instead filter on the cpu based on the context-id * field of reports */ - I915_WRITE(GEN8_OACONTROL, (report_format << - GEN8_OA_REPORT_FORMAT_SHIFT) | - GEN8_OA_COUNTER_ENABLE); + intel_uncore_write(uncore, GEN8_OACONTROL, + (report_format << GEN8_OA_REPORT_FORMAT_SHIFT) | + GEN8_OA_COUNTER_ENABLE); } /** @@ -2086,9 +2078,7 @@ static void gen8_oa_enable(struct i915_perf_stream *stream) */ static void i915_oa_stream_enable(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; - - dev_priv->perf.ops.oa_enable(stream); + stream->perf->ops.oa_enable(stream); if (stream->periodic) hrtimer_start(&stream->poll_check_timer, @@ -2098,7 +2088,7 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream) static void gen7_oa_disable(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = &stream->dev_priv->uncore; + struct intel_uncore *uncore = stream->gt->uncore; intel_uncore_write(uncore, GEN7_OACONTROL, 0); if (intel_wait_for_register(uncore, @@ -2109,7 +2099,7 @@ static void gen7_oa_disable(struct i915_perf_stream *stream) static void gen8_oa_disable(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = &stream->dev_priv->uncore; + struct intel_uncore *uncore = stream->gt->uncore; intel_uncore_write(uncore, GEN8_OACONTROL, 0); if (intel_wait_for_register(uncore, @@ -2128,9 +2118,7 @@ static void gen8_oa_disable(struct i915_perf_stream *stream) */ static void i915_oa_stream_disable(struct i915_perf_stream *stream) { - struct drm_i915_private *dev_priv = stream->dev_priv; - - dev_priv->perf.ops.oa_disable(stream); + stream->perf->ops.oa_disable(stream); if (stream->periodic) hrtimer_cancel(&stream->poll_check_timer); @@ -2167,7 +2155,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, struct drm_i915_perf_open_param *param, struct perf_open_properties *props) { - struct drm_i915_private *dev_priv = stream->dev_priv; + struct i915_perf *perf = stream->perf; int format_size; int ret; @@ -2175,7 +2163,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, * reason then don't let userspace try their luck with config * IDs */ - if (!dev_priv->perf.metrics_kobj) { + if (!perf->metrics_kobj) { DRM_DEBUG("OA metrics weren't advertised via sysfs\n"); return -EINVAL; } @@ -2185,7 +2173,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, return -EINVAL; } - if (!dev_priv->perf.ops.enable_metric_set) { + if (!perf->ops.enable_metric_set) { DRM_DEBUG("OA unit not supported\n"); return -ENODEV; } @@ -2194,7 +2182,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, * counter reports and marshal to the appropriate client * we currently only allow exclusive access */ - if (dev_priv->perf.exclusive_stream) { + if (perf->exclusive_stream) { DRM_DEBUG("OA unit already in use\n"); return -EBUSY; } @@ -2206,7 +2194,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, stream->sample_size = sizeof(struct drm_i915_perf_record_header); - format_size = dev_priv->perf.oa_formats[props->oa_format].size; + format_size = perf->oa_formats[props->oa_format].size; stream->sample_flags |= SAMPLE_OA_REPORT; stream->sample_size += format_size; @@ -2216,7 +2204,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, return -EINVAL; stream->oa_buffer.format = - dev_priv->perf.oa_formats[props->oa_format].format; + perf->oa_formats[props->oa_format].format; stream->periodic = props->oa_periodic; if (stream->periodic) @@ -2230,7 +2218,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, } } - ret = get_oa_config(dev_priv, props->metrics_set, &stream->oa_config); + ret = get_oa_config(perf, props->metrics_set, &stream->oa_config); if (ret) { DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set); goto err_config; @@ -2248,27 +2236,27 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, * In our case we are expecting that taking pm + FORCEWAKE * references will effectively disable RC6. */ - stream->wakeref = intel_runtime_pm_get(&dev_priv->runtime_pm); - intel_uncore_forcewake_get(&dev_priv->uncore, FORCEWAKE_ALL); + stream->wakeref = intel_runtime_pm_get(stream->gt->uncore->rpm); + intel_uncore_forcewake_get(stream->gt->uncore, FORCEWAKE_ALL); ret = alloc_oa_buffer(stream); if (ret) goto err_oa_buf_alloc; - ret = i915_mutex_lock_interruptible(&dev_priv->drm); + ret = i915_mutex_lock_interruptible(&stream->perf->i915->drm); if (ret) goto err_lock; stream->ops = &i915_oa_stream_ops; - dev_priv->perf.exclusive_stream = stream; + perf->exclusive_stream = stream; - ret = dev_priv->perf.ops.enable_metric_set(stream); + ret = perf->ops.enable_metric_set(stream); if (ret) { DRM_DEBUG("Unable to enable metric set\n"); goto err_enable; } - mutex_unlock(&dev_priv->drm.struct_mutex); + mutex_unlock(&stream->perf->i915->drm.struct_mutex); hrtimer_init(&stream->poll_check_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); @@ -2279,18 +2267,18 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, return 0; err_enable: - dev_priv->perf.exclusive_stream = NULL; - dev_priv->perf.ops.disable_metric_set(stream); - mutex_unlock(&dev_priv->drm.struct_mutex); + perf->exclusive_stream = NULL; + perf->ops.disable_metric_set(stream); + mutex_unlock(&stream->perf->i915->drm.struct_mutex); err_lock: free_oa_buffer(stream); err_oa_buf_alloc: - put_oa_config(dev_priv, stream->oa_config); + put_oa_config(stream->oa_config); - intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL); - intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref); + intel_uncore_forcewake_put(stream->gt->uncore, FORCEWAKE_ALL); + intel_runtime_pm_put(stream->gt->uncore->rpm, stream->wakeref); err_config: if (stream->ctx) @@ -2383,7 +2371,7 @@ static ssize_t i915_perf_read(struct file *file, loff_t *ppos) { struct i915_perf_stream *stream = file->private_data; - struct drm_i915_private *dev_priv = stream->dev_priv; + struct i915_perf *perf = stream->perf; ssize_t ret; /* To ensure it's handled consistently we simply treat all reads of a @@ -2406,15 +2394,15 @@ static ssize_t i915_perf_read(struct file *file, if (ret) return ret; - mutex_lock(&dev_priv->perf.lock); + mutex_lock(&perf->lock); ret = i915_perf_read_locked(stream, file, buf, count, ppos); - mutex_unlock(&dev_priv->perf.lock); + mutex_unlock(&perf->lock); } while (ret == -EAGAIN); } else { - mutex_lock(&dev_priv->perf.lock); + mutex_lock(&perf->lock); ret = i915_perf_read_locked(stream, file, buf, count, ppos); - mutex_unlock(&dev_priv->perf.lock); + mutex_unlock(&perf->lock); } /* We allow the poll checking to sometimes report false positive EPOLLIN @@ -2452,7 +2440,6 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer) /** * i915_perf_poll_locked - poll_wait() with a suitable wait queue for stream - * @dev_priv: i915 device instance * @stream: An i915 perf stream * @file: An i915 perf stream file * @wait: poll() state table @@ -2461,15 +2448,14 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer) * &i915_perf_stream_ops->poll_wait to call poll_wait() with a wait queue that * will be woken for new stream data. * - * Note: The &drm_i915_private->perf.lock mutex has been taken to serialize + * Note: The &perf->lock mutex has been taken to serialize * with any non-file-operation driver hooks. * * Returns: any poll events that are ready without sleeping */ -static __poll_t i915_perf_poll_locked(struct drm_i915_private *dev_priv, - struct i915_perf_stream *stream, - struct file *file, - poll_table *wait) +static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream, + struct file *file, + poll_table *wait) { __poll_t events = 0; @@ -2503,12 +2489,12 @@ static __poll_t i915_perf_poll_locked(struct drm_i915_private *dev_priv, static __poll_t i915_perf_poll(struct file *file, poll_table *wait) { struct i915_perf_stream *stream = file->private_data; - struct drm_i915_private *dev_priv = stream->dev_priv; + struct i915_perf *perf = stream->perf; __poll_t ret; - mutex_lock(&dev_priv->perf.lock); - ret = i915_perf_poll_locked(dev_priv, stream, file, wait); - mutex_unlock(&dev_priv->perf.lock); + mutex_lock(&perf->lock); + ret = i915_perf_poll_locked(stream, file, wait); + mutex_unlock(&perf->lock); return ret; } @@ -2567,7 +2553,7 @@ static void i915_perf_disable_locked(struct i915_perf_stream *stream) * @cmd: the ioctl request * @arg: the ioctl data * - * Note: The &drm_i915_private->perf.lock mutex has been taken to serialize + * Note: The &perf->lock mutex has been taken to serialize * with any non-file-operation driver hooks. * * Returns: zero on success or a negative error code. Returns -EINVAL for @@ -2605,12 +2591,12 @@ static long i915_perf_ioctl(struct file *file, unsigned long arg) { struct i915_perf_stream *stream = file->private_data; - struct drm_i915_private *dev_priv = stream->dev_priv; + struct i915_perf *perf = stream->perf; long ret; - mutex_lock(&dev_priv->perf.lock); + mutex_lock(&perf->lock); ret = i915_perf_ioctl_locked(stream, cmd, arg); - mutex_unlock(&dev_priv->perf.lock); + mutex_unlock(&perf->lock); return ret; } @@ -2622,7 +2608,7 @@ static long i915_perf_ioctl(struct file *file, * Frees all resources associated with the given i915 perf @stream, disabling * any associated data capture in the process. * - * Note: The &drm_i915_private->perf.lock mutex has been taken to serialize + * Note: The &perf->lock mutex has been taken to serialize * with any non-file-operation driver hooks. */ static void i915_perf_destroy_locked(struct i915_perf_stream *stream) @@ -2655,14 +2641,14 @@ static void i915_perf_destroy_locked(struct i915_perf_stream *stream) static int i915_perf_release(struct inode *inode, struct file *file) { struct i915_perf_stream *stream = file->private_data; - struct drm_i915_private *dev_priv = stream->dev_priv; + struct i915_perf *perf = stream->perf; - mutex_lock(&dev_priv->perf.lock); + mutex_lock(&perf->lock); i915_perf_destroy_locked(stream); - mutex_unlock(&dev_priv->perf.lock); + mutex_unlock(&perf->lock); /* Release the reference the perf stream kept on the driver. */ - drm_dev_put(&dev_priv->drm); + drm_dev_put(&perf->i915->drm); return 0; } @@ -2684,7 +2670,7 @@ static const struct file_operations fops = { /** * i915_perf_open_ioctl_locked - DRM ioctl() for userspace to open a stream FD - * @dev_priv: i915 device instance + * @perf: i915 perf instance * @param: The open parameters passed to 'DRM_I915_PERF_OPEN` * @props: individually validated u64 property value pairs * @file: drm file @@ -2692,7 +2678,7 @@ static const struct file_operations fops = { * See i915_perf_ioctl_open() for interface details. * * Implements further stream config validation and stream initialization on - * behalf of i915_perf_open_ioctl() with the &drm_i915_private->perf.lock mutex + * behalf of i915_perf_open_ioctl() with the &perf->lock mutex * taken to serialize with any non-file-operation driver hooks. * * Note: at this point the @props have only been validated in isolation and @@ -2707,7 +2693,7 @@ static const struct file_operations fops = { * Returns: zero on success or a negative error code. */ static int -i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, +i915_perf_open_ioctl_locked(struct i915_perf *perf, struct drm_i915_perf_open_param *param, struct perf_open_properties *props, struct drm_file *file) @@ -2746,7 +2732,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, * MI_REPORT_PERF_COUNT commands and so consider it a privileged op to * enable the OA unit by default. */ - if (IS_HASWELL(dev_priv) && specific_ctx) + if (IS_HASWELL(perf->i915) && specific_ctx) privileged_op = false; /* Similar to perf's kernel.perf_paranoid_cpu sysctl option @@ -2767,7 +2753,8 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, goto err_ctx; } - stream->dev_priv = dev_priv; + stream->perf = perf; + stream->gt = &perf->i915->gt; stream->ctx = specific_ctx; ret = i915_oa_stream_init(stream, param, props); @@ -2783,7 +2770,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, goto err_flags; } - list_add(&stream->link, &dev_priv->perf.streams); + list_add(&stream->link, &perf->streams); if (param->flags & I915_PERF_FLAG_FD_CLOEXEC) f_flags |= O_CLOEXEC; @@ -2802,7 +2789,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, /* Take a reference on the driver that will be kept with stream_fd * until its release. */ - drm_dev_get(&dev_priv->drm); + drm_dev_get(&perf->i915->drm); return stream_fd; @@ -2820,15 +2807,15 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, return ret; } -static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int exponent) +static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent) { return div64_u64(1000000000ULL * (2ULL << exponent), - 1000ULL * RUNTIME_INFO(dev_priv)->cs_timestamp_frequency_khz); + 1000ULL * RUNTIME_INFO(perf->i915)->cs_timestamp_frequency_khz); } /** * read_properties_unlocked - validate + copy userspace stream open properties - * @dev_priv: i915 device instance + * @perf: i915 perf instance * @uprops: The array of u64 key value pairs given by userspace * @n_props: The number of key value pairs expected in @uprops * @props: The stream configuration built up while validating properties @@ -2841,7 +2828,7 @@ static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int exponent) * we shouldn't validate or assume anything about ordering here. This doesn't * rule out defining new properties with ordering requirements in the future. */ -static int read_properties_unlocked(struct drm_i915_private *dev_priv, +static int read_properties_unlocked(struct i915_perf *perf, u64 __user *uprops, u32 n_props, struct perf_open_properties *props) @@ -2907,7 +2894,7 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, value); return -EINVAL; } - if (!dev_priv->perf.oa_formats[value].size) { + if (!perf->oa_formats[value].size) { DRM_DEBUG("Unsupported OA report format %llu\n", value); return -EINVAL; @@ -2928,7 +2915,7 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, */ BUILD_BUG_ON(sizeof(oa_period) != 8); - oa_period = oa_exponent_to_ns(dev_priv, value); + oa_period = oa_exponent_to_ns(perf, value); /* This check is primarily to ensure that oa_period <= * UINT32_MAX (before passing to do_div which only @@ -2982,7 +2969,7 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, * mutex to avoid an awkward lockdep with mmap_sem. * * Most of the implementation details are handled by - * i915_perf_open_ioctl_locked() after taking the &drm_i915_private->perf.lock + * i915_perf_open_ioctl_locked() after taking the &perf->lock * mutex for serializing with any non-file-operation driver hooks. * * Return: A newly opened i915 Perf stream file descriptor or negative @@ -2991,13 +2978,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, int i915_perf_open_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { - struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_perf *perf = &to_i915(dev)->perf; struct drm_i915_perf_open_param *param = data; struct perf_open_properties props; u32 known_open_flags; int ret; - if (!dev_priv->perf.initialized) { + if (!perf->i915) { DRM_DEBUG("i915 perf interface not available for this system\n"); return -ENOTSUPP; } @@ -3010,124 +2997,127 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data, return -EINVAL; } - ret = read_properties_unlocked(dev_priv, + ret = read_properties_unlocked(perf, u64_to_user_ptr(param->properties_ptr), param->num_properties, &props); if (ret) return ret; - mutex_lock(&dev_priv->perf.lock); - ret = i915_perf_open_ioctl_locked(dev_priv, param, &props, file); - mutex_unlock(&dev_priv->perf.lock); + mutex_lock(&perf->lock); + ret = i915_perf_open_ioctl_locked(perf, param, &props, file); + mutex_unlock(&perf->lock); return ret; } /** * i915_perf_register - exposes i915-perf to userspace - * @dev_priv: i915 device instance + * @i915: i915 device instance * * In particular OA metric sets are advertised under a sysfs metrics/ * directory allowing userspace to enumerate valid IDs that can be * used to open an i915-perf stream. */ -void i915_perf_register(struct drm_i915_private *dev_priv) +void i915_perf_register(struct drm_i915_private *i915) { + struct i915_perf *perf = &i915->perf; int ret; - if (!dev_priv->perf.initialized) + if (!perf->i915) return; /* To be sure we're synchronized with an attempted * i915_perf_open_ioctl(); considering that we register after * being exposed to userspace. */ - mutex_lock(&dev_priv->perf.lock); + mutex_lock(&perf->lock); - dev_priv->perf.metrics_kobj = + perf->metrics_kobj = kobject_create_and_add("metrics", - &dev_priv->drm.primary->kdev->kobj); - if (!dev_priv->perf.metrics_kobj) + &i915->drm.primary->kdev->kobj); + if (!perf->metrics_kobj) goto exit; - sysfs_attr_init(&dev_priv->perf.test_config.sysfs_metric_id.attr); - - if (INTEL_GEN(dev_priv) >= 11) { - i915_perf_load_test_config_icl(dev_priv); - } else if (IS_CANNONLAKE(dev_priv)) { - i915_perf_load_test_config_cnl(dev_priv); - } else if (IS_COFFEELAKE(dev_priv)) { - if (IS_CFL_GT2(dev_priv)) - i915_perf_load_test_config_cflgt2(dev_priv); - if (IS_CFL_GT3(dev_priv)) - i915_perf_load_test_config_cflgt3(dev_priv); - } else if (IS_GEMINILAKE(dev_priv)) { - i915_perf_load_test_config_glk(dev_priv); - } else if (IS_KABYLAKE(dev_priv)) { - if (IS_KBL_GT2(dev_priv)) - i915_perf_load_test_config_kblgt2(dev_priv); - else if (IS_KBL_GT3(dev_priv)) - i915_perf_load_test_config_kblgt3(dev_priv); - } else if (IS_BROXTON(dev_priv)) { - i915_perf_load_test_config_bxt(dev_priv); - } else if (IS_SKYLAKE(dev_priv)) { - if (IS_SKL_GT2(dev_priv)) - i915_perf_load_test_config_sklgt2(dev_priv); - else if (IS_SKL_GT3(dev_priv)) - i915_perf_load_test_config_sklgt3(dev_priv); - else if (IS_SKL_GT4(dev_priv)) - i915_perf_load_test_config_sklgt4(dev_priv); - } else if (IS_CHERRYVIEW(dev_priv)) { - i915_perf_load_test_config_chv(dev_priv); - } else if (IS_BROADWELL(dev_priv)) { - i915_perf_load_test_config_bdw(dev_priv); - } else if (IS_HASWELL(dev_priv)) { - i915_perf_load_test_config_hsw(dev_priv); -} - - if (dev_priv->perf.test_config.id == 0) + sysfs_attr_init(&perf->test_config.sysfs_metric_id.attr); + + if (INTEL_GEN(i915) >= 11) { + i915_perf_load_test_config_icl(i915); + } else if (IS_CANNONLAKE(i915)) { + i915_perf_load_test_config_cnl(i915); + } else if (IS_COFFEELAKE(i915)) { + if (IS_CFL_GT2(i915)) + i915_perf_load_test_config_cflgt2(i915); + if (IS_CFL_GT3(i915)) + i915_perf_load_test_config_cflgt3(i915); + } else if (IS_GEMINILAKE(i915)) { + i915_perf_load_test_config_glk(i915); + } else if (IS_KABYLAKE(i915)) { + if (IS_KBL_GT2(i915)) + i915_perf_load_test_config_kblgt2(i915); + else if (IS_KBL_GT3(i915)) + i915_perf_load_test_config_kblgt3(i915); + } else if (IS_BROXTON(i915)) { + i915_perf_load_test_config_bxt(i915); + } else if (IS_SKYLAKE(i915)) { + if (IS_SKL_GT2(i915)) + i915_perf_load_test_config_sklgt2(i915); + else if (IS_SKL_GT3(i915)) + i915_perf_load_test_config_sklgt3(i915); + else if (IS_SKL_GT4(i915)) + i915_perf_load_test_config_sklgt4(i915); + } else if (IS_CHERRYVIEW(i915)) { + i915_perf_load_test_config_chv(i915); + } else if (IS_BROADWELL(i915)) { + i915_perf_load_test_config_bdw(i915); + } else if (IS_HASWELL(i915)) { + i915_perf_load_test_config_hsw(i915); + } + + if (perf->test_config.id == 0) goto sysfs_error; - ret = sysfs_create_group(dev_priv->perf.metrics_kobj, - &dev_priv->perf.test_config.sysfs_metric); + ret = sysfs_create_group(perf->metrics_kobj, + &perf->test_config.sysfs_metric); if (ret) goto sysfs_error; - atomic_set(&dev_priv->perf.test_config.ref_count, 1); + atomic_set(&perf->test_config.ref_count, 1); goto exit; sysfs_error: - kobject_put(dev_priv->perf.metrics_kobj); - dev_priv->perf.metrics_kobj = NULL; + kobject_put(perf->metrics_kobj); + perf->metrics_kobj = NULL; exit: - mutex_unlock(&dev_priv->perf.lock); + mutex_unlock(&perf->lock); } /** * i915_perf_unregister - hide i915-perf from userspace - * @dev_priv: i915 device instance + * @i915: i915 device instance * * i915-perf state cleanup is split up into an 'unregister' and * 'deinit' phase where the interface is first hidden from * userspace by i915_perf_unregister() before cleaning up * remaining state in i915_perf_fini(). */ -void i915_perf_unregister(struct drm_i915_private *dev_priv) +void i915_perf_unregister(struct drm_i915_private *i915) { - if (!dev_priv->perf.metrics_kobj) + struct i915_perf *perf = &i915->perf; + + if (!perf->metrics_kobj) return; - sysfs_remove_group(dev_priv->perf.metrics_kobj, - &dev_priv->perf.test_config.sysfs_metric); + sysfs_remove_group(perf->metrics_kobj, + &perf->test_config.sysfs_metric); - kobject_put(dev_priv->perf.metrics_kobj); - dev_priv->perf.metrics_kobj = NULL; + kobject_put(perf->metrics_kobj); + perf->metrics_kobj = NULL; } -static bool gen8_is_valid_flex_addr(struct drm_i915_private *dev_priv, u32 addr) +static bool gen8_is_valid_flex_addr(struct i915_perf *perf, u32 addr) { static const i915_reg_t flex_eu_regs[] = { EU_PERF_CNTL0, @@ -3147,7 +3137,7 @@ static bool gen8_is_valid_flex_addr(struct drm_i915_private *dev_priv, u32 addr) return false; } -static bool gen7_is_valid_b_counter_addr(struct drm_i915_private *dev_priv, u32 addr) +static bool gen7_is_valid_b_counter_addr(struct i915_perf *perf, u32 addr) { return (addr >= i915_mmio_reg_offset(OASTARTTRIG1) && addr <= i915_mmio_reg_offset(OASTARTTRIG8)) || @@ -3157,7 +3147,7 @@ static bool gen7_is_valid_b_counter_addr(struct drm_i915_private *dev_priv, u32 addr <= i915_mmio_reg_offset(OACEC7_1)); } -static bool gen7_is_valid_mux_addr(struct drm_i915_private *dev_priv, u32 addr) +static bool gen7_is_valid_mux_addr(struct i915_perf *perf, u32 addr) { return addr == i915_mmio_reg_offset(HALF_SLICE_CHICKEN2) || (addr >= i915_mmio_reg_offset(MICRO_BP0_0) && @@ -3168,34 +3158,34 @@ static bool gen7_is_valid_mux_addr(struct drm_i915_private *dev_priv, u32 addr) addr <= i915_mmio_reg_offset(OA_PERFMATRIX_HI)); } -static bool gen8_is_valid_mux_addr(struct drm_i915_private *dev_priv, u32 addr) +static bool gen8_is_valid_mux_addr(struct i915_perf *perf, u32 addr) { - return gen7_is_valid_mux_addr(dev_priv, addr) || + return gen7_is_valid_mux_addr(perf, addr) || addr == i915_mmio_reg_offset(WAIT_FOR_RC6_EXIT) || (addr >= i915_mmio_reg_offset(RPM_CONFIG0) && addr <= i915_mmio_reg_offset(NOA_CONFIG(8))); } -static bool gen10_is_valid_mux_addr(struct drm_i915_private *dev_priv, u32 addr) +static bool gen10_is_valid_mux_addr(struct i915_perf *perf, u32 addr) { - return gen8_is_valid_mux_addr(dev_priv, addr) || + return gen8_is_valid_mux_addr(perf, addr) || addr == i915_mmio_reg_offset(GEN10_NOA_WRITE_HIGH) || (addr >= i915_mmio_reg_offset(OA_PERFCNT3_LO) && addr <= i915_mmio_reg_offset(OA_PERFCNT4_HI)); } -static bool hsw_is_valid_mux_addr(struct drm_i915_private *dev_priv, u32 addr) +static bool hsw_is_valid_mux_addr(struct i915_perf *perf, u32 addr) { - return gen7_is_valid_mux_addr(dev_priv, addr) || + return gen7_is_valid_mux_addr(perf, addr) || (addr >= 0x25100 && addr <= 0x2FF90) || (addr >= i915_mmio_reg_offset(HSW_MBVID2_NOA0) && addr <= i915_mmio_reg_offset(HSW_MBVID2_NOA9)) || addr == i915_mmio_reg_offset(HSW_MBVID2_MISR0); } -static bool chv_is_valid_mux_addr(struct drm_i915_private *dev_priv, u32 addr) +static bool chv_is_valid_mux_addr(struct i915_perf *perf, u32 addr) { - return gen7_is_valid_mux_addr(dev_priv, addr) || + return gen7_is_valid_mux_addr(perf, addr) || (addr >= 0x182300 && addr <= 0x1823A4); } @@ -3218,8 +3208,8 @@ static u32 mask_reg_value(u32 reg, u32 val) return val; } -static struct i915_oa_reg *alloc_oa_regs(struct drm_i915_private *dev_priv, - bool (*is_valid)(struct drm_i915_private *dev_priv, u32 addr), +static struct i915_oa_reg *alloc_oa_regs(struct i915_perf *perf, + bool (*is_valid)(struct i915_perf *perf, u32 addr), u32 __user *regs, u32 n_regs) { @@ -3249,7 +3239,7 @@ static struct i915_oa_reg *alloc_oa_regs(struct drm_i915_private *dev_priv, if (err) goto addr_err; - if (!is_valid(dev_priv, addr)) { + if (!is_valid(perf, addr)) { DRM_DEBUG("Invalid oa_reg address: %X\n", addr); err = -EINVAL; goto addr_err; @@ -3282,7 +3272,7 @@ static ssize_t show_dynamic_id(struct device *dev, return sprintf(buf, "%d\n", oa_config->id); } -static int create_dynamic_oa_sysfs_entry(struct drm_i915_private *dev_priv, +static int create_dynamic_oa_sysfs_entry(struct i915_perf *perf, struct i915_oa_config *oa_config) { sysfs_attr_init(&oa_config->sysfs_metric_id.attr); @@ -3297,7 +3287,7 @@ static int create_dynamic_oa_sysfs_entry(struct drm_i915_private *dev_priv, oa_config->sysfs_metric.name = oa_config->uuid; oa_config->sysfs_metric.attrs = oa_config->attrs; - return sysfs_create_group(dev_priv->perf.metrics_kobj, + return sysfs_create_group(perf->metrics_kobj, &oa_config->sysfs_metric); } @@ -3317,17 +3307,17 @@ static int create_dynamic_oa_sysfs_entry(struct drm_i915_private *dev_priv, int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { - struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_perf *perf = &to_i915(dev)->perf; struct drm_i915_perf_oa_config *args = data; struct i915_oa_config *oa_config, *tmp; int err, id; - if (!dev_priv->perf.initialized) { + if (!perf->i915) { DRM_DEBUG("i915 perf interface not available for this system\n"); return -ENOTSUPP; } - if (!dev_priv->perf.metrics_kobj) { + if (!perf->metrics_kobj) { DRM_DEBUG("OA metrics weren't advertised via sysfs\n"); return -EINVAL; } @@ -3365,8 +3355,8 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, oa_config->mux_regs_len = args->n_mux_regs; oa_config->mux_regs = - alloc_oa_regs(dev_priv, - dev_priv->perf.ops.is_valid_mux_reg, + alloc_oa_regs(perf, + perf->ops.is_valid_mux_reg, u64_to_user_ptr(args->mux_regs_ptr), args->n_mux_regs); @@ -3378,8 +3368,8 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, oa_config->b_counter_regs_len = args->n_boolean_regs; oa_config->b_counter_regs = - alloc_oa_regs(dev_priv, - dev_priv->perf.ops.is_valid_b_counter_reg, + alloc_oa_regs(perf, + perf->ops.is_valid_b_counter_reg, u64_to_user_ptr(args->boolean_regs_ptr), args->n_boolean_regs); @@ -3389,7 +3379,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, goto reg_err; } - if (INTEL_GEN(dev_priv) < 8) { + if (INTEL_GEN(perf->i915) < 8) { if (args->n_flex_regs != 0) { err = -EINVAL; goto reg_err; @@ -3397,8 +3387,8 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, } else { oa_config->flex_regs_len = args->n_flex_regs; oa_config->flex_regs = - alloc_oa_regs(dev_priv, - dev_priv->perf.ops.is_valid_flex_reg, + alloc_oa_regs(perf, + perf->ops.is_valid_flex_reg, u64_to_user_ptr(args->flex_regs_ptr), args->n_flex_regs); @@ -3409,14 +3399,14 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, } } - err = mutex_lock_interruptible(&dev_priv->perf.metrics_lock); + err = mutex_lock_interruptible(&perf->metrics_lock); if (err) goto reg_err; /* We shouldn't have too many configs, so this iteration shouldn't be * too costly. */ - idr_for_each_entry(&dev_priv->perf.metrics_idr, tmp, id) { + idr_for_each_entry(&perf->metrics_idr, tmp, id) { if (!strcmp(tmp->uuid, oa_config->uuid)) { DRM_DEBUG("OA config already exists with this uuid\n"); err = -EADDRINUSE; @@ -3424,14 +3414,14 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, } } - err = create_dynamic_oa_sysfs_entry(dev_priv, oa_config); + err = create_dynamic_oa_sysfs_entry(perf, oa_config); if (err) { DRM_DEBUG("Failed to create sysfs entry for OA config\n"); goto sysfs_err; } /* Config id 0 is invalid, id 1 for kernel stored test config. */ - oa_config->id = idr_alloc(&dev_priv->perf.metrics_idr, + oa_config->id = idr_alloc(&perf->metrics_idr, oa_config, 2, 0, GFP_KERNEL); if (oa_config->id < 0) { @@ -3440,16 +3430,16 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, goto sysfs_err; } - mutex_unlock(&dev_priv->perf.metrics_lock); + mutex_unlock(&perf->metrics_lock); DRM_DEBUG("Added config %s id=%i\n", oa_config->uuid, oa_config->id); return oa_config->id; sysfs_err: - mutex_unlock(&dev_priv->perf.metrics_lock); + mutex_unlock(&perf->metrics_lock); reg_err: - put_oa_config(dev_priv, oa_config); + put_oa_config(oa_config); DRM_DEBUG("Failed to add new OA config\n"); return err; } @@ -3468,12 +3458,12 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { - struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_perf *perf = &to_i915(dev)->perf; u64 *arg = data; struct i915_oa_config *oa_config; int ret; - if (!dev_priv->perf.initialized) { + if (!perf->i915) { DRM_DEBUG("i915 perf interface not available for this system\n"); return -ENOTSUPP; } @@ -3483,11 +3473,11 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data, return -EACCES; } - ret = mutex_lock_interruptible(&dev_priv->perf.metrics_lock); + ret = mutex_lock_interruptible(&perf->metrics_lock); if (ret) goto lock_err; - oa_config = idr_find(&dev_priv->perf.metrics_idr, *arg); + oa_config = idr_find(&perf->metrics_idr, *arg); if (!oa_config) { DRM_DEBUG("Failed to remove unknown OA config\n"); ret = -ENOENT; @@ -3496,17 +3486,17 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data, GEM_BUG_ON(*arg != oa_config->id); - sysfs_remove_group(dev_priv->perf.metrics_kobj, + sysfs_remove_group(perf->metrics_kobj, &oa_config->sysfs_metric); - idr_remove(&dev_priv->perf.metrics_idr, *arg); + idr_remove(&perf->metrics_idr, *arg); DRM_DEBUG("Removed config %s id=%i\n", oa_config->uuid, oa_config->id); - put_oa_config(dev_priv, oa_config); + put_oa_config(oa_config); config_err: - mutex_unlock(&dev_priv->perf.metrics_lock); + mutex_unlock(&perf->metrics_lock); lock_err: return ret; } @@ -3555,103 +3545,104 @@ static struct ctl_table dev_root[] = { /** * i915_perf_init - initialize i915-perf state on module load - * @dev_priv: i915 device instance + * @i915: i915 device instance * * Initializes i915-perf state without exposing anything to userspace. * * Note: i915-perf initialization is split into an 'init' and 'register' * phase with the i915_perf_register() exposing state to userspace. */ -void i915_perf_init(struct drm_i915_private *dev_priv) -{ - if (IS_HASWELL(dev_priv)) { - dev_priv->perf.ops.is_valid_b_counter_reg = - gen7_is_valid_b_counter_addr; - dev_priv->perf.ops.is_valid_mux_reg = - hsw_is_valid_mux_addr; - dev_priv->perf.ops.is_valid_flex_reg = NULL; - dev_priv->perf.ops.enable_metric_set = hsw_enable_metric_set; - dev_priv->perf.ops.disable_metric_set = hsw_disable_metric_set; - dev_priv->perf.ops.oa_enable = gen7_oa_enable; - dev_priv->perf.ops.oa_disable = gen7_oa_disable; - dev_priv->perf.ops.read = gen7_oa_read; - dev_priv->perf.ops.oa_hw_tail_read = - gen7_oa_hw_tail_read; - - dev_priv->perf.oa_formats = hsw_oa_formats; - } else if (HAS_LOGICAL_RING_CONTEXTS(dev_priv)) { +void i915_perf_init(struct drm_i915_private *i915) +{ + struct i915_perf *perf = &i915->perf; + + /* XXX const struct i915_perf_ops! */ + + if (IS_HASWELL(i915)) { + perf->ops.is_valid_b_counter_reg = gen7_is_valid_b_counter_addr; + perf->ops.is_valid_mux_reg = hsw_is_valid_mux_addr; + perf->ops.is_valid_flex_reg = NULL; + perf->ops.enable_metric_set = hsw_enable_metric_set; + perf->ops.disable_metric_set = hsw_disable_metric_set; + perf->ops.oa_enable = gen7_oa_enable; + perf->ops.oa_disable = gen7_oa_disable; + perf->ops.read = gen7_oa_read; + perf->ops.oa_hw_tail_read = gen7_oa_hw_tail_read; + + perf->oa_formats = hsw_oa_formats; + } else if (HAS_LOGICAL_RING_CONTEXTS(i915)) { /* Note: that although we could theoretically also support the * legacy ringbuffer mode on BDW (and earlier iterations of * this driver, before upstreaming did this) it didn't seem * worth the complexity to maintain now that BDW+ enable * execlist mode by default. */ - dev_priv->perf.oa_formats = gen8_plus_oa_formats; + perf->oa_formats = gen8_plus_oa_formats; - dev_priv->perf.ops.oa_enable = gen8_oa_enable; - dev_priv->perf.ops.oa_disable = gen8_oa_disable; - dev_priv->perf.ops.read = gen8_oa_read; - dev_priv->perf.ops.oa_hw_tail_read = gen8_oa_hw_tail_read; + perf->ops.oa_enable = gen8_oa_enable; + perf->ops.oa_disable = gen8_oa_disable; + perf->ops.read = gen8_oa_read; + perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read; - if (IS_GEN_RANGE(dev_priv, 8, 9)) { - dev_priv->perf.ops.is_valid_b_counter_reg = + if (IS_GEN_RANGE(i915, 8, 9)) { + perf->ops.is_valid_b_counter_reg = gen7_is_valid_b_counter_addr; - dev_priv->perf.ops.is_valid_mux_reg = + perf->ops.is_valid_mux_reg = gen8_is_valid_mux_addr; - dev_priv->perf.ops.is_valid_flex_reg = + perf->ops.is_valid_flex_reg = gen8_is_valid_flex_addr; - if (IS_CHERRYVIEW(dev_priv)) { - dev_priv->perf.ops.is_valid_mux_reg = + if (IS_CHERRYVIEW(i915)) { + perf->ops.is_valid_mux_reg = chv_is_valid_mux_addr; } - dev_priv->perf.ops.enable_metric_set = gen8_enable_metric_set; - dev_priv->perf.ops.disable_metric_set = gen8_disable_metric_set; + perf->ops.enable_metric_set = gen8_enable_metric_set; + perf->ops.disable_metric_set = gen8_disable_metric_set; - if (IS_GEN(dev_priv, 8)) { - dev_priv->perf.ctx_oactxctrl_offset = 0x120; - dev_priv->perf.ctx_flexeu0_offset = 0x2ce; + if (IS_GEN(i915, 8)) { + perf->ctx_oactxctrl_offset = 0x120; + perf->ctx_flexeu0_offset = 0x2ce; - dev_priv->perf.gen8_valid_ctx_bit = BIT(25); + perf->gen8_valid_ctx_bit = BIT(25); } else { - dev_priv->perf.ctx_oactxctrl_offset = 0x128; - dev_priv->perf.ctx_flexeu0_offset = 0x3de; + perf->ctx_oactxctrl_offset = 0x128; + perf->ctx_flexeu0_offset = 0x3de; - dev_priv->perf.gen8_valid_ctx_bit = BIT(16); + perf->gen8_valid_ctx_bit = BIT(16); } - } else if (IS_GEN_RANGE(dev_priv, 10, 11)) { - dev_priv->perf.ops.is_valid_b_counter_reg = + } else if (IS_GEN_RANGE(i915, 10, 11)) { + perf->ops.is_valid_b_counter_reg = gen7_is_valid_b_counter_addr; - dev_priv->perf.ops.is_valid_mux_reg = + perf->ops.is_valid_mux_reg = gen10_is_valid_mux_addr; - dev_priv->perf.ops.is_valid_flex_reg = + perf->ops.is_valid_flex_reg = gen8_is_valid_flex_addr; - dev_priv->perf.ops.enable_metric_set = gen8_enable_metric_set; - dev_priv->perf.ops.disable_metric_set = gen10_disable_metric_set; + perf->ops.enable_metric_set = gen8_enable_metric_set; + perf->ops.disable_metric_set = gen10_disable_metric_set; - if (IS_GEN(dev_priv, 10)) { - dev_priv->perf.ctx_oactxctrl_offset = 0x128; - dev_priv->perf.ctx_flexeu0_offset = 0x3de; + if (IS_GEN(i915, 10)) { + perf->ctx_oactxctrl_offset = 0x128; + perf->ctx_flexeu0_offset = 0x3de; } else { - dev_priv->perf.ctx_oactxctrl_offset = 0x124; - dev_priv->perf.ctx_flexeu0_offset = 0x78e; + perf->ctx_oactxctrl_offset = 0x124; + perf->ctx_flexeu0_offset = 0x78e; } - dev_priv->perf.gen8_valid_ctx_bit = BIT(16); + perf->gen8_valid_ctx_bit = BIT(16); } } - if (dev_priv->perf.ops.enable_metric_set) { - INIT_LIST_HEAD(&dev_priv->perf.streams); - mutex_init(&dev_priv->perf.lock); + if (perf->ops.enable_metric_set) { + INIT_LIST_HEAD(&perf->streams); + mutex_init(&perf->lock); oa_sample_rate_hard_limit = 1000 * - (RUNTIME_INFO(dev_priv)->cs_timestamp_frequency_khz / 2); - dev_priv->perf.sysctl_header = register_sysctl_table(dev_root); + (RUNTIME_INFO(i915)->cs_timestamp_frequency_khz / 2); + perf->sysctl_header = register_sysctl_table(dev_root); - mutex_init(&dev_priv->perf.metrics_lock); - idr_init(&dev_priv->perf.metrics_idr); + mutex_init(&perf->metrics_lock); + idr_init(&perf->metrics_idr); /* We set up some ratelimit state to potentially throttle any * _NOTES about spurious, invalid OA reports which we don't @@ -3663,44 +3654,40 @@ void i915_perf_init(struct drm_i915_private *dev_priv) * * Using the same limiting factors as printk_ratelimit() */ - ratelimit_state_init(&dev_priv->perf.spurious_report_rs, - 5 * HZ, 10); + ratelimit_state_init(&perf->spurious_report_rs, 5 * HZ, 10); /* Since we use a DRM_NOTE for spurious reports it would be * inconsistent to let __ratelimit() automatically print a * warning for throttling. */ - ratelimit_set_flags(&dev_priv->perf.spurious_report_rs, + ratelimit_set_flags(&perf->spurious_report_rs, RATELIMIT_MSG_ON_RELEASE); - dev_priv->perf.initialized = true; + perf->i915 = i915; } } static int destroy_config(int id, void *p, void *data) { - struct drm_i915_private *dev_priv = data; - struct i915_oa_config *oa_config = p; - - put_oa_config(dev_priv, oa_config); - + put_oa_config(p); return 0; } /** * i915_perf_fini - Counter part to i915_perf_init() - * @dev_priv: i915 device instance + * @i915: i915 device instance */ -void i915_perf_fini(struct drm_i915_private *dev_priv) +void i915_perf_fini(struct drm_i915_private *i915) { - if (!dev_priv->perf.initialized) - return; + struct i915_perf *perf = &i915->perf; - idr_for_each(&dev_priv->perf.metrics_idr, destroy_config, dev_priv); - idr_destroy(&dev_priv->perf.metrics_idr); + if (!perf->i915) + return; - unregister_sysctl_table(dev_priv->perf.sysctl_header); + idr_for_each(&perf->metrics_idr, destroy_config, perf); + idr_destroy(&perf->metrics_idr); - memset(&dev_priv->perf.ops, 0, sizeof(dev_priv->perf.ops)); + unregister_sysctl_table(perf->sysctl_header); - dev_priv->perf.initialized = false; + memset(&perf->ops, 0, sizeof(perf->ops)); + perf->i915 = NULL; } diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h index edcab2df74fb..3c6246064a0b 100644 --- a/drivers/gpu/drm/i915/i915_perf_types.h +++ b/drivers/gpu/drm/i915/i915_perf_types.h @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -124,9 +125,14 @@ struct i915_perf_stream_ops { */ struct i915_perf_stream { /** - * @dev_priv: i915 drm device + * @perf: i915_perf backpointer */ - struct drm_i915_private *dev_priv; + struct i915_perf *perf; + + /** + * @gt: intel_gt container + */ + struct intel_gt *gt; /** * @link: Links the stream into ``&drm_i915_private->streams`` @@ -266,20 +272,19 @@ struct i915_oa_ops { * @is_valid_b_counter_reg: Validates register's address for * programming boolean counters for a particular platform. */ - bool (*is_valid_b_counter_reg)(struct drm_i915_private *dev_priv, - u32 addr); + bool (*is_valid_b_counter_reg)(struct i915_perf *perf, u32 addr); /** * @is_valid_mux_reg: Validates register's address for programming mux * for a particular platform. */ - bool (*is_valid_mux_reg)(struct drm_i915_private *dev_priv, u32 addr); + bool (*is_valid_mux_reg)(struct i915_perf *perf, u32 addr); /** * @is_valid_flex_reg: Validates register's address for programming * flex EU filtering for a particular platform. */ - bool (*is_valid_flex_reg)(struct drm_i915_private *dev_priv, u32 addr); + bool (*is_valid_flex_reg)(struct i915_perf *perf, u32 addr); /** * @enable_metric_set: Selects and applies any MUX configuration to set @@ -324,4 +329,60 @@ struct i915_oa_ops { u32 (*oa_hw_tail_read)(struct i915_perf_stream *stream); }; +struct i915_perf { + struct drm_i915_private *i915; + + struct kobject *metrics_kobj; + struct ctl_table_header *sysctl_header; + + /* + * Lock associated with adding/modifying/removing OA configs + * in dev_priv->perf.metrics_idr. + */ + struct mutex metrics_lock; + + /* + * List of dynamic configurations, you need to hold + * dev_priv->perf.metrics_lock to access it. + */ + struct idr metrics_idr; + + /* + * Lock associated with anything below within this structure + * except exclusive_stream. + */ + struct mutex lock; + struct list_head streams; + + /* + * The stream currently using the OA unit. If accessed + * outside a syscall associated to its file + * descriptor, you need to hold + * dev_priv->drm.struct_mutex. + */ + struct i915_perf_stream *exclusive_stream; + + /** + * For rate limiting any notifications of spurious + * invalid OA reports + */ + struct ratelimit_state spurious_report_rs; + + struct i915_oa_config test_config; + + u32 gen7_latched_oastatus1; + u32 ctx_oactxctrl_offset; + u32 ctx_flexeu0_offset; + + /** + * The RPT_ID/reason field for Gen8+ includes a bit + * to determine if the CTX ID in the report is valid + * but the specific bit differs between Gen 8 and 9 + */ + u32 gen8_valid_ctx_bit; + + struct i915_oa_ops ops; + const struct i915_oa_format *oa_formats; +}; + #endif /* _I915_PERF_TYPES_H_ */ From patchwork Tue May 5 08:32:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283328 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GXzV3dGFz9sP7; Tue, 5 May 2020 18:32:54 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt0R-0003OP-Sp; Tue, 05 May 2020 08:32:48 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0N-0003L0-7I for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:43 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0M-0004sj-Rf for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:43 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 05/26] drm/i915/perf: drop list of streams Date: Tue, 5 May 2020 11:32:18 +0300 Message-Id: <20200505083239.2428948-6-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 At some point in time there was the idea that we could have multiple stream from the same piece of HW but that never materialized and given the hard time we already have making everything work with the submission side, there is no real point having this list of 1 element around. Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191008140111.5437-1-chris@chris-wilson.co.uk (cherry picked from commit 23b9e41a3dbdad16266b16d0c16ac629f0b6d732) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 16 +--------------- drivers/gpu/drm/i915/i915_perf_types.h | 6 ------ 2 files changed, 1 insertion(+), 21 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index bfe58cfc3e67..0872a6db6258 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1431,9 +1431,6 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream) */ memset(stream->oa_buffer.vaddr, 0, OA_BUFFER_SIZE); - /* Maybe make ->pollin per-stream state if we support multiple - * concurrent streams in the future. - */ stream->pollin = false; } @@ -1490,10 +1487,6 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream) */ memset(stream->oa_buffer.vaddr, 0, OA_BUFFER_SIZE); - /* - * Maybe make ->pollin per-stream state if we support multiple - * concurrent streams in the future. - */ stream->pollin = false; } @@ -2619,8 +2612,6 @@ static void i915_perf_destroy_locked(struct i915_perf_stream *stream) if (stream->ops->destroy) stream->ops->destroy(stream); - list_del(&stream->link); - if (stream->ctx) i915_gem_context_put(stream->ctx); @@ -2770,8 +2761,6 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf, goto err_flags; } - list_add(&stream->link, &perf->streams); - if (param->flags & I915_PERF_FLAG_FD_CLOEXEC) f_flags |= O_CLOEXEC; if (param->flags & I915_PERF_FLAG_FD_NONBLOCK) @@ -2780,7 +2769,7 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf, stream_fd = anon_inode_getfd("[i915_perf]", &fops, stream, f_flags); if (stream_fd < 0) { ret = stream_fd; - goto err_open; + goto err_flags; } if (!(param->flags & I915_PERF_FLAG_DISABLED)) @@ -2793,8 +2782,6 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf, return stream_fd; -err_open: - list_del(&stream->link); err_flags: if (stream->ops->destroy) stream->ops->destroy(stream); @@ -3634,7 +3621,6 @@ void i915_perf_init(struct drm_i915_private *i915) } if (perf->ops.enable_metric_set) { - INIT_LIST_HEAD(&perf->streams); mutex_init(&perf->lock); oa_sample_rate_hard_limit = 1000 * diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h index 3c6246064a0b..2d17059d32ee 100644 --- a/drivers/gpu/drm/i915/i915_perf_types.h +++ b/drivers/gpu/drm/i915/i915_perf_types.h @@ -134,11 +134,6 @@ struct i915_perf_stream { */ struct intel_gt *gt; - /** - * @link: Links the stream into ``&drm_i915_private->streams`` - */ - struct list_head link; - /** * @wakeref: As we keep the device awake while the perf stream is * active, we track our runtime pm reference for later release. @@ -352,7 +347,6 @@ struct i915_perf { * except exclusive_stream. */ struct mutex lock; - struct list_head streams; /* * The stream currently using the OA unit. If accessed From patchwork Tue May 5 08:32:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283329 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GXzZ1865z9sP7; Tue, 5 May 2020 18:32:58 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt0V-0003RZ-PY; Tue, 05 May 2020 08:32:51 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0N-0003LN-Tv for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:43 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0N-0004sj-7v for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:43 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 06/26] drm/i915/perf: store the associated engine of a stream Date: Tue, 5 May 2020 11:32:19 +0300 Message-Id: <20200505083239.2428948-7-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 We'll use this information later to verify that a client trying to reconfigure the stream does so on the right engine. For now, we want to pull the knowledge of which engine we use into a central property. Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191010150520.26488-1-chris@chris-wilson.co.uk (cherry picked from commit 9a61363a6310dda769b8c7601fc9de8ed81e6bc8) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 30 ++++++++++++++++++++++---- drivers/gpu/drm/i915/i915_perf_types.h | 5 +++++ 2 files changed, 31 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 0872a6db6258..f22da8c1b556 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -197,6 +197,7 @@ #include "gem/i915_gem_context.h" #include "gem/i915_gem_pm.h" +#include "gt/intel_engine_user.h" #include "gt/intel_lrc_reg.h" #include "i915_drv.h" @@ -347,6 +348,7 @@ static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = { * @oa_format: An OA unit HW report format * @oa_periodic: Whether to enable periodic OA unit sampling * @oa_period_exponent: The OA unit sampling period is derived from this + * @engine: The engine (typically rcs0) being monitored by the OA unit * * As read_properties_unlocked() enumerates and validates the properties given * to open a stream of metrics the configuration is built up in the structure @@ -363,6 +365,8 @@ struct perf_open_properties { int oa_format; bool oa_periodic; int oa_period_exponent; + + struct intel_engine_cs *engine; }; static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer); @@ -1210,7 +1214,7 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream) return ERR_PTR(err); for_each_gem_engine(ce, i915_gem_context_lock_engines(ctx), it) { - if (ce->engine->class != RENDER_CLASS) + if (ce->engine != stream->engine) /* first match! */ continue; /* @@ -2152,7 +2156,13 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, int format_size; int ret; - /* If the sysfs metrics/ directory wasn't registered for some + if (!props->engine) { + DRM_DEBUG("OA engine not specified\n"); + return -EINVAL; + } + + /* + * If the sysfs metrics/ directory wasn't registered for some * reason then don't let userspace try their luck with config * IDs */ @@ -2171,7 +2181,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, return -ENODEV; } - /* To avoid the complexity of having to accurately filter + /* + * To avoid the complexity of having to accurately filter * counter reports and marshal to the appropriate client * we currently only allow exclusive access */ @@ -2185,6 +2196,9 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, return -EINVAL; } + stream->engine = props->engine; + stream->gt = stream->engine->gt; + stream->sample_size = sizeof(struct drm_i915_perf_record_header); format_size = perf->oa_formats[props->oa_format].size; @@ -2745,7 +2759,6 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf, } stream->perf = perf; - stream->gt = &perf->i915->gt; stream->ctx = specific_ctx; ret = i915_oa_stream_init(stream, param, props); @@ -2830,6 +2843,15 @@ static int read_properties_unlocked(struct i915_perf *perf, return -EINVAL; } + /* At the moment we only support using i915-perf on the RCS. */ + props->engine = intel_engine_lookup_user(perf->i915, + I915_ENGINE_CLASS_RENDER, + 0); + if (!props->engine) { + DRM_DEBUG("No RENDER-capable engines\n"); + return -EINVAL; + } + /* Considering that ID = 0 is reserved and assuming that we don't * (currently) expect any configurations to ever specify duplicate * values for a particular property ID then the last _PROP_MAX value is diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h index 2d17059d32ee..82cd3b295037 100644 --- a/drivers/gpu/drm/i915/i915_perf_types.h +++ b/drivers/gpu/drm/i915/i915_perf_types.h @@ -140,6 +140,11 @@ struct i915_perf_stream { */ intel_wakeref_t wakeref; + /** + * @engine: Engine associated with this performance stream. + */ + struct intel_engine_cs *engine; + /** * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*` * properties given when opening a stream, representing the contents From patchwork Tue May 5 08:32:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283330 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GXzd1FQNz9sSk; Tue, 5 May 2020 18:33:01 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt0Y-0003T0-CO; Tue, 05 May 2020 08:32:54 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0O-0003Lh-Cb for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:44 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0N-0004sj-KK for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:43 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 07/26] drm/i915/perf: Store shortcut to intel_uncore Date: Tue, 5 May 2020 11:32:20 +0300 Message-Id: <20200505083239.2428948-8-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Chris Wilson BugLink: https://bugs.launchpad.net/bugs/1871957 Now that we have the engine stored in i915_perf, we have a means of accessing intel_gt should we require it. However, we are currently only using the intel_gt to find the right intel_uncore, so replace our i915_perf.gt pointer with the more useful i915_perf.uncore. Signed-off-by: Chris Wilson Cc: Lionel Landwerlin Reviewed-by: Lionel Landwerlin Link: https://patchwork.freedesktop.org/patch/msgid/20191010150520.26488-2-chris@chris-wilson.co.uk (cherry picked from commit 52111c4628a29973d8be266f45b464318912b199) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 48 +++++++++++++------------- drivers/gpu/drm/i915/i915_perf_types.h | 4 +-- 2 files changed, 26 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index f22da8c1b556..8c8e6c6ab4e0 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -419,14 +419,14 @@ static int get_oa_config(struct i915_perf *perf, static u32 gen8_oa_hw_tail_read(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; return intel_uncore_read(uncore, GEN8_OATAILPTR) & GEN8_OATAILPTR_MASK; } static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; u32 oastatus1 = intel_uncore_read(uncore, GEN7_OASTATUS1); return oastatus1 & GEN7_OASTATUS1_TAIL_MASK; @@ -656,7 +656,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, size_t count, size_t *offset) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; int report_size = stream->oa_buffer.format_size; u8 *oa_buf_base = stream->oa_buffer.vaddr; u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); @@ -866,7 +866,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream, size_t count, size_t *offset) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; u32 oastatus; int ret; @@ -945,7 +945,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, size_t count, size_t *offset) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; int report_size = stream->oa_buffer.format_size; u8 *oa_buf_base = stream->oa_buffer.vaddr; u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); @@ -1077,7 +1077,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream, size_t count, size_t *offset) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; u32 oastatus1; int ret; @@ -1376,8 +1376,8 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream) free_oa_buffer(stream); - intel_uncore_forcewake_put(stream->gt->uncore, FORCEWAKE_ALL); - intel_runtime_pm_put(stream->gt->uncore->rpm, stream->wakeref); + intel_uncore_forcewake_put(stream->uncore, FORCEWAKE_ALL); + intel_runtime_pm_put(stream->uncore->rpm, stream->wakeref); if (stream->ctx) oa_put_render_ctx_id(stream); @@ -1392,7 +1392,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream) static void gen7_init_oa_buffer(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); unsigned long flags; @@ -1440,7 +1440,7 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream) static void gen8_init_oa_buffer(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); unsigned long flags; @@ -1597,7 +1597,7 @@ static void delay_after_mux(void) static int hsw_enable_metric_set(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; const struct i915_oa_config *oa_config = stream->oa_config; /* @@ -1626,7 +1626,7 @@ static int hsw_enable_metric_set(struct i915_perf_stream *stream) static void hsw_disable_metric_set(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; intel_uncore_rmw(uncore, GEN6_UCGCTL1, GEN6_CSUNIT_CLOCK_GATE_DISABLE, 0); @@ -1936,7 +1936,7 @@ static int gen8_configure_all_contexts(struct i915_perf_stream *stream, static int gen8_enable_metric_set(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; const struct i915_oa_config *oa_config = stream->oa_config; int ret; @@ -1989,7 +1989,7 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream) static void gen8_disable_metric_set(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; /* Reset all contexts' slices/subslices configurations. */ gen8_configure_all_contexts(stream, NULL); @@ -1999,7 +1999,7 @@ static void gen8_disable_metric_set(struct i915_perf_stream *stream) static void gen10_disable_metric_set(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; /* Reset all contexts' slices/subslices configurations. */ gen8_configure_all_contexts(stream, NULL); @@ -2010,7 +2010,7 @@ static void gen10_disable_metric_set(struct i915_perf_stream *stream) static void gen7_oa_enable(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; struct i915_gem_context *ctx = stream->ctx; u32 ctx_id = stream->specific_ctx_id; bool periodic = stream->periodic; @@ -2040,7 +2040,7 @@ static void gen7_oa_enable(struct i915_perf_stream *stream) static void gen8_oa_enable(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; u32 report_format = stream->oa_buffer.format; /* @@ -2085,7 +2085,7 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream) static void gen7_oa_disable(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; intel_uncore_write(uncore, GEN7_OACONTROL, 0); if (intel_wait_for_register(uncore, @@ -2096,7 +2096,7 @@ static void gen7_oa_disable(struct i915_perf_stream *stream) static void gen8_oa_disable(struct i915_perf_stream *stream) { - struct intel_uncore *uncore = stream->gt->uncore; + struct intel_uncore *uncore = stream->uncore; intel_uncore_write(uncore, GEN8_OACONTROL, 0); if (intel_wait_for_register(uncore, @@ -2197,7 +2197,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, } stream->engine = props->engine; - stream->gt = stream->engine->gt; + stream->uncore = stream->engine->gt->uncore; stream->sample_size = sizeof(struct drm_i915_perf_record_header); @@ -2243,8 +2243,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, * In our case we are expecting that taking pm + FORCEWAKE * references will effectively disable RC6. */ - stream->wakeref = intel_runtime_pm_get(stream->gt->uncore->rpm); - intel_uncore_forcewake_get(stream->gt->uncore, FORCEWAKE_ALL); + stream->wakeref = intel_runtime_pm_get(stream->uncore->rpm); + intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL); ret = alloc_oa_buffer(stream); if (ret) @@ -2284,8 +2284,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, err_oa_buf_alloc: put_oa_config(stream->oa_config); - intel_uncore_forcewake_put(stream->gt->uncore, FORCEWAKE_ALL); - intel_runtime_pm_put(stream->gt->uncore->rpm, stream->wakeref); + intel_uncore_forcewake_put(stream->uncore, FORCEWAKE_ALL); + intel_runtime_pm_put(stream->uncore->rpm, stream->wakeref); err_config: if (stream->ctx) diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h index 82cd3b295037..a91ae2d1a543 100644 --- a/drivers/gpu/drm/i915/i915_perf_types.h +++ b/drivers/gpu/drm/i915/i915_perf_types.h @@ -130,9 +130,9 @@ struct i915_perf_stream { struct i915_perf *perf; /** - * @gt: intel_gt container + * @uncore: mmio access path */ - struct intel_gt *gt; + struct intel_uncore *uncore; /** * @wakeref: As we keep the device awake while the perf stream is From patchwork Tue May 5 08:32:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283331 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GXzg06Mlz9sT0; Tue, 5 May 2020 18:33:03 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt0b-0003Ur-7S; Tue, 05 May 2020 08:32:57 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0O-0003MJ-Kz for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:44 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0O-0004sj-0d for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:44 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 08/26] drm/i915/perf: Replace global wakeref tracking with engine-pm Date: Tue, 5 May 2020 11:32:21 +0300 Message-Id: <20200505083239.2428948-9-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Chris Wilson BugLink: https://bugs.launchpad.net/bugs/1871957 As we now have a specific engine to use OA on, exchange the top-level runtime-pm wakeref with the engine-pm. This still results in the same top-level runtime-pm, but with more nuances to keep the engine and its gt awake. Signed-off-by: Chris Wilson Reviewed-by: Lionel Landwerlin Link: https://patchwork.freedesktop.org/patch/msgid/20191011190325.10979-1-chris@chris-wilson.co.uk (cherry picked from commit a5efcde69b112d276416f8e6e5e9fd8ede13bfc5) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 8 ++++---- drivers/gpu/drm/i915/i915_perf_types.h | 6 ------ 2 files changed, 4 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 8c8e6c6ab4e0..e75e5e9c8c06 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -196,7 +196,7 @@ #include #include "gem/i915_gem_context.h" -#include "gem/i915_gem_pm.h" +#include "gt/intel_engine_pm.h" #include "gt/intel_engine_user.h" #include "gt/intel_lrc_reg.h" @@ -1377,7 +1377,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream) free_oa_buffer(stream); intel_uncore_forcewake_put(stream->uncore, FORCEWAKE_ALL); - intel_runtime_pm_put(stream->uncore->rpm, stream->wakeref); + intel_engine_pm_put(stream->engine); if (stream->ctx) oa_put_render_ctx_id(stream); @@ -2243,7 +2243,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, * In our case we are expecting that taking pm + FORCEWAKE * references will effectively disable RC6. */ - stream->wakeref = intel_runtime_pm_get(stream->uncore->rpm); + intel_engine_pm_get(stream->engine); intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL); ret = alloc_oa_buffer(stream); @@ -2285,7 +2285,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, put_oa_config(stream->oa_config); intel_uncore_forcewake_put(stream->uncore, FORCEWAKE_ALL); - intel_runtime_pm_put(stream->uncore->rpm, stream->wakeref); + intel_engine_pm_put(stream->engine); err_config: if (stream->ctx) diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h index a91ae2d1a543..eb8d1ebd5095 100644 --- a/drivers/gpu/drm/i915/i915_perf_types.h +++ b/drivers/gpu/drm/i915/i915_perf_types.h @@ -134,12 +134,6 @@ struct i915_perf_stream { */ struct intel_uncore *uncore; - /** - * @wakeref: As we keep the device awake while the perf stream is - * active, we track our runtime pm reference for later release. - */ - intel_wakeref_t wakeref; - /** * @engine: Engine associated with this performance stream. */ From patchwork Tue May 5 08:32:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283336 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY042s5yz9sSs; Tue, 5 May 2020 18:33:24 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt0x-0003m6-I5; Tue, 05 May 2020 08:33:19 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0O-0003MQ-O5 for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:44 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0O-0004sj-CG for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:44 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 09/26] drm/i915/perf: allow for CS OA configs to be created lazily Date: Tue, 5 May 2020 11:32:22 +0300 Message-Id: <20200505083239.2428948-10-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 Here we introduce a mechanism by which the execbuf part of the i915 driver will be able to request that a batch buffer containing the programming for a particular OA config be created. We'll execute these OA configuration buffers right before executing a set of userspace commands so that a particular user batchbuffer be executed with a given OA configuration. This mechanism essentially allows the userspace driver to go through several OA configuration without having to open/close the i915/perf stream. v2: No need for locking on object OA config object creation (Chris) Flush cpu mapping of OA config (Chris) v3: Properly deal with the perf_metric lock (Chris/Lionel) v4: Fix oa config unref/put when not found (Lionel) v5: Allocate BOs for configurations on the stream instead of globally (Lionel) v6: Fix 64bit division (Chris) v7: Store allocated config BOs into the stream (Lionel) Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-1-chris@chris-wilson.co.uk (cherry picked from commit 6a45008ab7bb5e13b543de0c141b94aaa71d8397) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 1 + drivers/gpu/drm/i915/i915_perf.c | 103 +++++++++++-------- drivers/gpu/drm/i915/i915_perf.h | 23 +++++ drivers/gpu/drm/i915/i915_perf_types.h | 23 +++-- 4 files changed, 101 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index 86e00a2db8a4..a7f1377a54a2 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -133,6 +133,7 @@ */ #define MI_LOAD_REGISTER_IMM(x) MI_INSTR(0x22, 2*(x)-1) #define MI_LRI_FORCE_POSTED (1<<12) +#define MI_LOAD_REGISTER_IMM_MAX_REGS (126) #define MI_STORE_REGISTER_MEM MI_INSTR(0x24, 1) #define MI_STORE_REGISTER_MEM_GEN8 MI_INSTR(0x24, 2) #define MI_SRM_LRM_GLOBAL_GTT (1<<22) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index e75e5e9c8c06..a06cb0df8c9e 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -369,52 +369,52 @@ struct perf_open_properties { struct intel_engine_cs *engine; }; +struct i915_oa_config_bo { + struct llist_node node; + + struct i915_oa_config *oa_config; + struct i915_vma *vma; +}; + static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer); -static void free_oa_config(struct i915_oa_config *oa_config) +void i915_oa_config_release(struct kref *ref) { + struct i915_oa_config *oa_config = + container_of(ref, typeof(*oa_config), ref); + if (!PTR_ERR(oa_config->flex_regs)) kfree(oa_config->flex_regs); if (!PTR_ERR(oa_config->b_counter_regs)) kfree(oa_config->b_counter_regs); if (!PTR_ERR(oa_config->mux_regs)) kfree(oa_config->mux_regs); - kfree(oa_config); -} -static void put_oa_config(struct i915_oa_config *oa_config) -{ - if (!atomic_dec_and_test(&oa_config->ref_count)) - return; - - free_oa_config(oa_config); + kfree_rcu(oa_config, rcu); } -static int get_oa_config(struct i915_perf *perf, - int metrics_set, - struct i915_oa_config **out_config) +struct i915_oa_config * +i915_perf_get_oa_config(struct i915_perf *perf, int metrics_set) { - int ret; - - if (metrics_set == 1) { - *out_config = &perf->test_config; - atomic_inc(&perf->test_config.ref_count); - return 0; - } - - ret = mutex_lock_interruptible(&perf->metrics_lock); - if (ret) - return ret; + struct i915_oa_config *oa_config; - *out_config = idr_find(&perf->metrics_idr, metrics_set); - if (!*out_config) - ret = -EINVAL; + rcu_read_lock(); + if (metrics_set == 1) + oa_config = &perf->test_config; else - atomic_inc(&(*out_config)->ref_count); + oa_config = idr_find(&perf->metrics_idr, metrics_set); + if (oa_config) + oa_config = i915_oa_config_get(oa_config); + rcu_read_unlock(); - mutex_unlock(&perf->metrics_lock); + return oa_config; +} - return ret; +static void free_oa_config_bo(struct i915_oa_config_bo *oa_bo) +{ + i915_oa_config_put(oa_bo->oa_config); + i915_vma_put(oa_bo->vma); + kfree(oa_bo); } static u32 gen8_oa_hw_tail_read(struct i915_perf_stream *stream) @@ -1359,6 +1359,16 @@ free_oa_buffer(struct i915_perf_stream *stream) stream->oa_buffer.vaddr = NULL; } +static void +free_oa_configs(struct i915_perf_stream *stream) +{ + struct i915_oa_config_bo *oa_bo, *tmp; + + i915_oa_config_put(stream->oa_config); + llist_for_each_entry_safe(oa_bo, tmp, stream->oa_config_bos.first, node) + free_oa_config_bo(oa_bo); +} + static void i915_oa_stream_destroy(struct i915_perf_stream *stream) { struct i915_perf *perf = stream->perf; @@ -1382,7 +1392,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream) if (stream->ctx) oa_put_render_ctx_id(stream); - put_oa_config(stream->oa_config); + free_oa_configs(stream); if (perf->spurious_report_rs.missed) { DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n", @@ -2225,9 +2235,10 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, } } - ret = get_oa_config(perf, props->metrics_set, &stream->oa_config); - if (ret) { + stream->oa_config = i915_perf_get_oa_config(perf, props->metrics_set); + if (!stream->oa_config) { DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set); + ret = -EINVAL; goto err_config; } @@ -2265,6 +2276,9 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, mutex_unlock(&stream->perf->i915->drm.struct_mutex); + DRM_DEBUG("opening stream oa config uuid=%s\n", + stream->oa_config->uuid); + hrtimer_init(&stream->poll_check_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); stream->poll_check_timer.function = oa_poll_check_timer_cb; @@ -2282,7 +2296,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, free_oa_buffer(stream); err_oa_buf_alloc: - put_oa_config(stream->oa_config); + free_oa_configs(stream); intel_uncore_forcewake_put(stream->uncore, FORCEWAKE_ALL); intel_engine_pm_put(stream->engine); @@ -3091,7 +3105,8 @@ void i915_perf_register(struct drm_i915_private *i915) if (ret) goto sysfs_error; - atomic_set(&perf->test_config.ref_count, 1); + perf->test_config.perf = perf; + kref_init(&perf->test_config.ref); goto exit; @@ -3349,7 +3364,8 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, return -ENOMEM; } - atomic_set(&oa_config->ref_count, 1); + oa_config->perf = perf; + kref_init(&oa_config->ref); if (!uuid_is_valid(args->uuid)) { DRM_DEBUG("Invalid uuid format for OA config\n"); @@ -3448,7 +3464,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, sysfs_err: mutex_unlock(&perf->metrics_lock); reg_err: - put_oa_config(oa_config); + i915_oa_config_put(oa_config); DRM_DEBUG("Failed to add new OA config\n"); return err; } @@ -3484,13 +3500,13 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data, ret = mutex_lock_interruptible(&perf->metrics_lock); if (ret) - goto lock_err; + return ret; oa_config = idr_find(&perf->metrics_idr, *arg); if (!oa_config) { DRM_DEBUG("Failed to remove unknown OA config\n"); ret = -ENOENT; - goto config_err; + goto err_unlock; } GEM_BUG_ON(*arg != oa_config->id); @@ -3500,13 +3516,16 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data, idr_remove(&perf->metrics_idr, *arg); + mutex_unlock(&perf->metrics_lock); + DRM_DEBUG("Removed config %s id=%i\n", oa_config->uuid, oa_config->id); - put_oa_config(oa_config); + i915_oa_config_put(oa_config); + + return 0; -config_err: +err_unlock: mutex_unlock(&perf->metrics_lock); -lock_err: return ret; } @@ -3676,7 +3695,7 @@ void i915_perf_init(struct drm_i915_private *i915) static int destroy_config(int id, void *p, void *data) { - put_oa_config(p); + i915_oa_config_put(p); return 0; } diff --git a/drivers/gpu/drm/i915/i915_perf.h b/drivers/gpu/drm/i915/i915_perf.h index 676f3efceda3..5a83c95e71ce 100644 --- a/drivers/gpu/drm/i915/i915_perf.h +++ b/drivers/gpu/drm/i915/i915_perf.h @@ -6,6 +6,7 @@ #ifndef __I915_PERF_H__ #define __I915_PERF_H__ +#include #include #include "i915_perf_types.h" @@ -13,6 +14,7 @@ struct drm_device; struct drm_file; struct drm_i915_private; +struct i915_oa_config; struct intel_context; struct intel_engine_cs; @@ -31,4 +33,25 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine, struct intel_context *ce, u32 *reg_state); +struct i915_oa_config * +i915_perf_get_oa_config(struct i915_perf *perf, int metrics_set); + +static inline struct i915_oa_config * +i915_oa_config_get(struct i915_oa_config *oa_config) +{ + if (kref_get_unless_zero(&oa_config->ref)) + return oa_config; + else + return NULL; +} + +void i915_oa_config_release(struct kref *ref); +static inline void i915_oa_config_put(struct i915_oa_config *oa_config) +{ + if (!oa_config) + return; + + kref_put(&oa_config->ref, i915_oa_config_release); +} + #endif /* __I915_PERF_H__ */ diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h index eb8d1ebd5095..337cd7d2ad77 100644 --- a/drivers/gpu/drm/i915/i915_perf_types.h +++ b/drivers/gpu/drm/i915/i915_perf_types.h @@ -9,7 +9,7 @@ #include #include #include -#include +#include #include #include #include @@ -22,6 +22,7 @@ struct drm_i915_private; struct file; struct i915_gem_context; +struct i915_perf; struct i915_vma; struct intel_context; struct intel_engine_cs; @@ -37,6 +38,8 @@ struct i915_oa_reg { }; struct i915_oa_config { + struct i915_perf *perf; + char uuid[UUID_STRING_LEN + 1]; int id; @@ -51,7 +54,8 @@ struct i915_oa_config { struct attribute *attrs[2]; struct device_attribute sysfs_metric_id; - atomic_t ref_count; + struct kref ref; + struct rcu_head rcu; }; struct i915_perf_stream; @@ -177,6 +181,12 @@ struct i915_perf_stream { */ struct i915_oa_config *oa_config; + /** + * @oa_config_bos: A list of struct i915_oa_config_bo allocated lazily + * each time @oa_config changes. + */ + struct llist_head oa_config_bos; + /** * @pinned_ctx: The OA context specific information. */ @@ -331,13 +341,13 @@ struct i915_perf { /* * Lock associated with adding/modifying/removing OA configs - * in dev_priv->perf.metrics_idr. + * in perf->metrics_idr. */ struct mutex metrics_lock; /* - * List of dynamic configurations, you need to hold - * dev_priv->perf.metrics_lock to access it. + * List of dynamic configurations (struct i915_oa_config), you + * need to hold perf->metrics_lock to access it. */ struct idr metrics_idr; @@ -350,8 +360,7 @@ struct i915_perf { /* * The stream currently using the OA unit. If accessed * outside a syscall associated to its file - * descriptor, you need to hold - * dev_priv->drm.struct_mutex. + * descriptor. */ struct i915_perf_stream *exclusive_stream; From patchwork Tue May 5 08:32:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283332 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GXzh60RYz9sTC; Tue, 5 May 2020 18:33:04 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt0c-0003WR-PD; Tue, 05 May 2020 08:32:58 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0P-0003Mn-4i for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:45 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0O-0004sj-Nz for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:44 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 10/26] drm/i915: Add definitions for MI_MATH command Date: Tue, 5 May 2020 11:32:23 +0300 Message-Id: <20200505083239.2428948-11-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Michał Winiarski BugLink: https://bugs.launchpad.net/bugs/1871957 We can use it in i915 for updating parts of unmasked registers from within a batch. We're also adding Gen8+ versions of CS_GPR registers (aka MI_MATH_REG in the coprocessor). Signed-off-by: Michał Winiarski Cc: Chris Wilson Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20190926100635.9416-4-michal.winiarski@intel.com (cherry picked from commit 74b2089a105f43c33d2f2124bd03d1085fe7c9fc) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 23 ++++++++++++++++++++ drivers/gpu/drm/i915/i915_reg.h | 4 ++++ 2 files changed, 27 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index a7f1377a54a2..fed06f057627 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -236,6 +236,29 @@ #define PIPE_CONTROL_DEPTH_CACHE_FLUSH (1<<0) #define PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */ +#define MI_MATH(x) MI_INSTR(0x1a, (x) - 1) +#define MI_MATH_INSTR(opcode, op1, op2) ((opcode) << 20 | (op1) << 10 | (op2)) +/* Opcodes for MI_MATH_INSTR */ +#define MI_MATH_NOOP MI_MATH_INSTR(0x000, 0x0, 0x0) +#define MI_MATH_LOAD(op1, op2) MI_MATH_INSTR(0x080, op1, op2) +#define MI_MATH_LOADINV(op1, op2) MI_MATH_INSTR(0x480, op1, op2) +#define MI_MATH_LOAD0(op1) MI_MATH_INSTR(0x081, op1) +#define MI_MATH_LOAD1(op1) MI_MATH_INSTR(0x481, op1) +#define MI_MATH_ADD MI_MATH_INSTR(0x100, 0x0, 0x0) +#define MI_MATH_SUB MI_MATH_INSTR(0x101, 0x0, 0x0) +#define MI_MATH_AND MI_MATH_INSTR(0x102, 0x0, 0x0) +#define MI_MATH_OR MI_MATH_INSTR(0x103, 0x0, 0x0) +#define MI_MATH_XOR MI_MATH_INSTR(0x104, 0x0, 0x0) +#define MI_MATH_STORE(op1, op2) MI_MATH_INSTR(0x180, op1, op2) +#define MI_MATH_STOREINV(op1, op2) MI_MATH_INSTR(0x580, op1, op2) +/* Registers used as operands in MI_MATH_INSTR */ +#define MI_MATH_REG(x) (x) +#define MI_MATH_REG_SRCA 0x20 +#define MI_MATH_REG_SRCB 0x21 +#define MI_MATH_REG_ACCU 0x31 +#define MI_MATH_REG_ZF 0x32 +#define MI_MATH_REG_CF 0x33 + /* * Commands used only by the command parser */ diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 7ac37a9f1262..b5eb382514ec 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -2489,6 +2489,10 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define RING_WAIT (1 << 11) /* gen3+, PRBx_CTL */ #define RING_WAIT_SEMAPHORE (1 << 10) /* gen6+ */ +/* There are 16 64-bit CS General Purpose Registers per-engine on Gen8+ */ +#define GEN8_RING_CS_GPR(base, n) _MMIO((base) + 0x600 + (n) * 8) +#define GEN8_RING_CS_GPR_UDW(base, n) _MMIO((base) + 0x600 + (n) * 8 + 4) + #define RING_FORCE_TO_NONPRIV(base, i) _MMIO(((base) + 0x4D0) + (i) * 4) #define RING_FORCE_TO_NONPRIV_ACCESS_RW (0 << 28) /* CFL+ & Gen11+ */ #define RING_FORCE_TO_NONPRIV_ACCESS_RD (1 << 28) From patchwork Tue May 5 08:32:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283333 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GXzq1Sdgz9sSk; Tue, 5 May 2020 18:33:11 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt0h-0003Zy-MV; Tue, 05 May 2020 08:33:03 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0Q-0003NG-FL for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:46 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0P-0004sj-3O for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:45 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 11/26] drm/i915/perf: implement active wait for noa configurations Date: Tue, 5 May 2020 11:32:24 +0300 Message-Id: <20200505083239.2428948-12-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 NOA configuration take some amount of time to apply. That amount of time depends on the size of the GT. There is no documented time for this. For example, past experimentations with powergating configuration changes seem to indicate a 60~70us delay. We go with 500us as default for now which should be over the required amount of time (according to HW architects). v2: Don't forget to save/restore registers used for the wait (Chris) v3: Name used CS_GPR registers (Chris) Fix compile issue due to rebase (Lionel) v4: Fix save/restore helpers (Umesh) v5: Move noa_wait from drm_i915_private to i915_perf_stream (Lionel) v6: Add missing struct declarations in i915_perf.h Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-2-chris@chris-wilson.co.uk (cherry picked from commit daed3e44396d178cf2098b754bb8ef5ca4e918bc) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 4 +- drivers/gpu/drm/i915/gt/intel_gt_types.h | 5 + drivers/gpu/drm/i915/i915_debugfs.c | 32 +++ drivers/gpu/drm/i915/i915_perf.c | 224 ++++++++++++++++++ drivers/gpu/drm/i915/i915_perf_types.h | 8 + drivers/gpu/drm/i915/i915_reg.h | 4 +- .../drm/i915/selftests/i915_live_selftests.h | 1 + drivers/gpu/drm/i915/selftests/i915_perf.c | 216 +++++++++++++++++ 8 files changed, 492 insertions(+), 2 deletions(-) create mode 100644 drivers/gpu/drm/i915/selftests/i915_perf.c diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index fed06f057627..f3b7e8a2e751 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -157,7 +157,8 @@ #define MI_BATCH_BUFFER_START MI_INSTR(0x31, 0) #define MI_BATCH_GTT (2<<6) /* aliased with (1<<7) on gen4 */ #define MI_BATCH_BUFFER_START_GEN8 MI_INSTR(0x31, 1) -#define MI_BATCH_RESOURCE_STREAMER (1<<10) +#define MI_BATCH_RESOURCE_STREAMER REG_BIT(10) +#define MI_BATCH_PREDICATE REG_BIT(15) /* HSW+ on RCS only*/ /* * 3D instructions used by the kernel @@ -218,6 +219,7 @@ #define PIPE_CONTROL_CS_STALL (1<<20) #define PIPE_CONTROL_TLB_INVALIDATE (1<<18) #define PIPE_CONTROL_MEDIA_STATE_CLEAR (1<<16) +#define PIPE_CONTROL_WRITE_TIMESTAMP (3<<14) #define PIPE_CONTROL_QW_WRITE (1<<14) #define PIPE_CONTROL_POST_SYNC_OP_MASK (3<<14) #define PIPE_CONTROL_DEPTH_STALL (1<<13) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h index dc295c196d11..f752b6cf9ea1 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h @@ -97,6 +97,11 @@ enum intel_gt_scratch_field { /* 8 bytes */ INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA = 256, + /* 6 * 8 bytes */ + INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR = 2048, + + /* 4 bytes */ + INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1 = 2096, }; #endif /* __INTEL_GT_TYPES_H__ */ diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index c4750354581d..1f3c7387e1c1 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -3572,6 +3572,37 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops, i915_wedged_get, i915_wedged_set, "%llu\n"); +static int +i915_perf_noa_delay_set(void *data, u64 val) +{ + struct drm_i915_private *i915 = data; + const u32 clk = RUNTIME_INFO(i915)->cs_timestamp_frequency_khz; + + /* + * This would lead to infinite waits as we're doing timestamp + * difference on the CS with only 32bits. + */ + if (val > mul_u32_u32(U32_MAX, clk)) + return -EINVAL; + + atomic64_set(&i915->perf.noa_programming_delay, val); + return 0; +} + +static int +i915_perf_noa_delay_get(void *data, u64 *val) +{ + struct drm_i915_private *i915 = data; + + *val = atomic64_read(&i915->perf.noa_programming_delay); + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops, + i915_perf_noa_delay_get, + i915_perf_noa_delay_set, + "%llu\n"); + #define DROP_UNBOUND BIT(0) #define DROP_BOUND BIT(1) #define DROP_RETIRE BIT(2) @@ -4338,6 +4369,7 @@ static const struct i915_debugfs_files { const char *name; const struct file_operations *fops; } i915_debugfs_files[] = { + {"i915_perf_noa_delay", &i915_perf_noa_delay_fops}, {"i915_wedged", &i915_wedged_fops}, {"i915_cache_sharing", &i915_cache_sharing_fops}, {"i915_gem_drop_caches", &i915_drop_caches_fops}, diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index a06cb0df8c9e..a603992a8304 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -198,6 +198,7 @@ #include "gem/i915_gem_context.h" #include "gt/intel_engine_pm.h" #include "gt/intel_engine_user.h" +#include "gt/intel_gt.h" #include "gt/intel_lrc_reg.h" #include "i915_drv.h" @@ -1369,6 +1370,12 @@ free_oa_configs(struct i915_perf_stream *stream) free_oa_config_bo(oa_bo); } +static void +free_noa_wait(struct i915_perf_stream *stream) +{ + i915_vma_unpin_and_release(&stream->noa_wait, 0); +} + static void i915_oa_stream_destroy(struct i915_perf_stream *stream) { struct i915_perf *perf = stream->perf; @@ -1393,6 +1400,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream) oa_put_render_ctx_id(stream); free_oa_configs(stream); + free_noa_wait(stream); if (perf->spurious_report_rs.missed) { DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n", @@ -1565,6 +1573,206 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream) return ret; } +static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs, + bool save, i915_reg_t reg, u32 offset, + u32 dword_count) +{ + u32 cmd; + u32 d; + + cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM; + if (INTEL_GEN(stream->perf->i915) >= 8) + cmd++; + + for (d = 0; d < dword_count; d++) { + *cs++ = cmd; + *cs++ = i915_mmio_reg_offset(reg) + 4 * d; + *cs++ = intel_gt_scratch_offset(stream->engine->gt, + offset) + 4 * d; + *cs++ = 0; + } + + return cs; +} + +static int alloc_noa_wait(struct i915_perf_stream *stream) +{ + struct drm_i915_private *i915 = stream->perf->i915; + struct drm_i915_gem_object *bo; + struct i915_vma *vma; + const u64 delay_ticks = 0xffffffffffffffff - + DIV64_U64_ROUND_UP( + atomic64_read(&stream->perf->noa_programming_delay) * + RUNTIME_INFO(i915)->cs_timestamp_frequency_khz, + 1000000ull); + const u32 base = stream->engine->mmio_base; +#define CS_GPR(x) GEN8_RING_CS_GPR(base, x) + u32 *batch, *ts0, *cs, *jump; + int ret, i; + enum { + START_TS, + NOW_TS, + DELTA_TS, + JUMP_PREDICATE, + DELTA_TARGET, + N_CS_GPR + }; + + bo = i915_gem_object_create_internal(i915, 4096); + if (IS_ERR(bo)) { + DRM_ERROR("Failed to allocate NOA wait batchbuffer\n"); + return PTR_ERR(bo); + } + + /* + * We pin in GGTT because we jump into this buffer now because + * multiple OA config BOs will have a jump to this address and it + * needs to be fixed during the lifetime of the i915/perf stream. + */ + vma = i915_gem_object_ggtt_pin(bo, NULL, 0, 0, PIN_HIGH); + if (IS_ERR(vma)) { + ret = PTR_ERR(vma); + goto err_unref; + } + + batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB); + if (IS_ERR(batch)) { + ret = PTR_ERR(batch); + goto err_unpin; + } + + /* Save registers. */ + for (i = 0; i < N_CS_GPR; i++) + cs = save_restore_register( + stream, cs, true /* save */, CS_GPR(i), + INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2); + cs = save_restore_register( + stream, cs, true /* save */, MI_PREDICATE_RESULT_1, + INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1); + + /* First timestamp snapshot location. */ + ts0 = cs; + + /* + * Initial snapshot of the timestamp register to implement the wait. + * We work with 32b values, so clear out the top 32b bits of the + * register because the ALU works 64bits. + */ + *cs++ = MI_LOAD_REGISTER_IMM(1); + *cs++ = i915_mmio_reg_offset(CS_GPR(START_TS)) + 4; + *cs++ = 0; + *cs++ = MI_LOAD_REGISTER_REG | (3 - 2); + *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(base)); + *cs++ = i915_mmio_reg_offset(CS_GPR(START_TS)); + + /* + * This is the location we're going to jump back into until the + * required amount of time has passed. + */ + jump = cs; + + /* + * Take another snapshot of the timestamp register. Take care to clear + * up the top 32bits of CS_GPR(1) as we're using it for other + * operations below. + */ + *cs++ = MI_LOAD_REGISTER_IMM(1); + *cs++ = i915_mmio_reg_offset(CS_GPR(NOW_TS)) + 4; + *cs++ = 0; + *cs++ = MI_LOAD_REGISTER_REG | (3 - 2); + *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(base)); + *cs++ = i915_mmio_reg_offset(CS_GPR(NOW_TS)); + + /* + * Do a diff between the 2 timestamps and store the result back into + * CS_GPR(1). + */ + *cs++ = MI_MATH(5); + *cs++ = MI_MATH_LOAD(MI_MATH_REG_SRCA, MI_MATH_REG(NOW_TS)); + *cs++ = MI_MATH_LOAD(MI_MATH_REG_SRCB, MI_MATH_REG(START_TS)); + *cs++ = MI_MATH_SUB; + *cs++ = MI_MATH_STORE(MI_MATH_REG(DELTA_TS), MI_MATH_REG_ACCU); + *cs++ = MI_MATH_STORE(MI_MATH_REG(JUMP_PREDICATE), MI_MATH_REG_CF); + + /* + * Transfer the carry flag (set to 1 if ts1 < ts0, meaning the + * timestamp have rolled over the 32bits) into the predicate register + * to be used for the predicated jump. + */ + *cs++ = MI_LOAD_REGISTER_REG | (3 - 2); + *cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE)); + *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1); + + /* Restart from the beginning if we had timestamps roll over. */ + *cs++ = (INTEL_GEN(i915) < 8 ? + MI_BATCH_BUFFER_START : + MI_BATCH_BUFFER_START_GEN8) | + MI_BATCH_PREDICATE; + *cs++ = i915_ggtt_offset(vma) + (ts0 - batch) * 4; + *cs++ = 0; + + /* + * Now add the diff between to previous timestamps and add it to : + * (((1 * << 64) - 1) - delay_ns) + * + * When the Carry Flag contains 1 this means the elapsed time is + * longer than the expected delay, and we can exit the wait loop. + */ + *cs++ = MI_LOAD_REGISTER_IMM(2); + *cs++ = i915_mmio_reg_offset(CS_GPR(DELTA_TARGET)); + *cs++ = lower_32_bits(delay_ticks); + *cs++ = i915_mmio_reg_offset(CS_GPR(DELTA_TARGET)) + 4; + *cs++ = upper_32_bits(delay_ticks); + + *cs++ = MI_MATH(4); + *cs++ = MI_MATH_LOAD(MI_MATH_REG_SRCA, MI_MATH_REG(DELTA_TS)); + *cs++ = MI_MATH_LOAD(MI_MATH_REG_SRCB, MI_MATH_REG(DELTA_TARGET)); + *cs++ = MI_MATH_ADD; + *cs++ = MI_MATH_STOREINV(MI_MATH_REG(JUMP_PREDICATE), MI_MATH_REG_CF); + + /* + * Transfer the result into the predicate register to be used for the + * predicated jump. + */ + *cs++ = MI_LOAD_REGISTER_REG | (3 - 2); + *cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE)); + *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1); + + /* Predicate the jump. */ + *cs++ = (INTEL_GEN(i915) < 8 ? + MI_BATCH_BUFFER_START : + MI_BATCH_BUFFER_START_GEN8) | + MI_BATCH_PREDICATE; + *cs++ = i915_ggtt_offset(vma) + (jump - batch) * 4; + *cs++ = 0; + + /* Restore registers. */ + for (i = 0; i < N_CS_GPR; i++) + cs = save_restore_register( + stream, cs, false /* restore */, CS_GPR(i), + INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2); + cs = save_restore_register( + stream, cs, false /* restore */, MI_PREDICATE_RESULT_1, + INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1); + + /* And return to the ring. */ + *cs++ = MI_BATCH_BUFFER_END; + + GEM_BUG_ON(cs - batch > PAGE_SIZE / sizeof(*batch)); + + i915_gem_object_flush_map(bo); + i915_gem_object_unpin_map(bo); + + stream->noa_wait = vma; + return 0; + +err_unpin: + __i915_vma_unpin(vma); +err_unref: + i915_gem_object_put(bo); + return ret; +} + static void config_oa_regs(struct intel_uncore *uncore, const struct i915_oa_reg *regs, u32 n_regs) @@ -2235,6 +2443,12 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, } } + ret = alloc_noa_wait(stream); + if (ret) { + DRM_DEBUG("Unable to allocate NOA wait batch buffer\n"); + goto err_noa_wait_alloc; + } + stream->oa_config = i915_perf_get_oa_config(perf, props->metrics_set); if (!stream->oa_config) { DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set); @@ -2302,6 +2516,9 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, intel_engine_pm_put(stream->engine); err_config: + free_noa_wait(stream); + +err_noa_wait_alloc: if (stream->ctx) oa_put_render_ctx_id(stream); @@ -3689,6 +3906,9 @@ void i915_perf_init(struct drm_i915_private *i915) ratelimit_set_flags(&perf->spurious_report_rs, RATELIMIT_MSG_ON_RELEASE); + atomic64_set(&perf->noa_programming_delay, + 500 * 1000 /* 500us */); + perf->i915 = i915; } } @@ -3718,3 +3938,7 @@ void i915_perf_fini(struct drm_i915_private *i915) memset(&perf->ops, 0, sizeof(perf->ops)); perf->i915 = NULL; } + +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) +#include "selftests/i915_perf.c" +#endif diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h index 337cd7d2ad77..d35a3c1946c3 100644 --- a/drivers/gpu/drm/i915/i915_perf_types.h +++ b/drivers/gpu/drm/i915/i915_perf_types.h @@ -266,6 +266,12 @@ struct i915_perf_stream { */ u32 head; } oa_buffer; + + /** + * A batch buffer doing a wait on the GPU for the NOA logic to be + * reprogrammed. + */ + struct i915_vma *noa_wait; }; /** @@ -385,6 +391,8 @@ struct i915_perf { struct i915_oa_ops ops; const struct i915_oa_format *oa_formats; + + atomic64_t noa_programming_delay; }; #endif /* _I915_PERF_TYPES_H_ */ diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index b5eb382514ec..c877cda32808 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -547,7 +547,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define MI_PREDICATE_SRC0_UDW _MMIO(0x2400 + 4) #define MI_PREDICATE_SRC1 _MMIO(0x2408) #define MI_PREDICATE_SRC1_UDW _MMIO(0x2408 + 4) - +#define MI_PREDICATE_DATA _MMIO(0x2410) +#define MI_PREDICATE_RESULT _MMIO(0x2418) +#define MI_PREDICATE_RESULT_1 _MMIO(0x241c) #define MI_PREDICATE_RESULT_2 _MMIO(0x2214) #define LOWER_SLICE_ENABLED (1 << 0) #define LOWER_SLICE_DISABLED (0 << 0) diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h index 1ccf0f731ac0..846ede427f37 100644 --- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h +++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h @@ -33,3 +33,4 @@ selftest(reset, intel_reset_live_selftests) selftest(hangcheck, intel_hangcheck_live_selftests) selftest(execlists, intel_execlists_live_selftests) selftest(guc, intel_guc_live_selftest) +selftest(perf, i915_perf_live_selftests) diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c new file mode 100644 index 000000000000..9abd525bfa15 --- /dev/null +++ b/drivers/gpu/drm/i915/selftests/i915_perf.c @@ -0,0 +1,216 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2019 Intel Corporation + */ + +#include + +#include "gem/i915_gem_pm.h" +#include "gt/intel_gt.h" + +#include "i915_selftest.h" + +#include "igt_flush_test.h" +#include "lib_sw_fence.h" + +static struct i915_perf_stream * +test_stream(struct i915_perf *perf) +{ + struct drm_i915_perf_open_param param = {}; + struct perf_open_properties props = { + .engine = intel_engine_lookup_user(perf->i915, + I915_ENGINE_CLASS_RENDER, + 0), + .sample_flags = SAMPLE_OA_REPORT, + .oa_format = I915_OA_FORMAT_C4_B8, + .metrics_set = 1, + }; + struct i915_perf_stream *stream; + + stream = kzalloc(sizeof(*stream), GFP_KERNEL); + if (!stream) + return NULL; + + stream->perf = perf; + + mutex_lock(&perf->lock); + if (i915_oa_stream_init(stream, ¶m, &props)) { + kfree(stream); + stream = NULL; + } + mutex_unlock(&perf->lock); + + return stream; +} + +static void stream_destroy(struct i915_perf_stream *stream) +{ + struct i915_perf *perf = stream->perf; + + mutex_lock(&perf->lock); + i915_perf_destroy_locked(stream); + mutex_unlock(&perf->lock); +} + +static int live_sanitycheck(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct i915_perf_stream *stream; + + /* Quick check we can create a perf stream */ + + stream = test_stream(&i915->perf); + if (!stream) + return -EINVAL; + + stream_destroy(stream); + return 0; +} + +static int write_timestamp(struct i915_request *rq, int slot) +{ + u32 *cs; + int len; + + cs = intel_ring_begin(rq, 6); + if (IS_ERR(cs)) + return PTR_ERR(cs); + + len = 5; + if (INTEL_GEN(rq->i915) >= 8) + len++; + + *cs++ = GFX_OP_PIPE_CONTROL(len); + *cs++ = PIPE_CONTROL_GLOBAL_GTT_IVB | + PIPE_CONTROL_STORE_DATA_INDEX | + PIPE_CONTROL_WRITE_TIMESTAMP; + *cs++ = slot * sizeof(u32); + *cs++ = 0; + *cs++ = 0; + *cs++ = 0; + + intel_ring_advance(rq, cs); + + return 0; +} + +static ktime_t poll_status(struct i915_request *rq, int slot) +{ + while (!intel_read_status_page(rq->engine, slot) && + !i915_request_completed(rq)) + cpu_relax(); + + return ktime_get(); +} + +static int live_noa_delay(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct i915_perf_stream *stream; + struct i915_request *rq; + ktime_t t0, t1; + u64 expected; + u32 delay; + int err; + int i; + + /* Check that the GPU delays matches expectations */ + + stream = test_stream(&i915->perf); + if (!stream) + return -ENOMEM; + + expected = atomic64_read(&stream->perf->noa_programming_delay); + + if (stream->engine->class != RENDER_CLASS) { + err = -ENODEV; + goto out; + } + + for (i = 0; i < 4; i++) + intel_write_status_page(stream->engine, 0x100 + i, 0); + + rq = i915_request_create(stream->engine->kernel_context); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + goto out; + } + + if (rq->engine->emit_init_breadcrumb && + rq->timeline->has_initial_breadcrumb) { + err = rq->engine->emit_init_breadcrumb(rq); + if (err) { + i915_request_add(rq); + goto out; + } + } + + err = write_timestamp(rq, 0x100); + if (err) { + i915_request_add(rq); + goto out; + } + + err = rq->engine->emit_bb_start(rq, + i915_ggtt_offset(stream->noa_wait), 0, + I915_DISPATCH_SECURE); + if (err) { + i915_request_add(rq); + goto out; + } + + err = write_timestamp(rq, 0x102); + if (err) { + i915_request_add(rq); + goto out; + } + + i915_request_get(rq); + i915_request_add(rq); + + preempt_disable(); + t0 = poll_status(rq, 0x100); + t1 = poll_status(rq, 0x102); + preempt_enable(); + + pr_info("CPU delay: %lluns, expected %lluns\n", + ktime_sub(t1, t0), expected); + + delay = intel_read_status_page(stream->engine, 0x102); + delay -= intel_read_status_page(stream->engine, 0x100); + delay = div_u64(mul_u32_u32(delay, 1000 * 1000), + RUNTIME_INFO(i915)->cs_timestamp_frequency_khz); + pr_info("GPU delay: %uns, expected %lluns\n", + delay, expected); + + if (4 * delay < 3 * expected || 2 * delay > 3 * expected) { + pr_err("GPU delay [%uus] outside of expected threshold! [%lluus, %lluus]\n", + delay / 1000, + div_u64(3 * expected, 4000), + div_u64(3 * expected, 2000)); + err = -EINVAL; + } + + i915_request_put(rq); +out: + stream_destroy(stream); + return err; +} + +int i915_perf_live_selftests(struct drm_i915_private *i915) +{ + static const struct i915_subtest tests[] = { + SUBTEST(live_sanitycheck), + SUBTEST(live_noa_delay), + }; + struct i915_perf *perf = &i915->perf; + + if (!perf->metrics_kobj || !perf->ops.enable_metric_set) + return 0; + + if (intel_gt_is_wedged(&i915->gt)) + return 0; + + return i915_subtests(tests, i915); +} From patchwork Tue May 5 08:32:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283337 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY080X0wz9sSk; Tue, 5 May 2020 18:33:28 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt11-0003oy-4N; Tue, 05 May 2020 08:33:23 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0Q-0003Nd-MG for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:46 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0P-0004sj-Ev for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:45 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 12/26] drm/i915/perf: execute OA configuration from command stream Date: Tue, 5 May 2020 11:32:25 +0300 Message-Id: <20200505083239.2428948-13-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 We haven't run into issues with programming the global OA/NOA registers configuration from CPU so far, but HW engineers actually recommend doing this from the command streamer. On TGL in particular one of the clock domain in which some of that programming goes might not be powered when we poke things from the CPU. Since we have a command buffer prepared for the execbuffer side of things, we can reuse that approach here too. This also allows us to significantly reduce the amount of time we hold the main lock. v2: Drop the global lock as much as possible v3: Take global lock to pin global v4: Create i915 request in emit_oa_config() to avoid deadlocks (Lionel) v5: Move locking to the stream (Lionel) v6: Move active reconfiguration request into i915_perf_stream (Lionel) v7: Pin VMA outside request creation (Chris) Lock VMA before move to active (Chris) v8: Fix double free on stream->initial_oa_config_bo (Lionel) Don't allow interruption when waiting on active config request (Lionel) Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-3-chris@chris-wilson.co.uk (cherry picked from commit 15d0ace1f876e01b9745cb22ee32e3770fe3a6d5) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 199 ++++++++++++++++++++++++------- 1 file changed, 156 insertions(+), 43 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index a603992a8304..4ce830add8b6 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1767,56 +1767,181 @@ static int alloc_noa_wait(struct i915_perf_stream *stream) return 0; err_unpin: - __i915_vma_unpin(vma); + i915_vma_unpin_and_release(&vma, 0); err_unref: i915_gem_object_put(bo); return ret; } -static void config_oa_regs(struct intel_uncore *uncore, - const struct i915_oa_reg *regs, - u32 n_regs) +static u32 *write_cs_mi_lri(u32 *cs, + const struct i915_oa_reg *reg_data, + u32 n_regs) { u32 i; for (i = 0; i < n_regs; i++) { - const struct i915_oa_reg *reg = regs + i; + if ((i % MI_LOAD_REGISTER_IMM_MAX_REGS) == 0) { + u32 n_lri = min_t(u32, + n_regs - i, + MI_LOAD_REGISTER_IMM_MAX_REGS); + + *cs++ = MI_LOAD_REGISTER_IMM(n_lri); + } + *cs++ = i915_mmio_reg_offset(reg_data[i].addr); + *cs++ = reg_data[i].value; + } + + return cs; +} + +static int num_lri_dwords(int num_regs) +{ + int count = 0; + + if (num_regs > 0) { + count += DIV_ROUND_UP(num_regs, MI_LOAD_REGISTER_IMM_MAX_REGS); + count += num_regs * 2; + } + + return count; +} + +static struct i915_oa_config_bo * +alloc_oa_config_buffer(struct i915_perf_stream *stream, + struct i915_oa_config *oa_config) +{ + struct drm_i915_gem_object *obj; + struct i915_oa_config_bo *oa_bo; + size_t config_length = 0; + u32 *cs; + int err; + + oa_bo = kzalloc(sizeof(*oa_bo), GFP_KERNEL); + if (!oa_bo) + return ERR_PTR(-ENOMEM); + + config_length += num_lri_dwords(oa_config->mux_regs_len); + config_length += num_lri_dwords(oa_config->b_counter_regs_len); + config_length += num_lri_dwords(oa_config->flex_regs_len); + config_length++; /* MI_BATCH_BUFFER_END */ + config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE); + + obj = i915_gem_object_create_shmem(stream->perf->i915, config_length); + if (IS_ERR(obj)) { + err = PTR_ERR(obj); + goto err_free; + } + + cs = i915_gem_object_pin_map(obj, I915_MAP_WB); + if (IS_ERR(cs)) { + err = PTR_ERR(cs); + goto err_oa_bo; + } - intel_uncore_write(uncore, reg->addr, reg->value); + cs = write_cs_mi_lri(cs, + oa_config->mux_regs, + oa_config->mux_regs_len); + cs = write_cs_mi_lri(cs, + oa_config->b_counter_regs, + oa_config->b_counter_regs_len); + cs = write_cs_mi_lri(cs, + oa_config->flex_regs, + oa_config->flex_regs_len); + + *cs++ = MI_BATCH_BUFFER_END; + + i915_gem_object_flush_map(obj); + i915_gem_object_unpin_map(obj); + + oa_bo->vma = i915_vma_instance(obj, + &stream->engine->gt->ggtt->vm, + NULL); + if (IS_ERR(oa_bo->vma)) { + err = PTR_ERR(oa_bo->vma); + goto err_oa_bo; } + + oa_bo->oa_config = i915_oa_config_get(oa_config); + llist_add(&oa_bo->node, &stream->oa_config_bos); + + return oa_bo; + +err_oa_bo: + i915_gem_object_put(obj); +err_free: + kfree(oa_bo); + return ERR_PTR(err); } -static void delay_after_mux(void) +static struct i915_vma * +get_oa_vma(struct i915_perf_stream *stream, struct i915_oa_config *oa_config) { + struct i915_oa_config_bo *oa_bo; + /* - * It apparently takes a fairly long time for a new MUX - * configuration to be be applied after these register writes. - * This delay duration was derived empirically based on the - * render_basic config but hopefully it covers the maximum - * configuration latency. - * - * As a fallback, the checks in _append_oa_reports() to skip - * invalid OA reports do also seem to work to discard reports - * generated before this config has completed - albeit not - * silently. - * - * Unfortunately this is essentially a magic number, since we - * don't currently know of a reliable mechanism for predicting - * how long the MUX config will take to apply and besides - * seeing invalid reports we don't know of a reliable way to - * explicitly check that the MUX config has landed. - * - * It's even possible we've miss characterized the underlying - * problem - it just seems like the simplest explanation why - * a delay at this location would mitigate any invalid reports. + * Look for the buffer in the already allocated BOs attached + * to the stream. */ - usleep_range(15000, 20000); + llist_for_each_entry(oa_bo, stream->oa_config_bos.first, node) { + if (oa_bo->oa_config == oa_config && + memcmp(oa_bo->oa_config->uuid, + oa_config->uuid, + sizeof(oa_config->uuid)) == 0) + goto out; + } + + oa_bo = alloc_oa_config_buffer(stream, oa_config); + if (IS_ERR(oa_bo)) + return ERR_CAST(oa_bo); + +out: + return i915_vma_get(oa_bo->vma); +} + +static int emit_oa_config(struct i915_perf_stream *stream, + struct intel_context *ce) +{ + struct i915_request *rq; + struct i915_vma *vma; + int err; + + vma = get_oa_vma(stream, stream->oa_config); + if (IS_ERR(vma)) + return PTR_ERR(vma); + + err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH); + if (err) + goto err_vma_put; + + rq = i915_request_create(ce); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + goto err_vma_unpin; + } + + i915_vma_lock(vma); + err = i915_request_await_object(rq, vma->obj, 0); + if (!err) + err = i915_vma_move_to_active(vma, rq, 0); + i915_vma_unlock(vma); + if (err) + goto err_add_request; + + err = rq->engine->emit_bb_start(rq, + vma->node.start, 0, + I915_DISPATCH_SECURE); +err_add_request: + i915_request_add(rq); +err_vma_unpin: + i915_vma_unpin(vma); +err_vma_put: + i915_vma_put(vma); + return err; } static int hsw_enable_metric_set(struct i915_perf_stream *stream) { struct intel_uncore *uncore = stream->uncore; - const struct i915_oa_config *oa_config = stream->oa_config; /* * PRM: @@ -1833,13 +1958,7 @@ static int hsw_enable_metric_set(struct i915_perf_stream *stream) intel_uncore_rmw(uncore, GEN6_UCGCTL1, 0, GEN6_CSUNIT_CLOCK_GATE_DISABLE); - config_oa_regs(uncore, oa_config->mux_regs, oa_config->mux_regs_len); - delay_after_mux(); - - config_oa_regs(uncore, oa_config->b_counter_regs, - oa_config->b_counter_regs_len); - - return 0; + return emit_oa_config(stream, stream->engine->kernel_context); } static void hsw_disable_metric_set(struct i915_perf_stream *stream) @@ -2196,13 +2315,7 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream) if (ret) return ret; - config_oa_regs(uncore, oa_config->mux_regs, oa_config->mux_regs_len); - delay_after_mux(); - - config_oa_regs(uncore, oa_config->b_counter_regs, - oa_config->b_counter_regs_len); - - return 0; + return emit_oa_config(stream, stream->engine->kernel_context); } static void gen8_disable_metric_set(struct i915_perf_stream *stream) From patchwork Tue May 5 08:32:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283338 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0867DBz9sP7; Tue, 5 May 2020 18:33:28 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt11-0003pl-Nd; Tue, 05 May 2020 08:33:23 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0R-0003Nh-7h for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:47 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0P-0004sj-Pu for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:45 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 13/26] drm/i915/perf: Prefer using the pinned_ctx for emitting delays on config Date: Tue, 5 May 2020 11:32:26 +0300 Message-Id: <20200505083239.2428948-14-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Chris Wilson BugLink: https://bugs.launchpad.net/bugs/1871957 When we are watching a particular context, we want the OA config to be applied inline with that context such that it takes effect before the next submission. Signed-off-by: Chris Wilson Cc: Lionel Landwerlin Reviewed-by: Lionel Landwerlin Link: https://patchwork.freedesktop.org/patch/msgid/20191012091056.28686-1-chris@chris-wilson.co.uk (cherry picked from commit 5f5c382ecfdd06e17316d1c9f1362522c20cdfef) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 4ce830add8b6..742a0e81e097 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1939,6 +1939,11 @@ static int emit_oa_config(struct i915_perf_stream *stream, return err; } +static struct intel_context *oa_context(struct i915_perf_stream *stream) +{ + return stream->pinned_ctx ?: stream->engine->kernel_context; +} + static int hsw_enable_metric_set(struct i915_perf_stream *stream) { struct intel_uncore *uncore = stream->uncore; @@ -1958,7 +1963,7 @@ static int hsw_enable_metric_set(struct i915_perf_stream *stream) intel_uncore_rmw(uncore, GEN6_UCGCTL1, 0, GEN6_CSUNIT_CLOCK_GATE_DISABLE); - return emit_oa_config(stream, stream->engine->kernel_context); + return emit_oa_config(stream, oa_context(stream)); } static void hsw_disable_metric_set(struct i915_perf_stream *stream) @@ -2315,7 +2320,7 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream) if (ret) return ret; - return emit_oa_config(stream, stream->engine->kernel_context); + return emit_oa_config(stream, oa_context(stream)); } static void gen8_disable_metric_set(struct i915_perf_stream *stream) From patchwork Tue May 5 08:32:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283339 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0D1tQYz9sP7; Tue, 5 May 2020 18:33:32 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt15-0003rk-1N; Tue, 05 May 2020 08:33:27 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0R-0003No-Ng for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:47 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0Q-0004sj-5P for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:46 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 14/26] drm/i915/perf: Avoid polluting the i915_oa_config with error pointers Date: Tue, 5 May 2020 11:32:27 +0300 Message-Id: <20200505083239.2428948-15-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Chris Wilson BugLink: https://bugs.launchpad.net/bugs/1871957 Use a local variable to track the allocation errors to avoid polluting the struct and keep the free simple. Reported-by: kbuild test robot Reported-by: Dan Carpenter Signed-off-by: Chris Wilson Cc: Lionel Landwerlin Reviewed-by: Lionel Landwerlin Link: https://patchwork.freedesktop.org/patch/msgid/20191013095211.2922-1-chris@chris-wilson.co.uk (cherry picked from commit c2fba936d3047fc5ea1458549daa19844ef243a4) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 52 +++++++++++++++----------------- 1 file changed, 25 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 742a0e81e097..b4c5978cb8cd 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -384,12 +384,9 @@ void i915_oa_config_release(struct kref *ref) struct i915_oa_config *oa_config = container_of(ref, typeof(*oa_config), ref); - if (!PTR_ERR(oa_config->flex_regs)) - kfree(oa_config->flex_regs); - if (!PTR_ERR(oa_config->b_counter_regs)) - kfree(oa_config->b_counter_regs); - if (!PTR_ERR(oa_config->mux_regs)) - kfree(oa_config->mux_regs); + kfree(oa_config->flex_regs); + kfree(oa_config->b_counter_regs); + kfree(oa_config->mux_regs); kfree_rcu(oa_config, rcu); } @@ -3669,6 +3666,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, struct i915_perf *perf = &to_i915(dev)->perf; struct drm_i915_perf_oa_config *args = data; struct i915_oa_config *oa_config, *tmp; + static struct i915_oa_reg *regs; int err, id; if (!perf->i915) { @@ -3714,30 +3712,30 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, memcpy(oa_config->uuid, args->uuid, sizeof(args->uuid)); oa_config->mux_regs_len = args->n_mux_regs; - oa_config->mux_regs = - alloc_oa_regs(perf, - perf->ops.is_valid_mux_reg, - u64_to_user_ptr(args->mux_regs_ptr), - args->n_mux_regs); + regs = alloc_oa_regs(perf, + perf->ops.is_valid_mux_reg, + u64_to_user_ptr(args->mux_regs_ptr), + args->n_mux_regs); - if (IS_ERR(oa_config->mux_regs)) { + if (IS_ERR(regs)) { DRM_DEBUG("Failed to create OA config for mux_regs\n"); - err = PTR_ERR(oa_config->mux_regs); + err = PTR_ERR(regs); goto reg_err; } + oa_config->mux_regs = regs; oa_config->b_counter_regs_len = args->n_boolean_regs; - oa_config->b_counter_regs = - alloc_oa_regs(perf, - perf->ops.is_valid_b_counter_reg, - u64_to_user_ptr(args->boolean_regs_ptr), - args->n_boolean_regs); + regs = alloc_oa_regs(perf, + perf->ops.is_valid_b_counter_reg, + u64_to_user_ptr(args->boolean_regs_ptr), + args->n_boolean_regs); - if (IS_ERR(oa_config->b_counter_regs)) { + if (IS_ERR(regs)) { DRM_DEBUG("Failed to create OA config for b_counter_regs\n"); - err = PTR_ERR(oa_config->b_counter_regs); + err = PTR_ERR(regs); goto reg_err; } + oa_config->b_counter_regs = regs; if (INTEL_GEN(perf->i915) < 8) { if (args->n_flex_regs != 0) { @@ -3746,17 +3744,17 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, } } else { oa_config->flex_regs_len = args->n_flex_regs; - oa_config->flex_regs = - alloc_oa_regs(perf, - perf->ops.is_valid_flex_reg, - u64_to_user_ptr(args->flex_regs_ptr), - args->n_flex_regs); + regs = alloc_oa_regs(perf, + perf->ops.is_valid_flex_reg, + u64_to_user_ptr(args->flex_regs_ptr), + args->n_flex_regs); - if (IS_ERR(oa_config->flex_regs)) { + if (IS_ERR(regs)) { DRM_DEBUG("Failed to create OA config for flex_regs\n"); - err = PTR_ERR(oa_config->flex_regs); + err = PTR_ERR(regs); goto reg_err; } + oa_config->flex_regs = regs; } err = mutex_lock_interruptible(&perf->metrics_lock); From patchwork Tue May 5 08:32:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283341 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0K19B0z9sTC; Tue, 5 May 2020 18:33:37 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt19-0003vu-Iv; Tue, 05 May 2020 08:33:31 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0T-0003O1-BP for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:49 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0Q-0004sj-I7 for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:46 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 15/26] drm/i915/perf: introduce a versioning of the i915-perf uapi Date: Tue, 5 May 2020 11:32:28 +0300 Message-Id: <20200505083239.2428948-16-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 Reporting this version will help application figure out what level of the support the running kernel provides. v2: Add i915_perf_ioctl_version() (Chris) Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-1-chris@chris-wilson.co.uk (cherry picked from commit b8d49f28aa03e4678e450e588b10c0faf96e4118) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_getparam.c | 4 ++++ drivers/gpu/drm/i915/i915_perf.c | 10 ++++++++++ drivers/gpu/drm/i915/i915_perf.h | 1 + include/uapi/drm/i915_drm.h | 21 +++++++++++++++++++++ 4 files changed, 36 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 9f1517af5b7f..97981fbfb4af 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -5,6 +5,7 @@ #include "gt/intel_engine_user.h" #include "i915_drv.h" +#include "i915_perf.h" int i915_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) @@ -156,6 +157,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_MMAP_GTT_COHERENT: value = INTEL_INFO(i915)->has_coherent_ggtt; break; + case I915_PARAM_PERF_REVISION: + value = i915_perf_ioctl_version(); + break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index b4c5978cb8cd..3f9802a0785b 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -4055,6 +4055,16 @@ void i915_perf_fini(struct drm_i915_private *i915) perf->i915 = NULL; } +/** + * i915_perf_ioctl_version - Version of the i915-perf subsystem + * + * This version number is used by userspace to detect available features. + */ +int i915_perf_ioctl_version(void) +{ + return 1; +} + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/i915_perf.c" #endif diff --git a/drivers/gpu/drm/i915/i915_perf.h b/drivers/gpu/drm/i915/i915_perf.h index 5a83c95e71ce..41cedcbe7ef3 100644 --- a/drivers/gpu/drm/i915/i915_perf.h +++ b/drivers/gpu/drm/i915/i915_perf.h @@ -22,6 +22,7 @@ void i915_perf_init(struct drm_i915_private *i915); void i915_perf_fini(struct drm_i915_private *i915); void i915_perf_register(struct drm_i915_private *i915); void i915_perf_unregister(struct drm_i915_private *i915); +int i915_perf_ioctl_version(void); int i915_perf_open_ioctl(struct drm_device *dev, void *data, struct drm_file *file); diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 469dc512cca3..ea02b168ac3a 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -611,6 +611,13 @@ typedef struct drm_i915_irq_wait { * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT. */ #define I915_PARAM_HAS_EXEC_SUBMIT_FENCE 53 + +/* + * Revision of the i915-perf uAPI. The value returned helps determine what + * i915-perf features are available. See drm_i915_perf_property_id. + */ +#define I915_PARAM_PERF_REVISION 54 + /* Must be kept compact -- no holes and well documented */ typedef struct drm_i915_getparam { @@ -1844,23 +1851,31 @@ enum drm_i915_perf_property_id { * Open the stream for a specific context handle (as used with * execbuffer2). A stream opened for a specific context this way * won't typically require root privileges. + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_CTX_HANDLE = 1, /** * A value of 1 requests the inclusion of raw OA unit reports as * part of stream samples. + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_SAMPLE_OA, /** * The value specifies which set of OA unit metrics should be * be configured, defining the contents of any OA unit reports. + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_OA_METRICS_SET, /** * The value specifies the size and layout of OA unit reports. + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_OA_FORMAT, @@ -1870,6 +1885,8 @@ enum drm_i915_perf_property_id { * from this exponent as follows: * * 80ns * 2^(period_exponent + 1) + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_OA_EXPONENT, @@ -1901,6 +1918,8 @@ struct drm_i915_perf_open_param { * to close and re-open a stream with the same configuration. * * It's undefined whether any pending data for the stream will be lost. + * + * This ioctl is available in perf revision 1. */ #define I915_PERF_IOCTL_ENABLE _IO('i', 0x0) @@ -1908,6 +1927,8 @@ struct drm_i915_perf_open_param { * Disable data capture for a stream. * * It is an error to try and read a stream that is disabled. + * + * This ioctl is available in perf revision 1. */ #define I915_PERF_IOCTL_DISABLE _IO('i', 0x1) From patchwork Tue May 5 08:32:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283340 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0H0Gndz9sT1; Tue, 5 May 2020 18:33:35 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt16-0003te-Fg; Tue, 05 May 2020 08:33:28 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0T-0003OQ-81 for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:49 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0R-0004sj-1y for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:47 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 16/26] drm/i915: add support for perf configuration queries Date: Tue, 5 May 2020 11:32:29 +0300 Message-Id: <20200505083239.2428948-17-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 Listing configurations at the moment is supported only through sysfs. This might cause issues for applications wanting to list configurations from a container where sysfs isn't available. This change adds a way to query the number of configurations and their content through the i915 query uAPI. v2: Fix sparse warnings (Lionel) Add support to query configuration using uuid (Lionel) v3: Fix some inconsistency in uapi header (Lionel) Fix unlocking when not locked issue (Lionel) Add debug messages (Lionel) v4: Fix missing unlock (Dan) v5: Drop lock when copying config content to userspace (Chris) v6: Drop lock when copying config list to userspace (Chris) Fix deadlock when calling i915_perf_get_oa_config() under perf.metrics_lock (Lionel) Add i915_oa_config_get() (Chris) Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-2-chris@chris-wilson.co.uk (cherry picked from commit 4f6ccc74a85cbb4cdd373c374dc76398dc7603a1) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 3 +- drivers/gpu/drm/i915/i915_query.c | 296 ++++++++++++++++++++++++++++++ include/uapi/drm/i915_drm.h | 62 ++++++- 3 files changed, 358 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 3f9802a0785b..5d50976318ce 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3844,8 +3844,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data, GEM_BUG_ON(*arg != oa_config->id); - sysfs_remove_group(perf->metrics_kobj, - &oa_config->sysfs_metric); + sysfs_remove_group(perf->metrics_kobj, &oa_config->sysfs_metric); idr_remove(&perf->metrics_idr, *arg); diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index ad9240a0817a..41a4c41a2755 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -7,6 +7,7 @@ #include #include "i915_drv.h" +#include "i915_perf.h" #include "i915_query.h" #include @@ -142,10 +143,305 @@ query_engine_info(struct drm_i915_private *i915, return len; } +static int can_copy_perf_config_registers_or_number(u32 user_n_regs, + u64 user_regs_ptr, + u32 kernel_n_regs) +{ + /* + * We'll just put the number of registers, and won't copy the + * register. + */ + if (user_n_regs == 0) + return 0; + + if (user_n_regs < kernel_n_regs) + return -EINVAL; + + if (!access_ok(u64_to_user_ptr(user_regs_ptr), + 2 * sizeof(u32) * kernel_n_regs)) + return -EFAULT; + + return 0; +} + +static int copy_perf_config_registers_or_number(const struct i915_oa_reg *kernel_regs, + u32 kernel_n_regs, + u64 user_regs_ptr, + u32 *user_n_regs) +{ + u32 r; + + if (*user_n_regs == 0) { + *user_n_regs = kernel_n_regs; + return 0; + } + + *user_n_regs = kernel_n_regs; + + for (r = 0; r < kernel_n_regs; r++) { + u32 __user *user_reg_ptr = + u64_to_user_ptr(user_regs_ptr + sizeof(u32) * r * 2); + u32 __user *user_val_ptr = + u64_to_user_ptr(user_regs_ptr + sizeof(u32) * r * 2 + + sizeof(u32)); + int ret; + + ret = __put_user(i915_mmio_reg_offset(kernel_regs[r].addr), + user_reg_ptr); + if (ret) + return -EFAULT; + + ret = __put_user(kernel_regs[r].value, user_val_ptr); + if (ret) + return -EFAULT; + } + + return 0; +} + +static int query_perf_config_data(struct drm_i915_private *i915, + struct drm_i915_query_item *query_item, + bool use_uuid) +{ + struct drm_i915_query_perf_config __user *user_query_config_ptr = + u64_to_user_ptr(query_item->data_ptr); + struct drm_i915_perf_oa_config __user *user_config_ptr = + u64_to_user_ptr(query_item->data_ptr + + sizeof(struct drm_i915_query_perf_config)); + struct drm_i915_perf_oa_config user_config; + struct i915_perf *perf = &i915->perf; + struct i915_oa_config *oa_config; + char uuid[UUID_STRING_LEN + 1]; + u64 config_id; + u32 flags, total_size; + int ret; + + if (!perf->i915) + return -ENODEV; + + total_size = + sizeof(struct drm_i915_query_perf_config) + + sizeof(struct drm_i915_perf_oa_config); + + if (query_item->length == 0) + return total_size; + + if (query_item->length < total_size) { + DRM_DEBUG("Invalid query config data item size=%u expected=%u\n", + query_item->length, total_size); + return -EINVAL; + } + + if (!access_ok(user_query_config_ptr, total_size)) + return -EFAULT; + + if (__get_user(flags, &user_query_config_ptr->flags)) + return -EFAULT; + + if (flags != 0) + return -EINVAL; + + if (use_uuid) { + struct i915_oa_config *tmp; + int id; + + BUILD_BUG_ON(sizeof(user_query_config_ptr->uuid) >= sizeof(uuid)); + + memset(&uuid, 0, sizeof(uuid)); + if (__copy_from_user(uuid, user_query_config_ptr->uuid, + sizeof(user_query_config_ptr->uuid))) + return -EFAULT; + + oa_config = NULL; + rcu_read_lock(); + idr_for_each_entry(&perf->metrics_idr, tmp, id) { + if (!strcmp(tmp->uuid, uuid)) { + oa_config = i915_oa_config_get(tmp); + break; + } + } + rcu_read_unlock(); + } else { + if (__get_user(config_id, &user_query_config_ptr->config)) + return -EFAULT; + + oa_config = i915_perf_get_oa_config(perf, config_id); + } + if (!oa_config) + return -ENOENT; + + if (__copy_from_user(&user_config, user_config_ptr, + sizeof(user_config))) { + ret = -EFAULT; + goto out; + } + + ret = can_copy_perf_config_registers_or_number(user_config.n_boolean_regs, + user_config.boolean_regs_ptr, + oa_config->b_counter_regs_len); + if (ret) + goto out; + + ret = can_copy_perf_config_registers_or_number(user_config.n_flex_regs, + user_config.flex_regs_ptr, + oa_config->flex_regs_len); + if (ret) + goto out; + + ret = can_copy_perf_config_registers_or_number(user_config.n_mux_regs, + user_config.mux_regs_ptr, + oa_config->mux_regs_len); + if (ret) + goto out; + + ret = copy_perf_config_registers_or_number(oa_config->b_counter_regs, + oa_config->b_counter_regs_len, + user_config.boolean_regs_ptr, + &user_config.n_boolean_regs); + if (ret) + goto out; + + ret = copy_perf_config_registers_or_number(oa_config->flex_regs, + oa_config->flex_regs_len, + user_config.flex_regs_ptr, + &user_config.n_flex_regs); + if (ret) + goto out; + + ret = copy_perf_config_registers_or_number(oa_config->mux_regs, + oa_config->mux_regs_len, + user_config.mux_regs_ptr, + &user_config.n_mux_regs); + if (ret) + goto out; + + memcpy(user_config.uuid, oa_config->uuid, sizeof(user_config.uuid)); + + if (__copy_to_user(user_config_ptr, &user_config, + sizeof(user_config))) { + ret = -EFAULT; + goto out; + } + + ret = total_size; + +out: + i915_oa_config_put(oa_config); + return ret; +} + +static size_t sizeof_perf_config_list(size_t count) +{ + return sizeof(struct drm_i915_query_perf_config) + sizeof(u64) * count; +} + +static size_t sizeof_perf_metrics(struct i915_perf *perf) +{ + struct i915_oa_config *tmp; + size_t i; + int id; + + i = 1; + rcu_read_lock(); + idr_for_each_entry(&perf->metrics_idr, tmp, id) + i++; + rcu_read_unlock(); + + return sizeof_perf_config_list(i); +} + +static int query_perf_config_list(struct drm_i915_private *i915, + struct drm_i915_query_item *query_item) +{ + struct drm_i915_query_perf_config __user *user_query_config_ptr = + u64_to_user_ptr(query_item->data_ptr); + struct i915_perf *perf = &i915->perf; + u64 *oa_config_ids = NULL; + int alloc, n_configs; + u32 flags; + int ret; + + if (!perf->i915) + return -ENODEV; + + if (query_item->length == 0) + return sizeof_perf_metrics(perf); + + if (get_user(flags, &user_query_config_ptr->flags)) + return -EFAULT; + + if (flags != 0) + return -EINVAL; + + n_configs = 1; + do { + struct i915_oa_config *tmp; + u64 *ids; + int id; + + ids = krealloc(oa_config_ids, + n_configs * sizeof(*oa_config_ids), + GFP_KERNEL); + if (!ids) + return -ENOMEM; + + alloc = fetch_and_zero(&n_configs); + + ids[n_configs++] = 1ull; /* reserved for test_config */ + rcu_read_lock(); + idr_for_each_entry(&perf->metrics_idr, tmp, id) { + if (n_configs < alloc) + ids[n_configs] = id; + n_configs++; + } + rcu_read_unlock(); + + oa_config_ids = ids; + } while (n_configs > alloc); + + if (query_item->length < sizeof_perf_config_list(n_configs)) { + DRM_DEBUG("Invalid query config list item size=%u expected=%zu\n", + query_item->length, + sizeof_perf_config_list(n_configs)); + kfree(oa_config_ids); + return -EINVAL; + } + + if (put_user(n_configs, &user_query_config_ptr->config)) { + kfree(oa_config_ids); + return -EFAULT; + } + + ret = copy_to_user(user_query_config_ptr + 1, + oa_config_ids, + n_configs * sizeof(*oa_config_ids)); + kfree(oa_config_ids); + if (ret) + return -EFAULT; + + return sizeof_perf_config_list(n_configs); +} + +static int query_perf_config(struct drm_i915_private *i915, + struct drm_i915_query_item *query_item) +{ + switch (query_item->flags) { + case DRM_I915_QUERY_PERF_CONFIG_LIST: + return query_perf_config_list(i915, query_item); + case DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID: + return query_perf_config_data(i915, query_item, true); + case DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID: + return query_perf_config_data(i915, query_item, false); + default: + return -EINVAL; + } +} + static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv, struct drm_i915_query_item *query_item) = { query_topology_info, query_engine_info, + query_perf_config, }; int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index ea02b168ac3a..f6b452e79a54 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -2005,6 +2005,7 @@ struct drm_i915_query_item { __u64 query_id; #define DRM_I915_QUERY_TOPOLOGY_INFO 1 #define DRM_I915_QUERY_ENGINE_INFO 2 +#define DRM_I915_QUERY_PERF_CONFIG 3 /* Must be kept compact -- no holes and well documented */ /* @@ -2016,9 +2017,18 @@ struct drm_i915_query_item { __s32 length; /* - * Unused for now. Must be cleared to zero. + * When query_id == DRM_I915_QUERY_TOPOLOGY_INFO, must be 0. + * + * When query_id == DRM_I915_QUERY_PERF_CONFIG, must be one of the + * following : + * - DRM_I915_QUERY_PERF_CONFIG_LIST + * - DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID + * - DRM_I915_QUERY_PERF_CONFIG_FOR_UUID */ __u32 flags; +#define DRM_I915_QUERY_PERF_CONFIG_LIST 1 +#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID 2 +#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID 3 /* * Data will be written at the location pointed by data_ptr when the @@ -2144,6 +2154,56 @@ struct drm_i915_query_engine_info { struct drm_i915_engine_info engines[]; }; +/* + * Data written by the kernel with query DRM_I915_QUERY_PERF_CONFIG. + */ +struct drm_i915_query_perf_config { + union { + /* + * When query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 sets + * this fields to the number of configurations available. + */ + __u64 n_configs; + + /* + * When query_id == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID, + * i915 will use the value in this field as configuration + * identifier to decide what data to write into config_ptr. + */ + __u64 config; + + /* + * When query_id == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID, + * i915 will use the value in this field as configuration + * identifier to decide what data to write into config_ptr. + * + * String formatted like "%08x-%04x-%04x-%04x-%012x" + */ + char uuid[36]; + }; + + /* + * Unused for now. Must be cleared to zero. + */ + __u32 flags; + + /* + * When query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 will + * write an array of __u64 of configuration identifiers. + * + * When query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA, i915 will + * write a struct drm_i915_perf_oa_config. If the following fields of + * drm_i915_perf_oa_config are set not set to 0, i915 will write into + * the associated pointers the values of submitted when the + * configuration was created : + * + * - n_mux_regs + * - n_boolean_regs + * - n_flex_regs + */ + __u8 data[]; +}; + #if defined(__cplusplus) } #endif From patchwork Tue May 5 08:32:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283344 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0S0cKRz9sP7; Tue, 5 May 2020 18:33:44 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt1G-00043h-El; Tue, 05 May 2020 08:33:38 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0T-0003On-Jq for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:49 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0R-0004sj-EN for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:47 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 17/26] drm/i915/perf: Allow dynamic reconfiguration of the OA stream Date: Tue, 5 May 2020 11:32:30 +0300 Message-Id: <20200505083239.2428948-18-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Chris Wilson BugLink: https://bugs.launchpad.net/bugs/1871957 Introduce a new perf_ioctl command to change the OA configuration of the active stream. This allows the OA stream to be reconfigured between batch buffers, giving greater flexibility in sampling. We inject a request into the OA context to reconfigure the stream asynchronously on the GPU in between and ordered with execbuffer calls. Original patch for dynamic reconfiguration by Lionel Landwerlin. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Chris Wilson Reviewed-by: Lionel Landwerlin Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-3-chris@chris-wilson.co.uk (cherry picked from commit 7831e9a965ea2ca91855995d62197bc8078bb762) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 46 +++++++++++++++++++++++++++++++- include/uapi/drm/i915_drm.h | 13 +++++++++ 2 files changed, 58 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 5d50976318ce..1b4d8daaf130 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -2900,6 +2900,40 @@ static void i915_perf_disable_locked(struct i915_perf_stream *stream) stream->ops->disable(stream); } +static long i915_perf_config_locked(struct i915_perf_stream *stream, + unsigned long metrics_set) +{ + struct i915_oa_config *config; + long ret = stream->oa_config->id; + + config = i915_perf_get_oa_config(stream->perf, metrics_set); + if (!config) + return -EINVAL; + + if (config != stream->oa_config) { + int err; + + /* + * If OA is bound to a specific context, emit the + * reconfiguration inline from that context. The update + * will then be ordered with respect to submission on that + * context. + * + * When set globally, we use a low priority kernel context, + * so it will effectively take effect when idle. + */ + err = emit_oa_config(stream, oa_context(stream)); + if (err == 0) + config = xchg(&stream->oa_config, config); + else + ret = err; + } + + i915_oa_config_put(config); + + return ret; +} + /** * i915_perf_ioctl - support ioctl() usage with i915 perf stream FDs * @stream: An i915 perf stream @@ -2923,6 +2957,8 @@ static long i915_perf_ioctl_locked(struct i915_perf_stream *stream, case I915_PERF_IOCTL_DISABLE: i915_perf_disable_locked(stream); return 0; + case I915_PERF_IOCTL_CONFIG: + return i915_perf_config_locked(stream, arg); } return -EINVAL; @@ -4061,7 +4097,15 @@ void i915_perf_fini(struct drm_i915_private *i915) */ int i915_perf_ioctl_version(void) { - return 1; + /* + * 1: Initial version + * I915_PERF_IOCTL_ENABLE + * I915_PERF_IOCTL_DISABLE + * + * 2: Added runtime modification of OA config. + * I915_PERF_IOCTL_CONFIG + */ + return 2; } #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index f6b452e79a54..70dcddd7fd7a 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1932,6 +1932,19 @@ struct drm_i915_perf_open_param { */ #define I915_PERF_IOCTL_DISABLE _IO('i', 0x1) +/** + * Change metrics_set captured by a stream. + * + * If the stream is bound to a specific context, the configuration change + * will performed inline with that context such that it takes effect before + * the next execbuf submission. + * + * Returns the previously bound metrics set id, or a negative error code. + * + * This ioctl is available in perf revision 2. + */ +#define I915_PERF_IOCTL_CONFIG _IO('i', 0x2) + /** * Common to all i915 perf records */ From patchwork Tue May 5 08:32:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283334 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GXzw60dXz9sP7; Tue, 5 May 2020 18:33:16 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt0n-0003df-26; Tue, 05 May 2020 08:33:09 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0U-0003Ov-BZ for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:50 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0R-0004sj-Q4 for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:47 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 18/26] drm/i915/perf: allow holding preemption on filtered ctx Date: Tue, 5 May 2020 11:32:31 +0300 Message-Id: <20200505083239.2428948-19-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 We would like to make use of perf in Vulkan. The Vulkan API is much lower level than OpenGL, with applications directly exposed to the concept of command buffers (pretty much equivalent to our batch buffers). In Vulkan, queries are always limited in scope to a command buffer. In OpenGL, the lack of command buffer concept meant that queries' duration could span multiple command buffers. With that restriction gone in Vulkan, we would like to simplify measuring performance just by measuring the deltas between the counter snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the more complex scheme we currently have in the GL driver, using 2 MI_RECORD_PERF_COUNT commands and doing some post processing on the stream of OA reports, coming from the global OA buffer, to remove any unrelated deltas in between the 2 MI_RECORD_PERF_COUNT. Disabling preemption only apply to a single context with which want to query performance counters for and is considered a privileged operation, by default protected by CAP_SYS_ADMIN. It is possible to enable it for a normal user by disabling the paranoid stream setting. v2: Store preemption setting in intel_context (Chris) v3: Use priorities to avoid preemption rather than the HW mechanism v4: Just modify the port priority reporting function v5: Add nopreempt flag on gem context and always flag requests appropriately, regarless of OA reconfiguration. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-4-chris@chris-wilson.co.uk (cherry picked from commit 9cd20ef7803cc53a00c6eb7198b3d870ac7b3766) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/gem/i915_gem_context.h | 18 ++++++++++ .../gpu/drm/i915/gem/i915_gem_context_types.h | 1 + .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 3 ++ drivers/gpu/drm/i915/i915_perf.c | 34 +++++++++++++++++-- drivers/gpu/drm/i915/i915_perf_types.h | 8 +++++ include/uapi/drm/i915_drm.h | 11 ++++++ 6 files changed, 72 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h index 176978608b6f..555d107d3530 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h @@ -127,6 +127,24 @@ static inline void i915_gem_context_unpin_hw_id(struct i915_gem_context *ctx) atomic_dec(&ctx->hw_id_pin_count); } +static inline bool +i915_gem_context_nopreempt(const struct i915_gem_context *ctx) +{ + return test_bit(CONTEXT_NOPREEMPT, &ctx->flags); +} + +static inline void +i915_gem_context_set_nopreempt(struct i915_gem_context *ctx) +{ + set_bit(CONTEXT_NOPREEMPT, &ctx->flags); +} + +static inline void +i915_gem_context_clear_nopreempt(struct i915_gem_context *ctx) +{ + clear_bit(CONTEXT_NOPREEMPT, &ctx->flags); +} + static inline bool i915_gem_context_is_kernel(struct i915_gem_context *ctx) { return !ctx->file_priv; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index 00537b9d7006..7dfc14041b4f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -146,6 +146,7 @@ struct i915_gem_context { #define CONTEXT_CLOSED 1 #define CONTEXT_FORCE_SINGLE_SUBMISSION 2 #define CONTEXT_USER_ENGINES 3 +#define CONTEXT_NOPREEMPT 4 /** * @hw_id: - unique identifier for the context diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 3d8dff2d894a..c5e586a0b45e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -2102,6 +2102,9 @@ static int eb_submit(struct i915_execbuffer *eb) if (err) return err; + if (i915_gem_context_nopreempt(eb->gem_context)) + eb->request->flags |= I915_REQUEST_NOPREEMPT; + return 0; } diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 1b4d8daaf130..0e236c6c5ccd 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -344,6 +344,8 @@ static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = { * struct perf_open_properties - for validated properties given to open a stream * @sample_flags: `DRM_I915_PERF_PROP_SAMPLE_*` properties are tracked as flags * @single_context: Whether a single or all gpu contexts should be monitored + * @hold_preemption: Whether the preemption is disabled for the filtered + * context * @ctx_handle: A gem ctx handle for use with @single_context * @metrics_set: An ID for an OA unit metric set advertised via sysfs * @oa_format: An OA unit HW report format @@ -359,6 +361,7 @@ struct perf_open_properties { u32 sample_flags; u64 single_context:1; + u64 hold_preemption:1; u64 ctx_handle; /* OA sampling state */ @@ -2543,6 +2546,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, if (WARN_ON(stream->oa_buffer.format_size == 0)) return -EINVAL; + stream->hold_preemption = props->hold_preemption; + stream->oa_buffer.format = perf->oa_formats[props->oa_format].format; @@ -2872,6 +2877,9 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream) if (stream->ops->enable) stream->ops->enable(stream); + + if (stream->hold_preemption) + i915_gem_context_set_nopreempt(stream->ctx); } /** @@ -2896,6 +2904,9 @@ static void i915_perf_disable_locked(struct i915_perf_stream *stream) /* Allow stream->ops->disable() to refer to this */ stream->enabled = false; + if (stream->hold_preemption) + i915_gem_context_clear_nopreempt(stream->ctx); + if (stream->ops->disable) stream->ops->disable(stream); } @@ -3105,6 +3116,15 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf, } } + if (props->hold_preemption) { + if (!props->single_context) { + DRM_DEBUG("preemption disable with no context\n"); + ret = -EINVAL; + goto err; + } + privileged_op = true; + } + /* * On Haswell the OA unit supports clock gating off for a specific * context and in this mode there's no visibility of metrics for the @@ -3119,7 +3139,7 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf, * MI_REPORT_PERF_COUNT commands and so consider it a privileged op to * enable the OA unit by default. */ - if (IS_HASWELL(perf->i915) && specific_ctx) + if (IS_HASWELL(perf->i915) && specific_ctx && !props->hold_preemption) privileged_op = false; /* Similar to perf's kernel.perf_paranoid_cpu sysctl option @@ -3129,7 +3149,7 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf, */ if (privileged_op && i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) { - DRM_DEBUG("Insufficient privileges to open system-wide i915 perf stream\n"); + DRM_DEBUG("Insufficient privileges to open i915 perf stream\n"); ret = -EACCES; goto err_ctx; } @@ -3331,6 +3351,9 @@ static int read_properties_unlocked(struct i915_perf *perf, props->oa_periodic = true; props->oa_period_exponent = value; break; + case DRM_I915_PERF_PROP_HOLD_PREEMPTION: + props->hold_preemption = !!value; + break; case DRM_I915_PERF_PROP_MAX: MISSING_CASE(id); return -EINVAL; @@ -4104,8 +4127,13 @@ int i915_perf_ioctl_version(void) * * 2: Added runtime modification of OA config. * I915_PERF_IOCTL_CONFIG + * + * 3: Add DRM_I915_PERF_PROP_HOLD_PREEMPTION parameter to hold + * preemption on a particular context so that performance data is + * accessible from a delta of MI_RPC reports without looking at the + * OA buffer. */ - return 2; + return 3; } #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h index d35a3c1946c3..a1f733fc905a 100644 --- a/drivers/gpu/drm/i915/i915_perf_types.h +++ b/drivers/gpu/drm/i915/i915_perf_types.h @@ -170,6 +170,14 @@ struct i915_perf_stream { */ bool enabled; + /** + * @hold_preemption: Whether preemption is put on hold for command + * submissions done on the @ctx. This is useful for some drivers that + * cannot easily post process the OA buffer context to subtract delta + * of performance counters not associated with @ctx. + */ + bool hold_preemption; + /** * @ops: The callbacks providing the implementation of this specific * type of configured stream. diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 70dcddd7fd7a..c4645c618961 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1890,6 +1890,17 @@ enum drm_i915_perf_property_id { */ DRM_I915_PERF_PROP_OA_EXPONENT, + /** + * Specifying this property is only valid when specify a context to + * filter with DRM_I915_PERF_PROP_CTX_HANDLE. Specifying this property + * will hold preemption of the particular context we want to gather + * performance data about. The execbuf2 submissions must include a + * drm_i915_gem_execbuffer_ext_perf parameter for this to apply. + * + * This property is available in perf revision 3. + */ + DRM_I915_PERF_PROP_HOLD_PREEMPTION, + DRM_I915_PERF_PROP_MAX /* non-ABI */ }; From patchwork Tue May 5 08:32:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283342 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0K1SgLz9sTJ; Tue, 5 May 2020 18:33:37 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt1A-0003wV-1w; Tue, 05 May 2020 08:33:32 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0U-0003P6-IJ for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:50 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0S-0004sj-4p for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:48 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 19/26] drm/i915/perf: fix oa config reconfiguration Date: Tue, 5 May 2020 11:32:32 +0300 Message-Id: <20200505083239.2428948-20-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 The current logic just reapplies the same configuration already stored into stream->oa_config instead of the newly selected one. Signed-off-by: Lionel Landwerlin Fixes: 7831e9a965ea ("drm/i915/perf: Allow dynamic reconfiguration of the OA stream") Cc: Chris Wilson Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191019214647.27866-1-lionel.g.landwerlin@intel.com (cherry picked from commit 8814c6d01f7e82e6be70ac04d8bca0fc90757418) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 0e236c6c5ccd..c32aed19d2e0 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1899,13 +1899,14 @@ get_oa_vma(struct i915_perf_stream *stream, struct i915_oa_config *oa_config) } static int emit_oa_config(struct i915_perf_stream *stream, + struct i915_oa_config *oa_config, struct intel_context *ce) { struct i915_request *rq; struct i915_vma *vma; int err; - vma = get_oa_vma(stream, stream->oa_config); + vma = get_oa_vma(stream, oa_config); if (IS_ERR(vma)) return PTR_ERR(vma); @@ -1963,7 +1964,7 @@ static int hsw_enable_metric_set(struct i915_perf_stream *stream) intel_uncore_rmw(uncore, GEN6_UCGCTL1, 0, GEN6_CSUNIT_CLOCK_GATE_DISABLE); - return emit_oa_config(stream, oa_context(stream)); + return emit_oa_config(stream, stream->oa_config, oa_context(stream)); } static void hsw_disable_metric_set(struct i915_perf_stream *stream) @@ -2279,7 +2280,7 @@ static int gen8_configure_all_contexts(struct i915_perf_stream *stream, static int gen8_enable_metric_set(struct i915_perf_stream *stream) { struct intel_uncore *uncore = stream->uncore; - const struct i915_oa_config *oa_config = stream->oa_config; + struct i915_oa_config *oa_config = stream->oa_config; int ret; /* @@ -2320,7 +2321,7 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream) if (ret) return ret; - return emit_oa_config(stream, oa_context(stream)); + return emit_oa_config(stream, oa_config, oa_context(stream)); } static void gen8_disable_metric_set(struct i915_perf_stream *stream) @@ -2933,7 +2934,7 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream, * When set globally, we use a low priority kernel context, * so it will effectively take effect when idle. */ - err = emit_oa_config(stream, oa_context(stream)); + err = emit_oa_config(stream, config, oa_context(stream)); if (err == 0) config = xchg(&stream->oa_config, config); else From patchwork Tue May 5 08:32:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283350 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0l0gTfz9sSs; Tue, 5 May 2020 18:33:59 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt1V-0004Jf-M0; Tue, 05 May 2020 08:33:53 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0y-0003mA-8P for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:33:20 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0S-0004sj-Gf for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:48 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 20/26] drm/i915/perf: Add helper macros for comparing with whitelisted registers Date: Tue, 5 May 2020 11:32:33 +0300 Message-Id: <20200505083239.2428948-21-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Umesh Nerlige Ramappa BugLink: https://bugs.launchpad.net/bugs/1871957 Add helper macros for range and equality comparisons and use them to check with whitelisted registers in oa configurations. Signed-off-by: Umesh Nerlige Ramappa Reviewed-by: Lionel Landwerlin Signed-off-by: Lionel Landwerlin Link: https://patchwork.freedesktop.org/patch/msgid/20191025193746.47155-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit fc21523041107a4b2d47b05e1ad2a360659dbc4f) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 54 +++++++++++++++++--------------- 1 file changed, 28 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index c32aed19d2e0..f5a911d9c53c 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3553,56 +3553,58 @@ static bool gen8_is_valid_flex_addr(struct i915_perf *perf, u32 addr) return false; } +#define ADDR_IN_RANGE(addr, start, end) \ + ((addr) >= (start) && \ + (addr) <= (end)) + +#define REG_IN_RANGE(addr, start, end) \ + ((addr) >= i915_mmio_reg_offset(start) && \ + (addr) <= i915_mmio_reg_offset(end)) + +#define REG_EQUAL(addr, mmio) \ + ((addr) == i915_mmio_reg_offset(mmio)) + static bool gen7_is_valid_b_counter_addr(struct i915_perf *perf, u32 addr) { - return (addr >= i915_mmio_reg_offset(OASTARTTRIG1) && - addr <= i915_mmio_reg_offset(OASTARTTRIG8)) || - (addr >= i915_mmio_reg_offset(OAREPORTTRIG1) && - addr <= i915_mmio_reg_offset(OAREPORTTRIG8)) || - (addr >= i915_mmio_reg_offset(OACEC0_0) && - addr <= i915_mmio_reg_offset(OACEC7_1)); + return REG_IN_RANGE(addr, OASTARTTRIG1, OASTARTTRIG8) || + REG_IN_RANGE(addr, OAREPORTTRIG1, OAREPORTTRIG8) || + REG_IN_RANGE(addr, OACEC0_0, OACEC7_1); } static bool gen7_is_valid_mux_addr(struct i915_perf *perf, u32 addr) { - return addr == i915_mmio_reg_offset(HALF_SLICE_CHICKEN2) || - (addr >= i915_mmio_reg_offset(MICRO_BP0_0) && - addr <= i915_mmio_reg_offset(NOA_WRITE)) || - (addr >= i915_mmio_reg_offset(OA_PERFCNT1_LO) && - addr <= i915_mmio_reg_offset(OA_PERFCNT2_HI)) || - (addr >= i915_mmio_reg_offset(OA_PERFMATRIX_LO) && - addr <= i915_mmio_reg_offset(OA_PERFMATRIX_HI)); + return REG_EQUAL(addr, HALF_SLICE_CHICKEN2) || + REG_IN_RANGE(addr, MICRO_BP0_0, NOA_WRITE) || + REG_IN_RANGE(addr, OA_PERFCNT1_LO, OA_PERFCNT2_HI) || + REG_IN_RANGE(addr, OA_PERFMATRIX_LO, OA_PERFMATRIX_HI); } static bool gen8_is_valid_mux_addr(struct i915_perf *perf, u32 addr) { return gen7_is_valid_mux_addr(perf, addr) || - addr == i915_mmio_reg_offset(WAIT_FOR_RC6_EXIT) || - (addr >= i915_mmio_reg_offset(RPM_CONFIG0) && - addr <= i915_mmio_reg_offset(NOA_CONFIG(8))); + REG_EQUAL(addr, WAIT_FOR_RC6_EXIT) || + REG_IN_RANGE(addr, RPM_CONFIG0, NOA_CONFIG(8)); } static bool gen10_is_valid_mux_addr(struct i915_perf *perf, u32 addr) { return gen8_is_valid_mux_addr(perf, addr) || - addr == i915_mmio_reg_offset(GEN10_NOA_WRITE_HIGH) || - (addr >= i915_mmio_reg_offset(OA_PERFCNT3_LO) && - addr <= i915_mmio_reg_offset(OA_PERFCNT4_HI)); + REG_EQUAL(addr, GEN10_NOA_WRITE_HIGH) || + REG_IN_RANGE(addr, OA_PERFCNT3_LO, OA_PERFCNT4_HI); } static bool hsw_is_valid_mux_addr(struct i915_perf *perf, u32 addr) { return gen7_is_valid_mux_addr(perf, addr) || - (addr >= 0x25100 && addr <= 0x2FF90) || - (addr >= i915_mmio_reg_offset(HSW_MBVID2_NOA0) && - addr <= i915_mmio_reg_offset(HSW_MBVID2_NOA9)) || - addr == i915_mmio_reg_offset(HSW_MBVID2_MISR0); + ADDR_IN_RANGE(addr, 0x25100, 0x2FF90) || + REG_IN_RANGE(addr, HSW_MBVID2_NOA0, HSW_MBVID2_NOA9) || + REG_EQUAL(addr, HSW_MBVID2_MISR0); } static bool chv_is_valid_mux_addr(struct i915_perf *perf, u32 addr) { return gen7_is_valid_mux_addr(perf, addr) || - (addr >= 0x182300 && addr <= 0x1823A4); + ADDR_IN_RANGE(addr, 0x182300, 0x1823A4); } static u32 mask_reg_value(u32 reg, u32 val) @@ -3611,14 +3613,14 @@ static u32 mask_reg_value(u32 reg, u32 val) * WaDisableSTUnitPowerOptimization workaround. Make sure the value * programmed by userspace doesn't change this. */ - if (i915_mmio_reg_offset(HALF_SLICE_CHICKEN2) == reg) + if (REG_EQUAL(reg, HALF_SLICE_CHICKEN2)) val = val & ~_MASKED_BIT_ENABLE(GEN8_ST_PO_DISABLE); /* WAIT_FOR_RC6_EXIT has only one bit fullfilling the function * indicated by its name and a bunch of selection fields used by OA * configs. */ - if (i915_mmio_reg_offset(WAIT_FOR_RC6_EXIT) == reg) + if (REG_EQUAL(reg, WAIT_FOR_RC6_EXIT)) val = val & ~_MASKED_BIT_ENABLE(HSW_WAIT_FOR_RC6_EXIT_ENABLE); return val; From patchwork Tue May 5 08:32:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283349 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0k0Hv2z9sP7; Tue, 5 May 2020 18:33:58 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt1T-0004HB-Ul; Tue, 05 May 2020 08:33:51 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0w-0003le-VE for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:33:18 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0S-0004sj-SB for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:49 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 21/26] drm/i915/perf: always consider holding preemption a privileged op Date: Tue, 5 May 2020 11:32:34 +0300 Message-Id: <20200505083239.2428948-22-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 The ordering of the checks in the existing code can lead to holding preemption not being considered as privileged op. Signed-off-by: Lionel Landwerlin Fixes: 9cd20ef7803c ("drm/i915/perf: allow holding preemption on filtered ctx") Reviewed-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191111095308.2550-1-lionel.g.landwerlin@intel.com (cherry picked from commit 0b0120d4c7b013eba59b33254febc0a6e4049e13) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index f5a911d9c53c..b7960fce1460 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3117,15 +3117,6 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf, } } - if (props->hold_preemption) { - if (!props->single_context) { - DRM_DEBUG("preemption disable with no context\n"); - ret = -EINVAL; - goto err; - } - privileged_op = true; - } - /* * On Haswell the OA unit supports clock gating off for a specific * context and in this mode there's no visibility of metrics for the @@ -3140,9 +3131,18 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf, * MI_REPORT_PERF_COUNT commands and so consider it a privileged op to * enable the OA unit by default. */ - if (IS_HASWELL(perf->i915) && specific_ctx && !props->hold_preemption) + if (IS_HASWELL(perf->i915) && specific_ctx) privileged_op = false; + if (props->hold_preemption) { + if (!props->single_context) { + DRM_DEBUG("preemption disable with no context\n"); + ret = -EINVAL; + goto err; + } + privileged_op = true; + } + /* Similar to perf's kernel.perf_paranoid_cpu sysctl option * we check a dev.i915.perf_stream_paranoid sysctl option * to determine if it's ok to access system wide OA counters From patchwork Tue May 5 08:32:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283347 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0f2Kgdz9sSk; Tue, 5 May 2020 18:33:54 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt1R-0004ER-Dx; Tue, 05 May 2020 08:33:49 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0w-0003kM-62 for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:33:18 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0T-0004sj-6r for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:49 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 22/26] drm/i915/perf: don't forget noa wait after oa config Date: Tue, 5 May 2020 11:32:35 +0300 Message-Id: <20200505083239.2428948-23-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 I'm observing incoherence metric values, changing from run to run. It appears the patches introducing noa wait & reconfiguration from command stream switched places in the series multiple times during the review. This lead to the dependency of one onto the order to go missing... Signed-off-by: Lionel Landwerlin Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream") Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191113154639.27144-1-lionel.g.landwerlin@intel.com (cherry picked from commit 93937659dc644f708def8fa58cb63c5c9f499f26) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index b7960fce1460..bf9431552048 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1823,7 +1823,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, config_length += num_lri_dwords(oa_config->mux_regs_len); config_length += num_lri_dwords(oa_config->b_counter_regs_len); config_length += num_lri_dwords(oa_config->flex_regs_len); - config_length++; /* MI_BATCH_BUFFER_END */ + config_length += 3; /* MI_BATCH_BUFFER_START */ config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE); obj = i915_gem_object_create_shmem(stream->perf->i915, config_length); @@ -1848,7 +1848,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, oa_config->flex_regs, oa_config->flex_regs_len); - *cs++ = MI_BATCH_BUFFER_END; + /* Jump into the active wait. */ + *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ? + MI_BATCH_BUFFER_START : + MI_BATCH_BUFFER_START_GEN8); + *cs++ = i915_ggtt_offset(stream->noa_wait); + *cs++ = 0; i915_gem_object_flush_map(obj); i915_gem_object_unpin_map(obj); From patchwork Tue May 5 08:32:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283348 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0g2kkQz9sT0; Tue, 5 May 2020 18:33:55 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt1S-0004FW-BH; Tue, 05 May 2020 08:33:50 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0u-0003i1-SQ for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:33:16 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0T-0004sj-Ia for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:49 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 23/26] drm/i915/perf: Add preemption check while waiting for OA Date: Tue, 5 May 2020 11:32:36 +0300 Message-Id: <20200505083239.2428948-24-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Lionel Landwerlin BugLink: https://bugs.launchpad.net/bugs/1871957 While we're waiting for the OA configuration to apply, let's give a chance to other contexts that might need to run other workloads. Signed-off-by: Lionel Landwerlin Suggested-by: Chris Wilson Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191114140224.21818-1-lionel.g.landwerlin@intel.com (cherry picked from commit dd590f680089af64b68282c7758177ec224f60d2) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index bf9431552048..b9e1e49f353b 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1730,6 +1730,8 @@ static int alloc_noa_wait(struct i915_perf_stream *stream) *cs++ = MI_MATH_ADD; *cs++ = MI_MATH_STOREINV(MI_MATH_REG(JUMP_PREDICATE), MI_MATH_REG_CF); + *cs++ = MI_ARB_CHECK; + /* * Transfer the result into the predicate register to be used for the * predicated jump. From patchwork Tue May 5 08:32:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283346 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY0c3W66z9sSk; Tue, 5 May 2020 18:33:52 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt1P-0004Bt-15; Tue, 05 May 2020 08:33:47 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0s-0003gT-S4 for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:33:14 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0T-0004sj-VU for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:50 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 24/26] drm/i915/perf: drop pointless static qualifier in i915_perf_add_config_ioctl() Date: Tue, 5 May 2020 11:32:37 +0300 Message-Id: <20200505083239.2428948-25-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Mao Wenan BugLink: https://bugs.launchpad.net/bugs/1871957 There is no need to have the 'T *v' variable static since new value always be assigned before use it. Reported-by: Hulk Robot Signed-off-by: Mao Wenan Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20191204010154.152396-1-maowenan@huawei.com (cherry picked from commit c415ef2a267cdfda429bb519e651c82132ef89fa) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index b9e1e49f353b..e6622cf6030e 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3735,7 +3735,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, struct i915_perf *perf = &to_i915(dev)->perf; struct drm_i915_perf_oa_config *args = data; struct i915_oa_config *oa_config, *tmp; - static struct i915_oa_reg *regs; + struct i915_oa_reg *regs; int err, id; if (!perf->i915) { From patchwork Tue May 5 08:32:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283352 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY3Q5rqqz9sSx; Tue, 5 May 2020 18:36:18 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt3n-00051o-DU; Tue, 05 May 2020 08:36:15 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt3l-00051O-48 for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:36:13 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0U-0004sj-BS for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:50 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 25/26] drm/i915/perf: Manually acquire engine-wakeref around use of kernel_context Date: Tue, 5 May 2020 11:32:38 +0300 Message-Id: <20200505083239.2428948-26-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Chris Wilson BugLink: https://bugs.launchpad.net/bugs/1871957 The engine->kernel_context is a special case for request emission. Since it is used as the barrier within the engine's wakeref, we must acquire the wakeref before submitting a request to the kernel_context. Reported-by: Lionel Landwerlin Signed-off-by: Chris Wilson Cc: Lionel Landwerlin Reviewed-by: Mika Kuoppala Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-3-chris@chris-wilson.co.uk (cherry picked from commit d236e2ac535aee1ee19684427edd3654bac143a4) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index e6622cf6030e..0d11cafb976a 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -2132,7 +2132,9 @@ static int gen8_modify_self(struct intel_context *ce, struct i915_request *rq; int err; + intel_engine_pm_get(ce->engine); rq = i915_request_create(ce); + intel_engine_pm_put(ce->engine); if (IS_ERR(rq)) return PTR_ERR(rq); From patchwork Tue May 5 08:32:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timo Aaltonen X-Patchwork-Id: 1283353 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49GY3R39WJz9sT0; Tue, 5 May 2020 18:36:19 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jVt3n-00051a-6v; Tue, 05 May 2020 08:36:15 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt3k-00051G-Um for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:36:12 +0000 Received: from [192.194.81.50] (helo=leon.tyrell) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jVt0U-0004sj-NV for kernel-team@lists.ubuntu.com; Tue, 05 May 2020 08:32:50 +0000 From: Timo Aaltonen To: kernel-team@lists.ubuntu.com Subject: [PATCH 26/26] drm/i915/perf: Reintroduce wait on OA configuration completion Date: Tue, 5 May 2020 11:32:39 +0300 Message-Id: <20200505083239.2428948-27-tjaalton@ubuntu.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200505083239.2428948-1-tjaalton@ubuntu.com> References: <20200505083239.2428948-1-tjaalton@ubuntu.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Chris Wilson BugLink: https://bugs.launchpad.net/bugs/1871957 We still need to wait for the initial OA configuration to happen before we enable OA report writes to the OA buffer. Reported-by: Lionel Landwerlin Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream") Closes: https://gitlab.freedesktop.org/drm/intel/issues/1356 Testcase: igt/perf/stream-open-close Signed-off-by: Chris Wilson Cc: Lionel Landwerlin Reviewed-by: Lionel Landwerlin Link: https://patchwork.freedesktop.org/patch/msgid/20200302085812.4172450-7-chris@chris-wilson.co.uk (cherry picked from commit 4b4e973d5eb89244b67d3223b60f752d0479f253) Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_perf.c | 51 +++++++++++++++++++------- drivers/gpu/drm/i915/i915_perf_types.h | 3 +- 2 files changed, 39 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 0d11cafb976a..075b09c0ee20 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1905,9 +1905,10 @@ get_oa_vma(struct i915_perf_stream *stream, struct i915_oa_config *oa_config) return i915_vma_get(oa_bo->vma); } -static int emit_oa_config(struct i915_perf_stream *stream, - struct i915_oa_config *oa_config, - struct intel_context *ce) +static struct i915_request * +emit_oa_config(struct i915_perf_stream *stream, + struct i915_oa_config *oa_config, + struct intel_context *ce) { struct i915_request *rq; struct i915_vma *vma; @@ -1915,7 +1916,7 @@ static int emit_oa_config(struct i915_perf_stream *stream, vma = get_oa_vma(stream, oa_config); if (IS_ERR(vma)) - return PTR_ERR(vma); + return ERR_CAST(vma); err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH); if (err) @@ -1938,13 +1939,17 @@ static int emit_oa_config(struct i915_perf_stream *stream, err = rq->engine->emit_bb_start(rq, vma->node.start, 0, I915_DISPATCH_SECURE); + if (err) + goto err_add_request; + + i915_request_get(rq); err_add_request: i915_request_add(rq); err_vma_unpin: i915_vma_unpin(vma); err_vma_put: i915_vma_put(vma); - return err; + return err ? ERR_PTR(err) : rq; } static struct intel_context *oa_context(struct i915_perf_stream *stream) @@ -1952,7 +1957,8 @@ static struct intel_context *oa_context(struct i915_perf_stream *stream) return stream->pinned_ctx ?: stream->engine->kernel_context; } -static int hsw_enable_metric_set(struct i915_perf_stream *stream) +static struct i915_request * +hsw_enable_metric_set(struct i915_perf_stream *stream) { struct intel_uncore *uncore = stream->uncore; @@ -2286,7 +2292,8 @@ static int gen8_configure_all_contexts(struct i915_perf_stream *stream, return 0; } -static int gen8_enable_metric_set(struct i915_perf_stream *stream) +static struct i915_request * +gen8_enable_metric_set(struct i915_perf_stream *stream) { struct intel_uncore *uncore = stream->uncore; struct i915_oa_config *oa_config = stream->oa_config; @@ -2328,7 +2335,7 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream) */ ret = gen8_configure_all_contexts(stream, oa_config); if (ret) - return ret; + return ERR_PTR(ret); return emit_oa_config(stream, oa_config, oa_context(stream)); } @@ -2476,6 +2483,20 @@ static const struct i915_perf_stream_ops i915_oa_stream_ops = { .read = i915_oa_read, }; +static int i915_perf_stream_enable_sync(struct i915_perf_stream *stream) +{ + struct i915_request *rq; + + rq = stream->perf->ops.enable_metric_set(stream); + if (IS_ERR(rq)) + return PTR_ERR(rq); + + i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT); + i915_request_put(rq); + + return 0; +} + /** * i915_oa_stream_init - validate combined props for OA stream and init * @stream: An i915 perf stream @@ -2612,7 +2633,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, stream->ops = &i915_oa_stream_ops; perf->exclusive_stream = stream; - ret = perf->ops.enable_metric_set(stream); + ret = i915_perf_stream_enable_sync(stream); if (ret) { DRM_DEBUG("Unable to enable metric set\n"); goto err_enable; @@ -2932,7 +2953,7 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream, return -EINVAL; if (config != stream->oa_config) { - int err; + struct i915_request *rq; /* * If OA is bound to a specific context, emit the @@ -2943,11 +2964,13 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream, * When set globally, we use a low priority kernel context, * so it will effectively take effect when idle. */ - err = emit_oa_config(stream, config, oa_context(stream)); - if (err == 0) + rq = emit_oa_config(stream, config, oa_context(stream)); + if (!IS_ERR(rq)) { config = xchg(&stream->oa_config, config); - else - ret = err; + i915_request_put(rq); + } else { + ret = PTR_ERR(rq); + } } i915_oa_config_put(config); diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h index a1f733fc905a..4784539ee953 100644 --- a/drivers/gpu/drm/i915/i915_perf_types.h +++ b/drivers/gpu/drm/i915/i915_perf_types.h @@ -310,7 +310,8 @@ struct i915_oa_ops { * counter reports being sampled. May apply system constraints such as * disabling EU clock gating as required. */ - int (*enable_metric_set)(struct i915_perf_stream *stream); + struct i915_request * + (*enable_metric_set)(struct i915_perf_stream *stream); /** * @disable_metric_set: Remove system constraints associated with using