[060/110] skiboot 6.0.9 release notes

Message ID	20190531061351.22973-61-stewart@linux.ibm.com
State	Accepted

Headers

From: Stewart Smith <stewart@linux.ibm.com>
To: skiboot@lists.ozlabs.org
Date: Fri, 31 May 2019 16:13:01 +1000
In-Reply-To: <20190531061351.22973-1-stewart@linux.ibm.com>
References: <20190531061351.22973-1-stewart@linux.ibm.com>
MIME-Version: 1.0
Message-Id: <20190531061351.22973-61-stewart@linux.ibm.com>
Subject: [Skiboot] [PATCH 060/110] skiboot 6.0.9 release notes
Precedence: list
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org
Sender: "Skiboot"
	<skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org>

Series

Big documentation cleanup/expansion

Commit Message

Stewart Smith May 31, 2019, 6:13 a.m. UTC

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 19484fa4338f84617ae8282f17b43d658afc3589)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
---
 doc/release-notes/skiboot-6.0.9.rst | 139 ++++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 doc/release-notes/skiboot-6.0.9.rst

diff --git a/doc/release-notes/skiboot-6.0.9.rst b/doc/release-notes/skiboot-6.0.9.rst
new file mode 100644
index 000000000000..87eb797843ff
--- /dev/null
+++ b/doc/release-notes/skiboot-6.0.9.rst
@@ -0,0 +1,139 @@ 
+.. _skiboot-6.0.9:
+
+=============
+skiboot-6.0.9
+=============
+
+skiboot 6.0.9 was released on Friday October 12th, 2018. It replaces
+:ref:`skiboot-6.0.8` as the current stable release in the 6.0.x series.
+
+It is recommended that 6.0.9 be used instead of any previous 6.0.x version
+due to the bug fixes it contains.
+
+The bug fixes are:
+
+- opal/hmi: Ignore debug trigger inject core FIR.
+
+  Core FIR[60] is a side effect of the work around for the CI Vector Load
+  issue in DD2.1. Usually this gets delivered as HMI with HMER[17] where
+  Linux already ignores it. But it looks like in some cases we may happen
+  to see CORE_FIR[60] while we are already in Malfunction Alert HMI
+  (HMER[0]) due to other reasons e.g. CAPI recovery or NPU xstop. If that
+  happens then just ignore it instead of crashing the kernel as not recoverable.
+
+- opal/hmi: Handle early HMIs on thread0 when secondaries are still in OPAL.
+
+  When the primary thread receives a CORE level HMI for timer facility errors
+  while secondaries are still in OPAL, thread 0 ends up in rendez-vous
+  waiting for secondaries to get into HMI handling. This is because OPAL
+  runs with MSR(EE=0) and hence HMIs are delayed on secondary threads until
+  they are given to Linux OS. Fix this by adding a check for secondary
+  state and forcing them into HMI handling by queuing a job on secondary threads.
+
+  I have tested this by injecting an HDEC parity error very early during Linux
+  kernel boot. Recovery works fine for non-TB errors. But if TB is bad at
+  this very early stage we are already doomed.
+
+  Without this patch we see: ::
+
+    [  285.046347408,7] OPAL: Start CPU 0x0843 (PIR 0x0843) -> 0x000000000000a83c
+    [  285.051160609,7] OPAL: Start CPU 0x0844 (PIR 0x0844) -> 0x000000000000a83c
+    [  285.055359021,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
+    [  285.055361439,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:0: TFMR(2e12002870e14000) Timer Facility Error
+    [  286.232183823,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc1)
+    [  287.409002056,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc1)
+    [  289.073820164,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc1)
+    [  290.250638683,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc2)
+    [  291.427456821,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc2)
+    [  293.092274807,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc2)
+    [  294.269092904,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc3)
+    [  295.445910944,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc3)
+    [  297.110728970,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc3)
+
+  After this patch: ::
+
+    [  259.401719351,7] OPAL: Start CPU 0x0841 (PIR 0x0841) -> 0x000000000000a83c
+    [  259.406259572,7] OPAL: Start CPU 0x0842 (PIR 0x0842) -> 0x000000000000a83c
+    [  259.410615534,7] OPAL: Start CPU 0x0843 (PIR 0x0843) -> 0x000000000000a83c
+    [  259.415444519,7] OPAL: Start CPU 0x0844 (PIR 0x0844) -> 0x000000000000a83c
+    [  259.419641401,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
+    [  259.419644124,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:0: TFMR(2e12002870e04000) Timer Facility Error
+    [  259.419650678,7] HMI: Sending hmi job to thread 1
+    [  259.419652744,7] HMI: Sending hmi job to thread 2
+    [  259.419653051,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
+    [  259.419654725,7] HMI: Sending hmi job to thread 3
+    [  259.419654916,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
+    [  259.419658025,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
+    [  259.419658406,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:2: TFMR(2e12002870e04000) Timer Facility Error
+    [  259.419663095,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:3: TFMR(2e12002870e04000) Timer Facility Error
+    [  259.419655234,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:1: TFMR(2e12002870e04000) Timer Facility Error
+    [  259.425109779,7] OPAL: Start CPU 0x0845 (PIR 0x0845) -> 0x000000000000a83c
+    [  259.429870681,7] OPAL: Start CPU 0x0846 (PIR 0x0846) -> 0x000000000000a83c
+    [  259.434549250,7] OPAL: Start CPU 0x0847 (PIR 0x0847) -> 0x000000000000a83c
+
+- hw/bt.c: quieten all the noisy BT/IPMI messages
+- npu2: Use correct kill type for TCE invalidation
+
+  kill_type is enum of OPAL_PCI_TCE_KILL_PAGES, OPAL_PCI_TCE_KILL_PE,
+  OPAL_PCI_TCE_KILL_ALL and phb4_tce_kill() gets it right but
+  npu2_tce_kill() uses OPAL_PCI_TCE_KILL which is an OPAL API token.
+
+- hw/npu2-opencapi: Fix setting of supported OpenCAPI templates
+
+  In opal_npu_tl_set(), we made a typo that means the OPAL_NPU_TL_SET call
+  may not clear the enable bits for templates that were previously enabled
+  but are now disabled.
+
+  Fix the typo so we clear NPU2_OTL_CONFIG1_TX_TEMP2_EN as well as
+  TEMP{1,3}_EN.
+
+- phb4: Workaround PHB errata with CFG write UR/CA errors
+
+  If the PHB encounters a UR or CA status on a CFG write, it will
+  incorrectly freeze the wrong PE. Instead of using the PE# specified
+  in the CONFIG_ADDRESS register, it will use the PE# of whatever
+  MMIO occurred last.
+
+  Work around this by disabling freeze on such errors.
+
+- phb4: Handle allocation errors in phb4_eeh_dump_regs()
+
+  If the zalloc fails (and it can be a rather large allocation),
+  we will overwrite memory at 0 instead of failing.
+
+- phb4: Don't try to access non-existent PEST entries
+
+  In a POWER9 chip, some PHB4s have 256 PEs, some have 512.
+
+  Currently, the diagnostics code retrieves 512 unconditionally,
+  which is wrong and causes us to incorrectly report bogus values
+  for the "high" PEs on the small PHBs.
+
+  Use the actual number of implemented PEs instead.
+
+- phb4: Don't probe a PHB if its garded
+
+  Presently phb4_probe_stack() causes an exception while trying to probe
+  a PHB if it's garded. This causes skiboot to go into a reboot loop with
+  the following exception log: ::
+
+     ***********************************************
+     Fatal MCE at 000000003006ecd4   .probe_phb4+0x570
+     CFAR : 00000000300b98a0
+     <snip>
+     Aborting!
+     CPU 0018 Backtrace:
+     S: 0000000031cc37e0 R: 000000003001a51c   ._abort+0x4c
+     S: 0000000031cc3860 R: 0000000030028170   .exception_entry+0x180
+     S: 0000000031cc3a40 R: 0000000000001f10 *
+     S: 0000000031cc3c20 R: 000000003006ecb0   .probe_phb4+0x54c
+     S: 0000000031cc3e30 R: 0000000030014ca4   .main_cpu_entry+0x5b0
+     S: 0000000031cc3f00 R: 0000000030002700   boot_entry+0x1b8
+
+  This happens because phb4_probe_stack() ignores all xscom read/write
+  errors when enabling the PHB BARs and then tries to perform an MMIO read
+  of the PHB Version registers, which causes the fatal MCE.
+
+  We fix this by ignoring the PHB probe if the first xscom_write() to
+  populate the PHB Bar register fails, which indicates that there is
+  something wrong with the PHB.
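
The npu2 TCE invalidation entry in the notes above describes a mix-up between a kill-type enum value and an OPAL API token. A minimal sketch of why that matters follows. It is not the skiboot code: every `sketch_` / `SKETCH_` name and every numeric value is a hypothetical stand-in chosen for illustration, and the exact form the comparison took in npu2_tce_kill() is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the three kill types named in the notes.
 * The numeric values are assumptions for this sketch only. */
enum sketch_tce_kill_type {
	SKETCH_TCE_KILL_PAGES,
	SKETCH_TCE_KILL_PE,
	SKETCH_TCE_KILL_ALL,
};

/* Hypothetical stand-in for the API call token. It lives in a different
 * number space from the enum above, so it never matches a kill type. */
#define SKETCH_TOKEN_TCE_KILL 126

/* Buggy pattern: the kill type is compared against the call token. */
int sketch_buggy_is_page_kill(uint32_t kill_type)
{
	return kill_type == SKETCH_TOKEN_TCE_KILL;
}

/* Fixed pattern: the kill type is compared against the enum value. */
int sketch_fixed_is_page_kill(uint32_t kill_type)
{
	return kill_type == SKETCH_TCE_KILL_PAGES;
}
```

With these stand-ins, a caller passing `SKETCH_TCE_KILL_PAGES` is recognised only by the fixed predicate; the buggy one rejects every valid kill type.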

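The OpenCAPI template entry in the notes above says the fix clears the TEMP2 enable bit as well as TEMP1 and TEMP3 before applying the new selection. The sketch below illustrates that clear-then-set pattern and the effect of leaving one bit out of the clear mask. The bit positions and all `sketch_` / `SKETCH_` names are assumptions made for this example; they are not the real NPU2_OTL_CONFIG1 layout or the opal_npu_tl_set() implementation.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define SKETCH_TX_TEMP1_EN (1ull << 1)
#define SKETCH_TX_TEMP2_EN (1ull << 2)
#define SKETCH_TX_TEMP3_EN (1ull << 3)

/* Shared helper: clear the bits in clear_mask, then set the enable bit
 * of each requested template. Unrelated bits in reg are preserved. */
static uint64_t sketch_apply(uint64_t reg, uint64_t clear_mask,
			     int t1, int t2, int t3)
{
	reg &= ~clear_mask;
	if (t1)
		reg |= SKETCH_TX_TEMP1_EN;
	if (t2)
		reg |= SKETCH_TX_TEMP2_EN;
	if (t3)
		reg |= SKETCH_TX_TEMP3_EN;
	return reg;
}

/* Buggy pattern: TEMP2 is missing from the clear mask, so a template 2
 * enable bit that was set earlier survives even when t2 is 0. */
uint64_t sketch_set_templates_buggy(uint64_t reg, int t1, int t2, int t3)
{
	return sketch_apply(reg, SKETCH_TX_TEMP1_EN | SKETCH_TX_TEMP3_EN,
			    t1, t2, t3);
}

/* Fixed pattern: all three enable bits are cleared before setting. */
uint64_t sketch_set_templates(uint64_t reg, int t1, int t2, int t3)
{
	return sketch_apply(reg, SKETCH_TX_TEMP1_EN | SKETCH_TX_TEMP2_EN |
			    SKETCH_TX_TEMP3_EN, t1, t2, t3);
}
```

Starting from a register value with only the template 2 bit set and requesting only template 1, the fixed function leaves just the template 1 bit, while the buggy one leaves both bits set.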