Message ID | 20241106184722.17230-3-christopher.s.hall@intel.com |
---|---|
State | Changes Requested |
Headers | show |
Series | igc: Fix PTM timeout | expand |
Dear Christopher, Thank you for the patch. I’d use the more specific summary/title below: igc: Lengthen hardware retry time to 4 μs to prevent timeouts Am 06.11.24 um 19:47 schrieb Christopher S M Hall: > Lengthen the hardware retry timer to four microseconds. > > The i225/i226 hardware retries if it receives an inappropriate response > from the upstream device. If the device retries too quickly, the root > port does not respond. Any idea why? Is it documented somewhere? > The issue can be reproduced with the following: > > $ sudo phc2sys -R 1000 -O 0 -i tsn0 -m > > Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to > quickly reproduce the issue. > > PHC2SYS exits with: > > "ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction > fails Why four microseconds, and not some other value? > Fixes: 6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to 1 us") > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> > Tested-by: Avigail Dahan <avigailx.dahan@intel.com> > Signed-off-by: Christopher S M Hall <christopher.s.hall@intel.com> > --- > drivers/net/ethernet/intel/igc/igc_defines.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h > index 2ff292f5f63b..84521a4c35b4 100644 > --- a/drivers/net/ethernet/intel/igc/igc_defines.h > +++ b/drivers/net/ethernet/intel/igc/igc_defines.h > @@ -574,7 +574,7 @@ > #define IGC_PTM_CTRL_SHRT_CYC(usec) (((usec) & 0x3f) << 2) > #define IGC_PTM_CTRL_PTM_TO(usec) (((usec) & 0xff) << 8) > > -#define IGC_PTM_SHORT_CYC_DEFAULT 1 /* Default short cycle interval */ > +#define IGC_PTM_SHORT_CYC_DEFAULT 4 /* Default short cycle interval */ > #define IGC_PTM_CYC_TIME_DEFAULT 5 /* Default PTM cycle time */ > #define IGC_PTM_TIMEOUT_DEFAULT 255 /* Default timeout for PTM errors */ Kind regards, Paul
Hi Paul, > From: Paul Menzel <pmenzel@molgen.mpg.de> > Sent: Wednesday, November 06, 2024 3:14 PM > Subject: Re: [Intel-wired-lan] [PATCH iwl-net v3 2/6] igc: Lengthen the > hardware retry time to prevent timeouts > > Dear Christopher, > > > Thank you for the patch. > > I’d use the more specific summary/title below: Will do. > > igc: Lengthen hardware retry time to 4 μs to prevent timeouts > > Am 06.11.24 um 19:47 schrieb Christopher S M Hall: > > Lengthen the hardware retry timer to four microseconds. > > > > The i225/i226 hardware retries if it receives an inappropriate response > > from the upstream device. If the device retries too quickly, the root > > port does not respond. > > Any idea why? Is it documented somewhere? I do not. Theoretically, 1 us should work, but it does not. It could be a root port problem or an issue with i225/i226 NIC. I am not able to directly observe the state of either. 4 us has worked in all my testing I am comfortable with that value. 2 us also works, but given the limited hardware at my disposal I doubled the value to 4 us to be safe. PTM is not time critical. Typically, software initiates a transaction between 8 and 32 times per second. There is no performance impact for PTM or any other function of the card. The timeout occurs rarely, but if the retry time is too short the PTM state machine does not recover. > > The issue can be reproduced with the following: > > > > $ sudo phc2sys -R 1000 -O 0 -i tsn0 -m > > > > Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to > > quickly reproduce the issue. > > > > PHC2SYS exits with: > > > > "ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM > transaction > > fails > > Why four microseconds, and not some other value? See above. > > Fixes: 6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to 1 us") > > > > -#define IGC_PTM_SHORT_CYC_DEFAULT 1 /* Default short cycle > interval */ > > +#define IGC_PTM_SHORT_CYC_DEFAULT 4 /* Default short cycle > interval */ > Kind regards, > > Paul Thanks, Chris
[Cc: +Sasha] Dear Christopher, Am 07.11.24 um 00:53 schrieb Hall, Christopher S: >> From: Paul Menzel <pmenzel@molgen.mpg.de> >> Sent: Wednesday, November 06, 2024 3:14 PM > >> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v3 2/6] igc: Lengthen the >> hardware retry time to prevent timeouts >> I’d use the more specific summary/title below: > > Will do. > >> igc: Lengthen hardware retry time to 4 μs to prevent timeouts >> >> Am 06.11.24 um 19:47 schrieb Christopher S M Hall: >>> Lengthen the hardware retry timer to four microseconds. >>> >>> The i225/i226 hardware retries if it receives an inappropriate response >>> from the upstream device. If the device retries too quickly, the root >>> port does not respond. >> >> Any idea why? Is it documented somewhere? > > I do not. Theoretically, 1 us should work, but it does not. It could be a root > port problem or an issue with i225/i226 NIC. I am not able to directly observe > the state of either. 4 us has worked in all my testing I am comfortable with > that value. 2 us also works, but given the limited hardware at my disposal > I doubled the value to 4 us to be safe. PTM is not time critical. Typically, > software initiates a transaction between 8 and 32 times per second. There > is no performance impact for PTM or any other function of the card. The > timeout occurs rarely, but if the retry time is too short the PTM state > machine does not recover. Thank you for clearing this up. If it’s not time critical, why not revert the original patch and go back to 10 μs. The referenced commit 6b8aa753a9f9 (igc: Decrease PTM short interval from 10 us to 1 us) also says, that 1 μs was suggested by the hardware team. Were you able to talk to them? >>> The issue can be reproduced with the following: >>> >>> $ sudo phc2sys -R 1000 -O 0 -i tsn0 -m >>> >>> Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to >>> quickly reproduce the issue. >>> >>> PHC2SYS exits with: >>> >>> "ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction >>> fails >> >> Why four microseconds, and not some other value? > > See above. It’d be great, if you extended the commit message. >>> Fixes: 6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to 1 us") >>> >>> -#define IGC_PTM_SHORT_CYC_DEFAULT 1 /* Default short cycle interval */ >>> +#define IGC_PTM_SHORT_CYC_DEFAULT 4 /* Default short cycle interval */ Maybe also add a comment, that 1 μs should work, but does not. Kind regards, Paul
[Cc: -Sasha, 550 #5.1.0 Address rejected.] Am 07.11.24 um 06:48 schrieb Paul Menzel: > [Cc: +Sasha] > > Dear Christopher, > > > Am 07.11.24 um 00:53 schrieb Hall, Christopher S: > >>> From: Paul Menzel <pmenzel@molgen.mpg.de> >>> Sent: Wednesday, November 06, 2024 3:14 PM >> >>> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v3 2/6] igc: Lengthen the hardware retry time to prevent timeouts > >>> I’d use the more specific summary/title below: >> >> Will do. >> >>> igc: Lengthen hardware retry time to 4 μs to prevent timeouts >>> >>> Am 06.11.24 um 19:47 schrieb Christopher S M Hall: >>>> Lengthen the hardware retry timer to four microseconds. >>>> >>>> The i225/i226 hardware retries if it receives an inappropriate response >>>> from the upstream device. If the device retries too quickly, the root >>>> port does not respond. >>> >>> Any idea why? Is it documented somewhere? >> >> I do not. Theoretically, 1 us should work, but it does not. It could be a root >> port problem or an issue with i225/i226 NIC. I am not able to directly observe >> the state of either. 4 us has worked in all my testing I am comfortable with >> that value. 2 us also works, but given the limited hardware at my disposal >> I doubled the value to 4 us to be safe. PTM is not time critical. Typically, >> software initiates a transaction between 8 and 32 times per second. There >> is no performance impact for PTM or any other function of the card. The >> timeout occurs rarely, but if the retry time is too short the PTM state >> machine does not recover. > > Thank you for clearing this up. If it’s not time critical, why not > revert the original patch and go back to 10 μs. > > The referenced commit 6b8aa753a9f9 (igc: Decrease PTM short interval > from 10 us to 1 us) also says, that 1 μs was suggested by the hardware > team. Were you able to talk to them? > >>>> The issue can be reproduced with the following: >>>> >>>> $ sudo phc2sys -R 1000 -O 0 -i tsn0 -m >>>> >>>> Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to >>>> quickly reproduce the issue. >>>> >>>> PHC2SYS exits with: >>>> >>>> "ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction >>>> fails >>> >>> Why four microseconds, and not some other value? >> >> See above. > > It’d be great, if you extended the commit message. > >>>> Fixes: 6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to >>>> 1 us") >>>> >>>> -#define IGC_PTM_SHORT_CYC_DEFAULT 1 /* Default short cycle interval */ >>>> +#define IGC_PTM_SHORT_CYC_DEFAULT 4 /* Default short cycle interval */ > > Maybe also add a comment, that 1 μs should work, but does not. > > > Kind regards, > > Paul
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h index 2ff292f5f63b..84521a4c35b4 100644 --- a/drivers/net/ethernet/intel/igc/igc_defines.h +++ b/drivers/net/ethernet/intel/igc/igc_defines.h @@ -574,7 +574,7 @@ #define IGC_PTM_CTRL_SHRT_CYC(usec) (((usec) & 0x3f) << 2) #define IGC_PTM_CTRL_PTM_TO(usec) (((usec) & 0xff) << 8) -#define IGC_PTM_SHORT_CYC_DEFAULT 1 /* Default short cycle interval */ +#define IGC_PTM_SHORT_CYC_DEFAULT 4 /* Default short cycle interval */ #define IGC_PTM_CYC_TIME_DEFAULT 5 /* Default PTM cycle time */ #define IGC_PTM_TIMEOUT_DEFAULT 255 /* Default timeout for PTM errors */