Message ID | 20201209143707.13503-2-erez.geva.ext@siemens.com |
---|---|
State | Superseded |
Headers | show |
Series | Add sending TX hardware timestamp for TC ETF Qdisc | expand |
On Wed, Dec 9, 2020 at 10:25 AM Geva, Erez <erez.geva.ext@siemens.com> wrote: > > > On 09/12/2020 15:48, Willem de Bruijn wrote: > > On Wed, Dec 9, 2020 at 9:37 AM Erez Geva <erez.geva.ext@siemens.com> wrote: > >> > >> Configure and send TX sending hardware timestamp from > >> user space application to the socket layer, > >> to provide to the TC ETC Qdisc, and pass it to > >> the interface network driver. > >> > >> - New flag for the SO_TXTIME socket option. > >> - New access auxiliary data header to pass the > >> TX sending hardware timestamp. > >> - Add the hardware timestamp to the socket cookie. > >> - Copy the TX sending hardware timestamp to the socket cookie. > >> > >> Signed-off-by: Erez Geva <erez.geva.ext@siemens.com> > > > > Hardware offload of pacing is definitely useful. > > > Thanks for your comment. > I agree, it is not limited of use. > > > I don't think this needs a new separate h/w variant of SO_TXTIME. > > > I only extend SO_TXTIME. The patchset passes a separate timestamp from skb->tstamp along through the ip cookie, cork (transmit_hw_time) and with the skb in shinfo. I don't see the need for two timestamps, one tied to software and one to hardware. When would we want to pace twice? > > Indeed, we want pacing offload to work for existing applications. > > > As the conversion of the PHC and the system clock is dynamic over time. > How do you propse to achive it? Can you elaborate on this concern? The simplest solution for offloading pacing would be to interpret skb->tstamp either for software pacing, or skip software pacing if the device advertises a NETIF_F hardware pacing feature. Clockbase is an issue. The device driver may have to convert to whatever format the device expects when copying skb->tstamp in the device tx descriptor. > > > It only requires that pacing qdiscs, both sch_etf and sch_fq, > > optionally skip queuing in their .enqueue callback and instead allow > > the skb to pass to the device driver as is, with skb->tstamp set. Only > > to devices that advertise support for h/w pacing offload. > > > I did not use "Fair Queue traffic policing". > As for ETF, it is all about ordering packets from different applications. > How can we achive it with skiping queuing? > Could you elaborate on this point? The qdisc can only defer pacing to hardware if hardware can ensure the same invariants on ordering, of course. Btw: this is quite a long list of CC:s
On 09/12/2020 18:37, Willem de Bruijn wrote: > On Wed, Dec 9, 2020 at 10:25 AM Geva, Erez <erez.geva.ext@siemens.com> wrote: >> >> >> On 09/12/2020 15:48, Willem de Bruijn wrote: >>> On Wed, Dec 9, 2020 at 9:37 AM Erez Geva <erez.geva.ext@siemens.com> wrote: >>>> >>>> Configure and send TX sending hardware timestamp from >>>> user space application to the socket layer, >>>> to provide to the TC ETC Qdisc, and pass it to >>>> the interface network driver. >>>> >>>> - New flag for the SO_TXTIME socket option. >>>> - New access auxiliary data header to pass the >>>> TX sending hardware timestamp. >>>> - Add the hardware timestamp to the socket cookie. >>>> - Copy the TX sending hardware timestamp to the socket cookie. >>>> >>>> Signed-off-by: Erez Geva <erez.geva.ext@siemens.com> >>> >>> Hardware offload of pacing is definitely useful. >>> >> Thanks for your comment. >> I agree, it is not limited of use. >> >>> I don't think this needs a new separate h/w variant of SO_TXTIME. >>> >> I only extend SO_TXTIME. > > The patchset passes a separate timestamp from skb->tstamp along > through the ip cookie, cork (transmit_hw_time) and with the skb in > shinfo. > > I don't see the need for two timestamps, one tied to software and one > to hardware. When would we want to pace twice? As the Net-Link uses system clock and the network interface hardware uses it's own PHC. The current ETF depends on synchronizing the system clock and the PHC. >>> Indeed, we want pacing offload to work for existing applications. >>> >> As the conversion of the PHC and the system clock is dynamic over time. >> How do you propse to achive it? > > Can you elaborate on this concern? Using single time stamp have 3 possible solutions: 1. Current solution, synchronize the system clock and the PHC. Application uses the system clock. The ETF can use the system clock for ordering and pass the packet to the driver on time The network interface hardware compare the time-stamp to the PHC. 2. The application convert the PHC time-stamp to system clock based. The ETF works as solution 1 The network driver convert the system clock time-stamp back to PHC time-stamp. This solution need a new Net-Link flag and modify the relevant network drivers. Yet this solution have 2 problems: * As applications today are not aware that system clock and PHC are not synchronized and therefore do not perform any conversion, most of them only use the system clock. * As the conversion in the network driver happens ~300 - 600 microseconds after the application send the packet. And as the PHC and system clock frequencies and offset can change during this period. The conversion will produce a different PHC time-stamp from the application original time-stamp. We require a precession of 1 nanoseconds of the PHC time-stamp. 3. The application uses PHC time-stamp for skb->tstamp The ETF convert the PHC time-stamp to system clock time-stamp. This solution require implementations on supporting reading PHC clocks from IRQ/kernel thread context in kernel space. Just for clarification: ETF as all Net-Link, only uses system clock (the TAI) The network interface hardware only uses the PHC. Nor Net-Link neither the driver perform any conversions. The Kernel does not provide and clock conversion beside system clock. Linux kernel is a single clock system. > > The simplest solution for offloading pacing would be to interpret > skb->tstamp either for software pacing, or skip software pacing if the > device advertises a NETIF_F hardware pacing feature. That will defy the purpose of ETF. ETF exist for ordering packets. Why should the device driver defer it? Simply do not use the QDISC for this interface. > > Clockbase is an issue. The device driver may have to convert to > whatever format the device expects when copying skb->tstamp in the > device tx descriptor. We do hope our definition is clear. In the current kernel skb->tstamp uses system clock. The hardware time-stamp is PHC based, as it is used today for PTP two steps. We only propose to use the same hardware time-stamp. Passing the hardware time-stamp to the skb->tstamp might seems a bit tricky The gaol is the leave the driver unaware to whether we * Synchronizing the PHC and system clock * The ETF pass the hardware time-stamp to skb->tstamp Only the applications and the ETF are aware. The application can detect by checking the ETF flag. The ETF flags are part of the network administration. That also configure the PTP and the system clock synchronization. > >> >>> It only requires that pacing qdiscs, both sch_etf and sch_fq, >>> optionally skip queuing in their .enqueue callback and instead allow >>> the skb to pass to the device driver as is, with skb->tstamp set. Only >>> to devices that advertise support for h/w pacing offload. >>> >> I did not use "Fair Queue traffic policing". >> As for ETF, it is all about ordering packets from different applications. >> How can we achive it with skiping queuing? >> Could you elaborate on this point? > > The qdisc can only defer pacing to hardware if hardware can ensure the > same invariants on ordering, of course. Yes, this is why we suggest ETF order packets using the hardware time-stamp. And pass the packet based on system time. So ETF query the system clock only and not the PHC. > > Btw: this is quite a long list of CC:s > I need to update my company colleagues as well as the Linux group.
Hi Erez, Thank you for the patch! Yet something to improve: [auto build test ERROR on b65054597872ce3aefbc6a666385eabdf9e288da] url: https://github.com/0day-ci/linux/commits/Erez-Geva/Add-sending-TX-hardware-timestamp-for-TC-ETF-Qdisc/20201210-000521 base: b65054597872ce3aefbc6a666385eabdf9e288da config: mips-randconfig-r026-20201209 (attached as .config) compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1968804ac726e7674d5de22bc2204b45857da344) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install mips cross compiling tool for clang build # apt-get install binutils-mips-linux-gnu # https://github.com/0day-ci/linux/commit/8a8f634bc74db16dc551cfcf3b63c1183f98eaac git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Erez-Geva/Add-sending-TX-hardware-timestamp-for-TC-ETF-Qdisc/20201210-000521 git checkout 8a8f634bc74db16dc551cfcf3b63c1183f98eaac # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): >> net/core/sock.c:2383:7: error: use of undeclared identifier 'SCM_HW_TXTIME'; did you mean 'SOCK_HW_TXTIME'? case SCM_HW_TXTIME: ^~~~~~~~~~~~~ SOCK_HW_TXTIME include/net/sock.h:862:2: note: 'SOCK_HW_TXTIME' declared here SOCK_HW_TXTIME, ^ 1 error generated. vim +2383 net/core/sock.c 2351 2352 int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg, 2353 struct sockcm_cookie *sockc) 2354 { 2355 u32 tsflags; 2356 2357 switch (cmsg->cmsg_type) { 2358 case SO_MARK: 2359 if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) 2360 return -EPERM; 2361 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) 2362 return -EINVAL; 2363 sockc->mark = *(u32 *)CMSG_DATA(cmsg); 2364 break; 2365 case SO_TIMESTAMPING_OLD: 2366 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) 2367 return -EINVAL; 2368 2369 tsflags = *(u32 *)CMSG_DATA(cmsg); 2370 if (tsflags & ~SOF_TIMESTAMPING_TX_RECORD_MASK) 2371 return -EINVAL; 2372 2373 sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK; 2374 sockc->tsflags |= tsflags; 2375 break; 2376 case SCM_TXTIME: 2377 if (!sock_flag(sk, SOCK_TXTIME)) 2378 return -EINVAL; 2379 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64))) 2380 return -EINVAL; 2381 sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg)); 2382 break; > 2383 case SCM_HW_TXTIME: 2384 if (!sock_flag(sk, SOCK_HW_TXTIME)) 2385 return -EINVAL; 2386 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64))) 2387 return -EINVAL; 2388 sockc->transmit_hw_time = get_unaligned((u64 *)CMSG_DATA(cmsg)); 2389 break; 2390 /* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */ 2391 case SCM_RIGHTS: 2392 case SCM_CREDENTIALS: 2393 break; 2394 default: 2395 return -EINVAL; 2396 } 2397 return 0; 2398 } 2399 EXPORT_SYMBOL(__sock_cmsg_send); 2400 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On 10/12/2020 04:11, kernel test robot wrote: > Hi Erez, > > Thank you for the patch! Yet something to improve: > Thanks for the robot, as we rarely use clang for kernel. It is very helpful. > [auto build test ERROR on b65054597872ce3aefbc6a666385eabdf9e288da] > > url: https://github.com/0day-ci/linux/commits/Erez-Geva/Add-sending-TX-hardware-timestamp-for-TC-ETF-Qdisc/20201210-000521 I can not find this commit > base: b65054597872ce3aefbc6a666385eabdf9e288da > config: mips-randconfig-r026-20201209 (attached as .config) > compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1968804ac726e7674d5de22bc2204b45857da344) However the clang in https://download.01.org/0day-ci/cross-package/clang-latest/clang.tar.xz is version 11 > reproduce (this is a W=1 build): > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross Your make cross script tries to download the clang every time. Please separate the download (which is ~400 MB and 2 GB after open) from the compilation. Please use "wget" follow your own instructions in this email. > chmod +x ~/bin/make.cross > # install mips cross compiling tool for clang build > # apt-get install binutils-mips-linux-gnu > # https://github.com/0day-ci/linux/commit/8a8f634bc74db16dc551cfcf3b63c1183f98eaac > git remote add linux-review https://github.com/0day-ci/linux > git fetch --no-tags linux-review Erez-Geva/Add-sending-TX-hardware-timestamp-for-TC-ETF-Qdisc/20201210-000521 This branch is absent > git checkout 8a8f634bc74db16dc551cfcf3b63c1183f98eaac This commit as well > # save the attached .config to linux build tree > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips > I use Debian 10.7. I usually compile with GCC. I have not see any errors. When I use clang 11 from download.01.org I get a crash right away. Please add a proper instructions how to use clang on Debian or provide a Docker container with updated clang for testing. > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot <lkp@intel.com> > > All errors (new ones prefixed by >>): > >>> net/core/sock.c:2383:7: error: use of undeclared identifier 'SCM_HW_TXTIME'; did you mean 'SOCK_HW_TXTIME'? > case SCM_HW_TXTIME: > ^~~~~~~~~~~~~ > SOCK_HW_TXTIME > include/net/sock.h:862:2: note: 'SOCK_HW_TXTIME' declared here > SOCK_HW_TXTIME, > ^ > 1 error generated. > > vim +2383 net/core/sock.c > > 2351 > 2352 int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg, > 2353 struct sockcm_cookie *sockc) > 2354 { > 2355 u32 tsflags; > 2356 > 2357 switch (cmsg->cmsg_type) { > 2358 case SO_MARK: > 2359 if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) > 2360 return -EPERM; > 2361 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) > 2362 return -EINVAL; > 2363 sockc->mark = *(u32 *)CMSG_DATA(cmsg); > 2364 break; > 2365 case SO_TIMESTAMPING_OLD: > 2366 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) > 2367 return -EINVAL; > 2368 > 2369 tsflags = *(u32 *)CMSG_DATA(cmsg); > 2370 if (tsflags & ~SOF_TIMESTAMPING_TX_RECORD_MASK) > 2371 return -EINVAL; > 2372 > 2373 sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK; > 2374 sockc->tsflags |= tsflags; > 2375 break; > 2376 case SCM_TXTIME: > 2377 if (!sock_flag(sk, SOCK_TXTIME)) > 2378 return -EINVAL; > 2379 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64))) > 2380 return -EINVAL; > 2381 sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg)); > 2382 break; >> 2383 case SCM_HW_TXTIME: > 2384 if (!sock_flag(sk, SOCK_HW_TXTIME)) > 2385 return -EINVAL; > 2386 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64))) > 2387 return -EINVAL; > 2388 sockc->transmit_hw_time = get_unaligned((u64 *)CMSG_DATA(cmsg)); > 2389 break; > 2390 /* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */ > 2391 case SCM_RIGHTS: > 2392 case SCM_CREDENTIALS: > 2393 break; > 2394 default: > 2395 return -EINVAL; > 2396 } > 2397 return 0; > 2398 } > 2399 EXPORT_SYMBOL(__sock_cmsg_send); > 2400 > > --- > 0-DAY CI Kernel Test Service, Intel Corporation > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org > Please improve the robot, so we can comply and properly support clang compilation. Thanks Erez
On 10/12/2020 13:33, Geva, Erez (ext) (DI PA CI R&D 3) wrote: > > On 10/12/2020 04:11, kernel test robot wrote: >> Hi Erez, >> >> Thank you for the patch! Yet something to improve: >> > Thanks for the robot, > as we rarely use clang for kernel. It is very helpful. > >> [auto build test ERROR on b65054597872ce3aefbc6a666385eabdf9e288da] >> >> url: https://github.com/0day-ci/linux/commits/Erez-Geva/Add-sending-TX-hardware-timestamp-for-TC-ETF-Qdisc/20201210-000521 > I can not find this commit > >> base: b65054597872ce3aefbc6a666385eabdf9e288da >> config: mips-randconfig-r026-20201209 (attached as .config) >> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1968804ac726e7674d5de22bc2204b45857da344) > However the clang in > https://download.01.org/0day-ci/cross-package/clang-latest/clang.tar.xz is version 11 > >> reproduce (this is a W=1 build): >> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > Your make cross script tries to download the clang every time. > Please separate the download (which is ~400 MB and 2 GB after open) from the compilation. > > Please use "wget" follow your own instructions in this email. > >> chmod +x ~/bin/make.cross >> # install mips cross compiling tool for clang build >> # apt-get install binutils-mips-linux-gnu >> # https://github.com/0day-ci/linux/commit/8a8f634bc74db16dc551cfcf3b63c1183f98eaac >> git remote add linux-review https://github.com/0day-ci/linux >> git fetch --no-tags linux-review Erez-Geva/Add-sending-TX-hardware-timestamp-for-TC-ETF-Qdisc/20201210-000521 > This branch is absent > >> git checkout 8a8f634bc74db16dc551cfcf3b63c1183f98eaac > This commit as well > >> # save the attached .config to linux build tree >> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips >> > I use Debian 10.7. > I usually compile with GCC. I have not see any errors. > > When I use clang 11 from download.01.org I get a crash right away. > Please add a proper instructions how to use clang on Debian or provide > a Docker container with updated clang for testing. > >> If you fix the issue, kindly add following tag as appropriate >> Reported-by: kernel test robot <lkp@intel.com> >> >> All errors (new ones prefixed by >>): >> >>>> net/core/sock.c:2383:7: error: use of undeclared identifier 'SCM_HW_TXTIME'; did you mean 'SOCK_HW_TXTIME'? >> case SCM_HW_TXTIME: >> ^~~~~~~~~~~~~ >> SOCK_HW_TXTIME >> include/net/sock.h:862:2: note: 'SOCK_HW_TXTIME' declared here >> SOCK_HW_TXTIME, >> ^ >> 1 error generated. >> >> vim +2383 net/core/sock.c >> >> 2351 >> 2352 int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg, >> 2353 struct sockcm_cookie *sockc) >> 2354 { >> 2355 u32 tsflags; >> 2356 >> 2357 switch (cmsg->cmsg_type) { >> 2358 case SO_MARK: >> 2359 if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) >> 2360 return -EPERM; >> 2361 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) >> 2362 return -EINVAL; >> 2363 sockc->mark = *(u32 *)CMSG_DATA(cmsg); >> 2364 break; >> 2365 case SO_TIMESTAMPING_OLD: >> 2366 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) >> 2367 return -EINVAL; >> 2368 >> 2369 tsflags = *(u32 *)CMSG_DATA(cmsg); >> 2370 if (tsflags & ~SOF_TIMESTAMPING_TX_RECORD_MASK) >> 2371 return -EINVAL; >> 2372 >> 2373 sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK; >> 2374 sockc->tsflags |= tsflags; >> 2375 break; >> 2376 case SCM_TXTIME: >> 2377 if (!sock_flag(sk, SOCK_TXTIME)) >> 2378 return -EINVAL; >> 2379 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64))) >> 2380 return -EINVAL; >> 2381 sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg)); >> 2382 break; >>> 2383 case SCM_HW_TXTIME: >> 2384 if (!sock_flag(sk, SOCK_HW_TXTIME)) >> 2385 return -EINVAL; >> 2386 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64))) >> 2387 return -EINVAL; >> 2388 sockc->transmit_hw_time = get_unaligned((u64 *)CMSG_DATA(cmsg)); >> 2389 break; >> 2390 /* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */ >> 2391 case SCM_RIGHTS: >> 2392 case SCM_CREDENTIALS: >> 2393 break; >> 2394 default: >> 2395 return -EINVAL; >> 2396 } >> 2397 return 0; >> 2398 } >> 2399 EXPORT_SYMBOL(__sock_cmsg_send); >> 2400 >> >> --- >> 0-DAY CI Kernel Test Service, Intel Corporation >> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org >> > > Please improve the robot, so we can comply and properly support clang compilation. > > Thanks > Erez > Update, I use the same .config from the Intel robot test. I was trying to compile v5.10-rc6 with GCC cross compiler for mips. # apt-get install -t sid gcc-mips-linux-gnu kernel-test ((v5.10-rc6))$ /usr/bin/mips-linux-gnu-gcc --version mips-linux-gnu-gcc (Debian 10.2.0-17) 10.2.0 Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. kernel-test ((v5.10-rc6))$ cp ../intel_robot.config .config kernel-test ((v5.10-rc6))$ make ARCH=mips CROSS_COMPILE=/usr/bin/mips-linux-gnu- olddefconfig ... kernel-test ((v5.10-rc6))$ make ARCH=mips CROSS_COMPILE=/usr/bin/mips-linux-gnu- all ... CC init/main.o {standard input}: Assembler messages: {standard input}:9103: Error: found '(', expected: ')' {standard input}:9103: Error: found '(', expected: ')' {standard input}:9103: Error: non-constant expression in ".if" statement {standard input}:9103: Error: junk at end of line, first unrecognized character is `(' make[1]: *** [scripts/Makefile.build:283: init/main.o] Error 1 make: *** [Makefile:1797: init] Error 2 Erez
On Wed, Dec 9, 2020 at 3:18 PM Geva, Erez <erez.geva.ext@siemens.com> wrote: > > > On 09/12/2020 18:37, Willem de Bruijn wrote: > > On Wed, Dec 9, 2020 at 10:25 AM Geva, Erez <erez.geva.ext@siemens.com> wrote: > >> > >> > >> On 09/12/2020 15:48, Willem de Bruijn wrote: > >>> On Wed, Dec 9, 2020 at 9:37 AM Erez Geva <erez.geva.ext@siemens.com> wrote: > >>>> > >>>> Configure and send TX sending hardware timestamp from > >>>> user space application to the socket layer, > >>>> to provide to the TC ETC Qdisc, and pass it to > >>>> the interface network driver. > >>>> > >>>> - New flag for the SO_TXTIME socket option. > >>>> - New access auxiliary data header to pass the > >>>> TX sending hardware timestamp. > >>>> - Add the hardware timestamp to the socket cookie. > >>>> - Copy the TX sending hardware timestamp to the socket cookie. > >>>> > >>>> Signed-off-by: Erez Geva <erez.geva.ext@siemens.com> > >>> > >>> Hardware offload of pacing is definitely useful. > >>> > >> Thanks for your comment. > >> I agree, it is not limited of use. > >> > >>> I don't think this needs a new separate h/w variant of SO_TXTIME. > >>> > >> I only extend SO_TXTIME. > > > > The patchset passes a separate timestamp from skb->tstamp along > > through the ip cookie, cork (transmit_hw_time) and with the skb in > > shinfo. > > > > I don't see the need for two timestamps, one tied to software and one > > to hardware. When would we want to pace twice? > > As the Net-Link uses system clock and the network interface hardware uses it's own PHC. > The current ETF depends on synchronizing the system clock and the PHC. If I understand correctly, you are trying to achieve a single delivery time. The need for two separate timestamps passed along is only because the kernel is unable to do the time base conversion. Else, ETF could program the qdisc watchdog in system time and later, on dequeue, convert skb->tstamp to the h/w time base before passing it to the device. It's still not entirely clear to me why the packet has to be held by ETF initially first, if it is held until delivery time by hardware later. But more on that below. So far, the use case sounds a bit narrow and the use of two timestamp fields for a single delivery event a bit of a hack. And one that does impose a cost in the hot path of many workloads by adding a field the ip cookie, cork and writing to (possibly cold) skb_shinfo for every packet. > >>> Indeed, we want pacing offload to work for existing applications. > >>> > >> As the conversion of the PHC and the system clock is dynamic over time. > >> How do you propse to achive it? > > > > Can you elaborate on this concern? > > Using single time stamp have 3 possible solutions: > > 1. Current solution, synchronize the system clock and the PHC. > Application uses the system clock. > The ETF can use the system clock for ordering and pass the packet to the driver on time > The network interface hardware compare the time-stamp to the PHC. > > 2. The application convert the PHC time-stamp to system clock based. > The ETF works as solution 1 > The network driver convert the system clock time-stamp back to PHC time-stamp. > This solution need a new Net-Link flag and modify the relevant network drivers. > Yet this solution have 2 problems: > * As applications today are not aware that system clock and PHC are not synchronized and > therefore do not perform any conversion, most of them only use the system clock. > * As the conversion in the network driver happens ~300 - 600 microseconds after > the application send the packet. > And as the PHC and system clock frequencies and offset can change during this period. > The conversion will produce a different PHC time-stamp from the application original time-stamp. > We require a precession of 1 nanoseconds of the PHC time-stamp. > > 3. The application uses PHC time-stamp for skb->tstamp > The ETF convert the PHC time-stamp to system clock time-stamp. > This solution require implementations on supporting reading PHC clocks > from IRQ/kernel thread context in kernel space. ETF has to release the packet well in advance of the hardware timestamp for the packet to arrive at the device on time. In practice I would expect this delta parameter to be at least at usec timescale. That gives some wiggle room with regard to s/w tstamp, at least. If changes in clock distance are relatively infrequent, could this clock diff be a qdisc parameter, updated infrequently outside the packet path? It would even be preferable if the qdisc and core stack could be ignorant of such hardware clocks and the time base is converted by the device driver when encoding skb->tstamp into the tx descriptor. Is the device hardware clock readable by the driver? From the above, it sounds like this is not trivial. I don't know which exact device you're targeting. Is it an in-tree driver? > Just for clarification: > ETF as all Net-Link, only uses system clock (the TAI) > The network interface hardware only uses the PHC. > Nor Net-Link neither the driver perform any conversions. > The Kernel does not provide and clock conversion beside system clock. > Linux kernel is a single clock system. > > > > > The simplest solution for offloading pacing would be to interpret > > skb->tstamp either for software pacing, or skip software pacing if the > > device advertises a NETIF_F hardware pacing feature. > > That will defy the purpose of ETF. > ETF exist for ordering packets. > Why should the device driver defer it? > Simply do not use the QDISC for this interface. ETF queues packets until their delivery time is reached. It does not order for any other reason than to calculate the next qdisc watchdog event, really. If h/w can do the same and the driver can convert skb->tstamp to the right timebase, indeed no qdisc is needed for pacing. But there may be a need for selective h/w offload if h/w has additional constraints, such as on the number of packets outstanding or time horizon. > > > > Clockbase is an issue. The device driver may have to convert to > > whatever format the device expects when copying skb->tstamp in the > > device tx descriptor. > > We do hope our definition is clear. > In the current kernel skb->tstamp uses system clock. > The hardware time-stamp is PHC based, as it is used today for PTP two steps. > We only propose to use the same hardware time-stamp. > > Passing the hardware time-stamp to the skb->tstamp might seems a bit tricky > The gaol is the leave the driver unaware to whether we > * Synchronizing the PHC and system clock > * The ETF pass the hardware time-stamp to skb->tstamp > Only the applications and the ETF are aware. > The application can detect by checking the ETF flag. > The ETF flags are part of the network administration. > That also configure the PTP and the system clock synchronization. > > > > >> > >>> It only requires that pacing qdiscs, both sch_etf and sch_fq, > >>> optionally skip queuing in their .enqueue callback and instead allow > >>> the skb to pass to the device driver as is, with skb->tstamp set. Only > >>> to devices that advertise support for h/w pacing offload. > >>> > >> I did not use "Fair Queue traffic policing". > >> As for ETF, it is all about ordering packets from different applications. > >> How can we achive it with skiping queuing? > >> Could you elaborate on this point? > > > > The qdisc can only defer pacing to hardware if hardware can ensure the > > same invariants on ordering, of course. > > Yes, this is why we suggest ETF order packets using the hardware time-stamp. > And pass the packet based on system time. > So ETF query the system clock only and not the PHC. On which note: with this patch set all applications have to agree to use h/w time base in etf_enqueue_timesortedlist. In practice that makes this h/w mode a qdisc used by a single process? > > > > Btw: this is quite a long list of CC:s > > > I need to update my company colleagues as well as the Linux group. Of course. But even ignoring that this is still quite a large list (> 40). My response yesterday was actually blocked as a result ;) Retrying now.
diff --git a/include/net/sock.h b/include/net/sock.h index a5c6ae78df77..dd5bfd42b4e2 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -859,6 +859,7 @@ enum sock_flags { SOCK_SELECT_ERR_QUEUE, /* Wake select on error queue */ SOCK_RCU_FREE, /* wait rcu grace period in sk_destruct() */ SOCK_TXTIME, + SOCK_HW_TXTIME, SOCK_XDP, /* XDP is attached */ SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */ }; @@ -1690,6 +1691,7 @@ void sk_send_sigurg(struct sock *sk); struct sockcm_cookie { u64 transmit_time; + u64 transmit_hw_time; u32 mark; u16 tsflags; }; diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h index 77f7c1638eb1..16265b00c25a 100644 --- a/include/uapi/asm-generic/socket.h +++ b/include/uapi/asm-generic/socket.h @@ -119,6 +119,9 @@ #define SO_DETACH_REUSEPORT_BPF 68 +#define SO_HW_TXTIME 69 +#define SCM_HW_TXTIME SO_HW_TXTIME + #if !defined(__KERNEL__) #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__)) diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h index 7ed0b3d1c00a..dd51c9a99b1f 100644 --- a/include/uapi/linux/net_tstamp.h +++ b/include/uapi/linux/net_tstamp.h @@ -162,8 +162,9 @@ struct scm_ts_pktinfo { enum txtime_flags { SOF_TXTIME_DEADLINE_MODE = (1 << 0), SOF_TXTIME_REPORT_ERRORS = (1 << 1), + SOF_TXTIME_USE_HW_TIMESTAMP = (1 << 2), - SOF_TXTIME_FLAGS_LAST = SOF_TXTIME_REPORT_ERRORS, + SOF_TXTIME_FLAGS_LAST = SOF_TXTIME_USE_HW_TIMESTAMP, SOF_TXTIME_FLAGS_MASK = (SOF_TXTIME_FLAGS_LAST - 1) | SOF_TXTIME_FLAGS_LAST }; diff --git a/net/core/sock.c b/net/core/sock.c index 727ea1cc633c..317dce54321b 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1227,6 +1227,8 @@ int sock_setsockopt(struct socket *sock, int level, int optname, break; } sock_valbool_flag(sk, SOCK_TXTIME, true); + sock_valbool_flag(sk, SOCK_HW_TXTIME, + sk_txtime.flags & SOF_TXTIME_USE_HW_TIMESTAMP); sk->sk_clockid = sk_txtime.clockid; sk->sk_txtime_deadline_mode = !!(sk_txtime.flags & SOF_TXTIME_DEADLINE_MODE); @@ -2378,6 +2380,13 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg, return -EINVAL; sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg)); break; + case SCM_HW_TXTIME: + if (!sock_flag(sk, SOCK_HW_TXTIME)) + return -EINVAL; + if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64))) + return -EINVAL; + sockc->transmit_hw_time = get_unaligned((u64 *)CMSG_DATA(cmsg)); + break; /* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */ case SCM_RIGHTS: case SCM_CREDENTIALS:
Configure and send TX sending hardware timestamp from user space application to the socket layer, to provide to the TC ETC Qdisc, and pass it to the interface network driver. - New flag for the SO_TXTIME socket option. - New access auxiliary data header to pass the TX sending hardware timestamp. - Add the hardware timestamp to the socket cookie. - Copy the TX sending hardware timestamp to the socket cookie. Signed-off-by: Erez Geva <erez.geva.ext@siemens.com> --- include/net/sock.h | 2 ++ include/uapi/asm-generic/socket.h | 3 +++ include/uapi/linux/net_tstamp.h | 3 ++- net/core/sock.c | 9 +++++++++ 4 files changed, 16 insertions(+), 1 deletion(-)