Message ID | 20170412141737.5881-4-mlichvar@redhat.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, Apr 12, 2017 at 10:17 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote: > Extend the skb_shared_hwtstamps structure with the index of the > real interface which received or transmitted the packet and the length > of the packet at layer 2. The original packet is received along with the timestamp. Why is this L2 length needed? > Add a SOF_TIMESTAMPING_OPT_PKTINFO flag to > the SO_TIMESTAMPING option to allow applications to get this information > as struct scm_ts_pktinfo in SCM_TIMESTAMPING_PKTINFO control message. This patch saves skb->dev->ifindex, which is the same as existing SOF_TIMESTAMPING_OPT_CMSG. See also the bug fix for that feature I sent yesterday: http://patchwork.ozlabs.org/patch/750197/ If the intent is to return a different ifindex, I would still suggest using the existing pktinfo infrastructure, but changing the ifindex that is recorded.
On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote: > On Wed, Apr 12, 2017 at 10:17 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote: > > Extend the skb_shared_hwtstamps structure with the index of the > > real interface which received or transmitted the packet and the length > > of the packet at layer 2. > > The original packet is received along with the timestamp. But only outgoing packets, right? > Why is this L2 length needed? It's needed for incoming packets to allow converting of preamble timestamps to trailer timestamps. > > Add a SOF_TIMESTAMPING_OPT_PKTINFO flag to > > the SO_TIMESTAMPING option to allow applications to get this information > > as struct scm_ts_pktinfo in SCM_TIMESTAMPING_PKTINFO control message. > > This patch saves skb->dev->ifindex, which is the same as existing > SOF_TIMESTAMPING_OPT_CMSG. See also the bug fix for that > feature I sent yesterday: http://patchwork.ozlabs.org/patch/750197/ The main point is that it provides the index of the device which received the packet. It does duplicate the functionality of OPT_CMSG + IP_PKTINFO for outgoing packets, but I thought it might be useful with the TSONLY option. BTW, the original ifindex used to be in skb->skb_iif, but that changed in b6858177. > If the intent is to return a different ifindex, I would still suggest using > the existing pktinfo infrastructure, but changing the ifindex that is > recorded. How would the application get the l2 length? If this (l2 length, if_index) tuple is specific to timestamping, I think it would make sense to keep it out of the IP layer.
On Thu, Apr 13, 2017 at 11:18 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote: > On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote: >> On Wed, Apr 12, 2017 at 10:17 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote: >> > Extend the skb_shared_hwtstamps structure with the index of the >> > real interface which received or transmitted the packet and the length >> > of the packet at layer 2. >> >> The original packet is received along with the timestamp. > > But only outgoing packets, right? Timestamps for incoming packets are also passed alongside the original packet. >> Why is this L2 length needed? > > It's needed for incoming packets to allow converting of preamble > timestamps to trailer timestamps. Receiving the mac length of a packet sounds like a feature independent from timestamping. Either an ioctl similar to SIOCGIFMTU or, if it may vary due to existince of vlan headers, a new independent cmsg at the SOL_SOCKET layer. >> > Add a SOF_TIMESTAMPING_OPT_PKTINFO flag to >> > the SO_TIMESTAMPING option to allow applications to get this information >> > as struct scm_ts_pktinfo in SCM_TIMESTAMPING_PKTINFO control message. >> >> This patch saves skb->dev->ifindex, which is the same as existing >> SOF_TIMESTAMPING_OPT_CMSG. See also the bug fix for that >> feature I sent yesterday: http://patchwork.ozlabs.org/patch/750197/ > > The main point is that it provides the index of the device which > received the packet. It does duplicate the functionality of OPT_CMSG + > IP_PKTINFO for outgoing packets, but I thought it might be useful with > the TSONLY option. Agreed. I'd prefer to reuse the existing option for that and just extend it to work together with TSONLY. We will have to set serr->header.h4.iif from something other than skb->dev if the skb was allocated fresh in __skb_tstamp_tx without the device association.
On Thu, Apr 13, 2017 at 12:16:09PM -0400, Willem de Bruijn wrote: > On Thu, Apr 13, 2017 at 11:18 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote: > > On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote: > >> Why is this L2 length needed? > > > > It's needed for incoming packets to allow converting of preamble > > timestamps to trailer timestamps. > > Receiving the mac length of a packet sounds like a feature independent > from timestamping. I agree, but so far nobody suggested another use for this information. Do you have any suggestions? The idea was that if it is useful only with HW timestamping, it would be better to save it only with the timestamp, so there is no performance impact in the more common case when HW timestamping is disabled. Am I overly cautious here? > Either an ioctl similar to SIOCGIFMTU or, if it may > vary due to existince of vlan headers, a new independent cmsg at the > SOL_SOCKET layer. It's not just the VLAN headers. The length of the IP header may vary with IP options, so the offset of the UDP data in the packet cannot be assumed to be constant. Now I'm wondering if it's actually necessary to save the original value of skb->mac_len + skb->len. Would "skb->data - skb->head - skb->mac_header + skb->len" always work as the L2 length for received packets at the time when the cmsg is prepared? As for the original ifindex, it seems to me it does need to be saved to a new field since __netif_receive_skb_core() intentionally overwrites skb->skb_iif. What would be the best place for it, sk_buff or skb_shared_info? And would it really be acceptable to save it for all packets in __netif_receive_skb_core(), even when HW timestamping is disabled? Seeing how the code and the data structures were optimized over time, I have a feeling it would not be accepted.
On Mon, Apr 24, 2017 at 5:00 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote: > On Thu, Apr 13, 2017 at 12:16:09PM -0400, Willem de Bruijn wrote: >> On Thu, Apr 13, 2017 at 11:18 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote: >> > On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote: >> >> Why is this L2 length needed? >> > >> > It's needed for incoming packets to allow converting of preamble >> > timestamps to trailer timestamps. >> >> Receiving the mac length of a packet sounds like a feature independent >> from timestamping. > > I agree, but so far nobody suggested another use for this information. > Do you have any suggestions? > > The idea was that if it is useful only with HW timestamping, it would > be better to save it only with the timestamp, so there is no > performance impact in the more common case when HW timestamping is > disabled. Am I overly cautious here? The additional cost of a cmsg is zero for sockets that have no cmsg enabled, due to if (inet->cmsg_flags) ip_cmsg_recv_offset(msg, sk, skb, sizeof(struct udphdr), off); But you might be right that there are no uses outside the specific timestamp requirement you have, so if you prefer to use a timestamp option, I won't object further. >> Either an ioctl similar to SIOCGIFMTU or, if it may >> vary due to existince of vlan headers, a new independent cmsg at the >> SOL_SOCKET layer. The latter would require adding the SOL_SOCKET level cmsg processing infra. It is simpler to just add it at the INET/INET6 levels. > It's not just the VLAN headers. The length of the IP header may vary > with IP options, so the offset of the UDP data in the packet cannot be > assumed to be constant. As well as tunnels. > Now I'm wondering if it's actually necessary to save the original > value of skb->mac_len + skb->len. Computing it on recv if needed is definitely preferable to computing on enqueue and storing in an intermediate variable. > Would "skb->data - skb->head - > skb->mac_header + skb->len" always work as the L2 length for received > packets at the time when the cmsg is prepared? (skb->data - skb->head) - skb->mac_header computes the length of data before the mac, such as reserve? Do you mean skb->data - skb->mac_header (or - skb_mac_offset(skb))? > As for the original ifindex, it seems to me it does need to be saved > to a new field since __netif_receive_skb_core() intentionally > overwrites skb->skb_iif. What would be the best place for it, sk_buff > or skb_shared_info? Finding storage space on the receive path will not be easy. One shortcut to avoid storing this information explicitly is to look up the device from skb->napi_id. > And would it really be acceptable to save it for all packets in > __netif_receive_skb_core(), even when HW timestamping is disabled? > Seeing how the code and the data structures were optimized over time, > I have a feeling it would not be accepted. Incurring this cost on all packets for such a rare edge case does sound like a non-starter. It can be called only if the netstamp_needed static key is enabled (false), in __net_timestamp, though.
On Mon, Apr 24, 2017 at 11:18:13AM -0400, Willem de Bruijn wrote: > On Mon, Apr 24, 2017 at 5:00 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote: > > Would "skb->data - skb->head - > > skb->mac_header + skb->len" always work as the L2 length for received > > packets at the time when the cmsg is prepared? > > (skb->data - skb->head) - skb->mac_header computes the length > of data before the mac, such as reserve? data - head includes the reserve, but mac_header does too, so I think it should be just the length of MAC header and everything up to the data. > Do you mean skb->data - > skb->mac_header (or - skb_mac_offset(skb))? That would give me a pointer? If I used skb_mac_offset(), the total length would be just skb->len - skb_mac_offset()? > > As for the original ifindex, it seems to me it does need to be saved > > to a new field since __netif_receive_skb_core() intentionally > > overwrites skb->skb_iif. What would be the best place for it, sk_buff > > or skb_shared_info? > > Finding storage space on the receive path will not be easy. > > One shortcut to avoid storing this information explicitly is to look up > the device from skb->napi_id. Thanks. This looks promising. It will depend on CONFIG_NET_RX_BUSY_POLL, but I guess that's ok. It nicely isolates all costs to the timestamping option.
On Tue, Apr 25, 2017 at 9:56 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote: > On Mon, Apr 24, 2017 at 11:18:13AM -0400, Willem de Bruijn wrote: >> On Mon, Apr 24, 2017 at 5:00 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote: >> > Would "skb->data - skb->head - >> > skb->mac_header + skb->len" always work as the L2 length for received >> > packets at the time when the cmsg is prepared? >> >> (skb->data - skb->head) - skb->mac_header computes the length >> of data before the mac, such as reserve? > > data - head includes the reserve, but mac_header does too, so I think > it should be just the length of MAC header and everything up to the > data. > >> Do you mean skb->data - >> skb->mac_header (or - skb_mac_offset(skb))? > > That would give me a pointer? If I used skb_mac_offset(), the total > length would be just skb->len - skb_mac_offset()? It appears so. The only existing caller first checks skb_mac_header_was_set(skb).
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index 96f5069..ed04aaa 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -193,6 +193,14 @@ SOF_TIMESTAMPING_OPT_STATS: the transmit timestamps, such as how long a certain block of data was limited by peer's receiver window. +SOF_TIMESTAMPING_OPT_PKTINFO: + + Optional information about timestamped packets. It includes the + index of the real interface which received or transmitted the + packet and its length at layer 2. If the device driver provides + this information, it will be attached in struct scm_ts_pktinfo as + a separate control message of type SCM_TIMESTAMPING_PKTINFO. + New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate regardless of the setting of sysctl net.core.tstamp_allow_data. diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 741d75c..e91685a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -349,6 +349,8 @@ static inline void skb_frag_size_sub(skb_frag_t *frag, int delta) * struct skb_shared_hwtstamps - hardware time stamps * @hwtstamp: hardware time stamp transformed into duration * since arbitrary point in time + * @if_index: index of the interface which timestamped the packet + * @pkt_length: length of the packet * * Software time stamps generated by ktime_get_real() are stored in * skb->tstamp. @@ -361,6 +363,8 @@ static inline void skb_frag_size_sub(skb_frag_t *frag, int delta) */ struct skb_shared_hwtstamps { ktime_t hwtstamp; + int if_index; + int pkt_length; }; /* Definitions for tx_flags in struct skb_shared_info */ @@ -3322,6 +3326,24 @@ static inline void skb_tx_timestamp(struct sk_buff *skb) } /** + * skb_hw_timestamp - set hardware timestamp with packet information + * + * @skb: A socket buffer. + * @hwtstamp: The hardware timestamp. + * @if_index: The index of the interface which timestamped the packet. + * @pkt_len: The length of the packet. + */ +static inline void skb_hw_timestamp(struct sk_buff *skb, ktime_t hwtstamp, + int if_index, int pkt_length) +{ + struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb); + + hwtstamps->hwtstamp = hwtstamp; + hwtstamps->if_index = if_index; + hwtstamps->pkt_length = pkt_length; +} + +/** * skb_complete_wifi_ack - deliver skb with wifi status * * @skb: the original outgoing packet diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h index 2b48856..a5f6e81 100644 --- a/include/uapi/asm-generic/socket.h +++ b/include/uapi/asm-generic/socket.h @@ -100,4 +100,6 @@ #define SO_COOKIE 57 +#define SCM_TIMESTAMPING_PKTINFO 58 + #endif /* __ASM_GENERIC_SOCKET_H */ diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h index 07bdce1..66d752f 100644 --- a/include/uapi/linux/errqueue.h +++ b/include/uapi/linux/errqueue.h @@ -43,4 +43,12 @@ enum { SCM_TSTAMP_ACK, /* data acknowledged by peer */ }; +/** + * struct scm_ts_pktinfo - information about HW-timestamped packets + */ +struct scm_ts_pktinfo { + int if_index; + int pkt_length; +}; + #endif /* _UAPI_LINUX_ERRQUEUE_H */ diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h index 0749fb1..8397ecd 100644 --- a/include/uapi/linux/net_tstamp.h +++ b/include/uapi/linux/net_tstamp.h @@ -26,8 +26,9 @@ enum { SOF_TIMESTAMPING_OPT_CMSG = (1<<10), SOF_TIMESTAMPING_OPT_TSONLY = (1<<11), SOF_TIMESTAMPING_OPT_STATS = (1<<12), + SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13), - SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_STATS, + SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_PKTINFO, SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) | SOF_TIMESTAMPING_LAST }; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 9f78109..7ca251f 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3888,10 +3888,16 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey; } - if (hwtstamps) - *skb_hwtstamps(skb) = *hwtstamps; - else + if (hwtstamps) { + skb_hwtstamps(skb)->hwtstamp = hwtstamps->hwtstamp; + if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_PKTINFO) { + skb_hwtstamps(skb)->if_index = orig_skb->dev->ifindex; + skb_hwtstamps(skb)->pkt_length = orig_skb->mac_len + + orig_skb->len; + } + } else { skb->tstamp = ktime_get_real(); + } __skb_complete_tx_timestamp(skb, sk, tstype, opt_stats); } diff --git a/net/socket.c b/net/socket.c index eea9970..f272019 100644 --- a/net/socket.c +++ b/net/socket.c @@ -670,6 +670,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, { int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP); struct scm_timestamping tss; + struct scm_ts_pktinfo ts_pktinfo; int empty = 1; struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb); @@ -699,8 +700,16 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, empty = 0; if (shhwtstamps && (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) && - ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) + ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) { + if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_PKTINFO && + shhwtstamps->if_index) { + ts_pktinfo.if_index = shhwtstamps->if_index; + ts_pktinfo.pkt_length = shhwtstamps->pkt_length; + put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_PKTINFO, + sizeof(ts_pktinfo), &ts_pktinfo); + } empty = 0; + } if (!empty) { put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING, sizeof(tss), &tss);
Extend the skb_shared_hwtstamps structure with the index of the real interface which received or transmitted the packet and the length of the packet at layer 2. Add a SOF_TIMESTAMPING_OPT_PKTINFO flag to the SO_TIMESTAMPING option to allow applications to get this information as struct scm_ts_pktinfo in SCM_TIMESTAMPING_PKTINFO control message. The index is mainly useful with bonding, bridges and other virtual interfaces, where IP_PKTINFO doesn't provide the index of the real interface. Applications may need it to determine which PHC made the timestamp. With the L2 length it is possible to transpose preamble timestamps to trailer timestamps, which are used in the NTP protocol. Instead of adding a new check to a common path that might slow down processing of received packets without hardware timestamps, the new fields are expected to be set by the drivers together with the hardware timestamp using the skb_hw_timestamp() function. CC: Richard Cochran <richardcochran@gmail.com> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> --- Documentation/networking/timestamping.txt | 8 ++++++++ include/linux/skbuff.h | 22 ++++++++++++++++++++++ include/uapi/asm-generic/socket.h | 2 ++ include/uapi/linux/errqueue.h | 8 ++++++++ include/uapi/linux/net_tstamp.h | 3 ++- net/core/skbuff.c | 12 +++++++++--- net/socket.c | 11 ++++++++++- 7 files changed, 61 insertions(+), 5 deletions(-)