From patchwork Tue Oct 11 05:56:49 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Khalid Elmously
X-Patchwork-Id: 1688462
From: Khalid Elmously
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 04/19] gve: Add optional metadata descriptor type GVE_TXD_MTD
Date: Tue, 11 Oct 2022 01:56:49 -0400
Message-Id: <20221011055704.642271-5-khalid.elmously@canonical.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20221011055704.642271-1-khalid.elmously@canonical.com>
References: <20221011055704.642271-1-khalid.elmously@canonical.com>
List-Id: Kernel team discussions

From: Willem de Bruijn

BugLink: https://bugs.launchpad.net/bugs/1953575

Allow drivers to pass metadata along with packet data to the device.
Introduce a new metadata descriptor type

* GVE_TXD_MTD

This descriptor is optional. If present, it immediately follows the
packet descriptor and precedes the segment descriptor.

This descriptor may be repeated. Multiple metadata descriptors may
follow. There are no immediate uses for this; it is for future
proofing. At present devices allow only 1 MTD descriptor.

The lower four bits of the type_flags field encode GVE_TXD_MTD.
The upper four bits of the type_flags field encode a *sub*type.

Introduce one such metadata descriptor subtype

* GVE_MTD_SUBTYPE_PATH

This shares path information with the device for network failure
discovery and robust response: Linux derives the IPv6 flowlabel and
ECMP multipath from sk->sk_txhash, and updates this field on error
with sk_rethink_txhash. Allow the host stack to do the same. Pass
the tx_hash value if set. Also communicate whether the path hash is
set, or more exactly, what its type is. Define two common types

  GVE_MTD_PATH_HASH_NONE
  GVE_MTD_PATH_HASH_L4

Concrete examples of error conditions that are resolved are mentioned
in the commits that add sk_rethink_txhash calls, such as commit
7788174e8726 ("tcp: change IPv6 flow-label upon receiving spurious
retransmission").

Experimental results mirror what the theory suggests: where the IPv6
FlowLabel is included in path selection (e.g., LAG/ECMP), flowlabel
rotation on TCP timeout avoids the vast majority of TCP disconnects
that would otherwise have occurred during link failures in long-haul
backbones, when an alternative path is available.

Rotation can be applied to various bad connection signals, such as
timeouts and spurious retransmissions. In aggregate, such flow-level
signals can help locate network issues. Define initial common states:

  GVE_MTD_PATH_STATE_DEFAULT
  GVE_MTD_PATH_STATE_TIMEOUT
  GVE_MTD_PATH_STATE_CONGESTION
  GVE_MTD_PATH_STATE_RETRANSMIT

Signed-off-by: Willem de Bruijn
Signed-off-by: David Awogbemila
Signed-off-by: David S. Miller
(cherry picked from commit 497dbb2b97a0642ecd478d441643bf26804d5f96)
Signed-off-by: Khalid Elmously
---
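As a quick standalone illustration (not part of the change itself), the sketch
below shows how the path_state byte of the new MTD descriptor combines a
GVE_MTD_PATH_STATE_* value in the low nibble with a GVE_MTD_PATH_HASH_* type
in the high nibble, and how the hash is stored big-endian, mirroring
gve_tx_fill_mtd_desc() in this patch. The hash value and the use of userspace
htonl() in place of cpu_to_be32() are illustrative assumptions only.

/* build: cc -o mtd_sketch mtd_sketch.c */
#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>	/* htonl(), standing in for the kernel's cpu_to_be32() */

/* Constants as added to gve_desc.h by this patch */
#define GVE_MTD_PATH_STATE_DEFAULT	0
#define GVE_MTD_PATH_STATE_TIMEOUT	1
#define GVE_MTD_PATH_HASH_NONE		(0x0 << 4)
#define GVE_MTD_PATH_HASH_L4		(0x1 << 4)

int main(void)
{
	uint32_t skb_hash = 0x12345678;		/* hypothetical skb->hash value */
	/* path state in the low nibble, hash type in the high nibble */
	uint8_t path_state = GVE_MTD_PATH_STATE_DEFAULT | GVE_MTD_PATH_HASH_L4;
	uint32_t path_hash = htonl(skb_hash);	/* descriptor field is big-endian */

	printf("path_state = 0x%02x (hash type 0x%x, state 0x%x)\n",
	       path_state, path_state >> 4, path_state & 0xf);
	printf("path_hash stored as 0x%08x\n", path_hash);
	return 0;
}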
 drivers/net/ethernet/google/gve/gve.h      |  1 +
 drivers/net/ethernet/google/gve/gve_desc.h | 20 ++++++
 drivers/net/ethernet/google/gve/gve_tx.c   | 73 ++++++++++++++++------
 3 files changed, 74 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 2cb193fd104d..d47ef01c17e8 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -221,6 +221,7 @@ struct gve_rx_ring {
 /* A TX desc ring entry */
 union gve_tx_desc {
 	struct gve_tx_pkt_desc pkt; /* first desc for a packet */
+	struct gve_tx_mtd_desc mtd; /* optional metadata descriptor */
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
diff --git a/drivers/net/ethernet/google/gve/gve_desc.h b/drivers/net/ethernet/google/gve/gve_desc.h
index 05ae6300e984..5068ae5204a7 100644
--- a/drivers/net/ethernet/google/gve/gve_desc.h
+++ b/drivers/net/ethernet/google/gve/gve_desc.h
@@ -33,6 +33,14 @@ struct gve_tx_pkt_desc {
 	__be64 seg_addr;  /* Base address (see note) of this segment */
 } __packed;
 
+struct gve_tx_mtd_desc {
+	u8	type_flags;	/* type is lower 4 bits, subtype upper    */
+	u8	path_state;	/* state is lower 4 bits, hash type upper */
+	__be16	reserved0;
+	__be32	path_hash;
+	__be64	reserved1;
+} __packed;
+
 struct gve_tx_seg_desc {
 	u8	type_flags;	/* type is lower 4 bits, flags upper	*/
 	u8	l3_offset;	/* TSO: 2 byte units to start of IPH	*/
@@ -46,6 +54,7 @@ struct gve_tx_seg_desc {
 #define	GVE_TXD_STD		(0x0 << 4) /* Std with Host Address	*/
 #define	GVE_TXD_TSO		(0x1 << 4) /* TSO with Host Address	*/
 #define	GVE_TXD_SEG		(0x2 << 4) /* Seg with Host Address	*/
+#define	GVE_TXD_MTD		(0x3 << 4) /* Metadata			*/
 
 /* GVE Transmit Descriptor Flags for Std Pkts */
 #define	GVE_TXF_L4CSUM	BIT(0)	/* Need csum offload */
@@ -54,6 +63,17 @@ struct gve_tx_seg_desc {
 /* GVE Transmit Descriptor Flags for TSO Segs */
 #define	GVE_TXSF_IPV6	BIT(1)	/* IPv6 TSO */
 
+/* GVE Transmit Descriptor Options for MTD Segs */
+#define GVE_MTD_SUBTYPE_PATH		0
+
+#define GVE_MTD_PATH_STATE_DEFAULT	0
+#define GVE_MTD_PATH_STATE_TIMEOUT	1
+#define GVE_MTD_PATH_STATE_CONGESTION	2
+#define GVE_MTD_PATH_STATE_RETRANSMIT	3
+
+#define GVE_MTD_PATH_HASH_NONE		(0x0 << 4)
+#define GVE_MTD_PATH_HASH_L4		(0x1 << 4)
+
 /* GVE Receive Packet Descriptor */
 /* The start of an ethernet packet comes 2 bytes into the rx buffer.
  * gVNIC adds this padding so that both the DMA and the L3/4 protocol header
diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c
index 9922ce46a635..d043e15311c5 100644
--- a/drivers/net/ethernet/google/gve/gve_tx.c
+++ b/drivers/net/ethernet/google/gve/gve_tx.c
@@ -295,11 +295,14 @@ static inline int gve_skb_fifo_bytes_required(struct gve_tx_ring *tx,
 	return bytes;
 }
 
-/* The most descriptors we could need is MAX_SKB_FRAGS + 3 : 1 for each skb frag,
- * +1 for the skb linear portion, +1 for when tcp hdr needs to be in separate descriptor,
- * and +1 if the payload wraps to the beginning of the FIFO.
+/* The most descriptors we could need is MAX_SKB_FRAGS + 4 :
+ * 1 for each skb frag
+ * 1 for the skb linear portion
+ * 1 for when tcp hdr needs to be in separate descriptor
+ * 1 if the payload wraps to the beginning of the FIFO
+ * 1 for metadata descriptor
  */
-#define MAX_TX_DESC_NEEDED	(MAX_SKB_FRAGS + 3)
+#define MAX_TX_DESC_NEEDED	(MAX_SKB_FRAGS + 4)
 static void gve_tx_unmap_buf(struct device *dev, struct gve_tx_buffer_state *info)
 {
 	if (info->skb) {
@@ -389,6 +392,19 @@ static void gve_tx_fill_pkt_desc(union gve_tx_desc *pkt_desc,
 	pkt_desc->pkt.seg_addr = cpu_to_be64(addr);
 }
 
+static void gve_tx_fill_mtd_desc(union gve_tx_desc *mtd_desc,
+				 struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(mtd_desc->mtd) != sizeof(mtd_desc->pkt));
+
+	mtd_desc->mtd.type_flags = GVE_TXD_MTD | GVE_MTD_SUBTYPE_PATH;
+	mtd_desc->mtd.path_state = GVE_MTD_PATH_STATE_DEFAULT |
+				   GVE_MTD_PATH_HASH_L4;
+	mtd_desc->mtd.path_hash = cpu_to_be32(skb->hash);
+	mtd_desc->mtd.reserved0 = 0;
+	mtd_desc->mtd.reserved1 = 0;
+}
+
 static void gve_tx_fill_seg_desc(union gve_tx_desc *seg_desc,
 				 struct sk_buff *skb, bool is_gso,
 				 u16 len, u64 addr)
@@ -420,6 +436,7 @@ static int gve_tx_add_skb_copy(struct gve_priv *priv, struct gve_tx_ring *tx, st
 	int pad_bytes, hlen, hdr_nfrags, payload_nfrags, l4_hdr_offset;
 	union gve_tx_desc *pkt_desc, *seg_desc;
 	struct gve_tx_buffer_state *info;
+	int mtd_desc_nr = !!skb->l4_hash;
 	bool is_gso = skb_is_gso(skb);
 	u32 idx = tx->req & tx->mask;
 	int payload_iov = 2;
@@ -451,7 +468,7 @@ static int gve_tx_add_skb_copy(struct gve_priv *priv, struct gve_tx_ring *tx, st
 					   &info->iov[payload_iov]);
 
 	gve_tx_fill_pkt_desc(pkt_desc, skb, is_gso, l4_hdr_offset,
-			     1 + payload_nfrags, hlen,
+			     1 + mtd_desc_nr + payload_nfrags, hlen,
 			     info->iov[hdr_nfrags - 1].iov_offset);
 
 	skb_copy_bits(skb, 0,
@@ -462,8 +479,13 @@ static int gve_tx_add_skb_copy(struct gve_priv *priv, struct gve_tx_ring *tx, st
 		      info->iov[hdr_nfrags - 1].iov_len);
 	copy_offset = hlen;
 
+	if (mtd_desc_nr) {
+		next_idx = (tx->req + 1) & tx->mask;
+		gve_tx_fill_mtd_desc(&tx->desc[next_idx], skb);
+	}
+
 	for (i = payload_iov; i < payload_nfrags + payload_iov; i++) {
-		next_idx = (tx->req + 1 + i - payload_iov) & tx->mask;
+		next_idx = (tx->req + 1 + mtd_desc_nr + i - payload_iov) & tx->mask;
 		seg_desc = &tx->desc[next_idx];
 
 		gve_tx_fill_seg_desc(seg_desc, skb, is_gso,
@@ -479,16 +501,17 @@ static int gve_tx_add_skb_copy(struct gve_priv *priv, struct gve_tx_ring *tx, st
 		copy_offset += info->iov[i].iov_len;
 	}
 
-	return 1 + payload_nfrags;
+	return 1 + mtd_desc_nr + payload_nfrags;
 }
 
 static int gve_tx_add_skb_no_copy(struct gve_priv *priv, struct gve_tx_ring *tx,
 				  struct sk_buff *skb)
 {
 	const struct skb_shared_info *shinfo = skb_shinfo(skb);
-	int hlen, payload_nfrags, l4_hdr_offset;
-	union gve_tx_desc *pkt_desc, *seg_desc;
+	int hlen, num_descriptors, l4_hdr_offset;
+	union gve_tx_desc *pkt_desc, *mtd_desc, *seg_desc;
 	struct gve_tx_buffer_state *info;
+	int mtd_desc_nr = !!skb->l4_hash;
 	bool is_gso = skb_is_gso(skb);
 	u32 idx = tx->req & tx->mask;
 	u64 addr;
@@ -517,23 +540,30 @@ static int gve_tx_add_skb_no_copy(struct gve_priv *priv, struct gve_tx_ring *tx,
 	dma_unmap_len_set(info, len, len);
 	dma_unmap_addr_set(info, dma, addr);
 
-	payload_nfrags = shinfo->nr_frags;
+	num_descriptors = 1 + shinfo->nr_frags;
+	if (hlen < len)
+		num_descriptors++;
+	if (mtd_desc_nr)
+		num_descriptors++;
+
+	gve_tx_fill_pkt_desc(pkt_desc, skb, is_gso, l4_hdr_offset,
+			     num_descriptors, hlen, addr);
+
+	if (mtd_desc_nr) {
+		idx = (idx + 1) & tx->mask;
+		mtd_desc = &tx->desc[idx];
+		gve_tx_fill_mtd_desc(mtd_desc, skb);
+	}
+
 	if (hlen < len) {
 		/* For gso the rest of the linear portion of the skb needs to
 		 * be in its own descriptor.
 		 */
-		payload_nfrags++;
-		gve_tx_fill_pkt_desc(pkt_desc, skb, is_gso, l4_hdr_offset,
-				     1 + payload_nfrags, hlen, addr);
-
 		len -= hlen;
 		addr += hlen;
-		idx = (tx->req + 1) & tx->mask;
+		idx = (idx + 1) & tx->mask;
 		seg_desc = &tx->desc[idx];
 		gve_tx_fill_seg_desc(seg_desc, skb, is_gso, len, addr);
-	} else {
-		gve_tx_fill_pkt_desc(pkt_desc, skb, is_gso, l4_hdr_offset,
-				     1 + payload_nfrags, hlen, addr);
 	}
 
 	for (i = 0; i < shinfo->nr_frags; i++) {
@@ -554,11 +584,14 @@ static int gve_tx_add_skb_no_copy(struct gve_priv *priv, struct gve_tx_ring *tx,
 		gve_tx_fill_seg_desc(seg_desc, skb, is_gso, len, addr);
 	}
 
-	return 1 + payload_nfrags;
+	return num_descriptors;
 
 unmap_drop:
-	i += (payload_nfrags == shinfo->nr_frags ? 1 : 2);
+	i += num_descriptors - shinfo->nr_frags;
 	while (i--) {
+		/* Skip metadata descriptor, if set */
+		if (i == 1 && mtd_desc_nr == 1)
+			continue;
 		idx--;
 		gve_tx_unmap_buf(tx->dev, &tx->info[idx & tx->mask]);
 	}
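For reference, the descriptor accounting introduced above can be exercised in
isolation: the no-copy path now needs one packet descriptor, an optional
metadata descriptor when the skb carries an L4 hash, an extra segment
descriptor when the GSO header is split from the rest of the linear data, and
one segment descriptor per frag. The sketch below mirrors the num_descriptors
logic from gve_tx_add_skb_no_copy(); the struct and values are simplified,
hypothetical stand-ins for the real skb fields, not part of the patch.

/* build: cc -o descs_sketch descs_sketch.c */
#include <stdbool.h>
#include <stdio.h>

/* Simplified, hypothetical stand-ins for the skb state the driver looks at. */
struct fake_skb {
	bool l4_hash;			/* like skb->l4_hash */
	unsigned int nr_frags;		/* like skb_shinfo(skb)->nr_frags */
	unsigned int hlen;		/* header bytes covered by the pkt desc */
	unsigned int linear_len;	/* like skb_headlen(skb) */
};

/* Mirrors the num_descriptors computation in gve_tx_add_skb_no_copy(). */
static int gve_descs_needed(const struct fake_skb *skb)
{
	int num_descriptors = 1 + skb->nr_frags;	/* pkt desc + one seg per frag */

	if (skb->hlen < skb->linear_len)	/* GSO: remainder of linear data */
		num_descriptors++;
	if (skb->l4_hash)			/* optional metadata descriptor */
		num_descriptors++;
	return num_descriptors;
}

int main(void)
{
	struct fake_skb gso_skb = {
		.l4_hash = true, .nr_frags = 3, .hlen = 66, .linear_len = 1448,
	};

	/* pkt + mtd + linear-remainder seg + 3 frag segs = 6 */
	printf("descriptors needed: %d\n", gve_descs_needed(&gso_skb));
	return 0;
}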