From patchwork Tue Mar 19 19:49:55 2019
X-Patchwork-Submitter: Kevin 'ldir' Darbyshire-Bryant
X-Patchwork-Id: 1058651
From: Kevin 'ldir' Darbyshire-Bryant
To: "netdev@vger.kernel.org"
CC: "jiri@resnulli.us", "xiyou.wangcong@gmail.com", "jhs@mojatatu.com",
    Kevin 'ldir' Darbyshire-Bryant
Subject: [RFC PATCH 0/1 net-next] net: sched: Introduce conndscp action
Date: Tue, 19 Mar 2019 19:49:55 +0000
Message-ID: <20190319194929.10798-1-ldir@darbyshire-bryant.me.uk>
X-Mailing-List: netdev@vger.kernel.org

With nervousness and trepidation I'm submitting the attached RFC patch for
'conndscp'.
Conndscp is a new tc filter action module.  It copies DSCPs into conntrack
marks and, in the reverse direction, restores DSCPs held in conntrack marks
to the diffserv field of suitable skbs.

The feature is intended for, and has proven useful for, restoring ingress
classifications based on egress classifications across links that bleach or
otherwise change DSCP, typically home ISP Internet links.  Restoring DSCP on
ingress on the WAN link allows qdiscs such as CAKE to shape inbound packets
according to policies that are easier to implement on egress.

Ingress classification is traditionally a challenging task since iptables
rules haven't yet run and tc filter/eBPF programs run before NAT lookups,
hence are unable to see internal IPv4 addresses as used on the typical home
masquerading gateway.

conndscp understands the following parameters:

mask - a 32 bit mask of at least 6 contiguous bits where conndscp will
place the DSCP in the conntrack mark.  The DSCP is left-shifted by the
number of unset lower bits of the mask before being stored in the mark
field.

statemask - a 32 bit mask of (usually) 1 bit length, outside the area
specified by mask.  This represents a conditional operation flag: get will
only store the DSCP if the flag is unset, and set will only restore the
DSCP if the flag is set.  This is useful to implement a 'one shot' iptables
based classification where the 'complicated' iptables rules are only run
once, to classify the connection on its initial (egress) packet, and
subsequent packets are all marked/restored with the same DSCP.  A statemask
of zero disables the conditional behaviour.

mode - get/set/both - get stores the DSCP into the mark, set restores the
DSCP into the diffserv field from the mark, both performs a get and then a
set, in that order.

optional parameters:

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain)

A typical example of using conndscp to restore DSCP values for use with a
qdisc (e.g. CAKE) is shown below, using the top 6 bits of the mark to store
the DSCP and the bottom bit of the top byte as the state flag.

# egress qdisc
tc qdisc add dev eth0 root cake bandwidth 20000kbit

# put an action on the egress interface to get the DSCP into connmark->mark
# and to set the DSCP from the stored connmark.
# this seems counter intuitive, but it ensures that once the mark is set all
# subsequent egress packets carry the same stored DSCP.  This avoids needing
# iptables rules to mark every packet: conndscp does it for us and CAKE is
# then happy using the DSCP
tc filter add dev eth0 protocol all prio 10 u32 match u32 0 0 flowid 1:1 action \
	conndscp mask 0xfc000000 statemask 0x01000000 mode both

# ingress qdisc via an ifb
tc qdisc add dev eth0 handle ffff: ingress
tc qdisc add dev ifb4eth0 root cake bandwidth 80000kbit
ip link set ifb4eth0 up

# redirect all packets arriving on eth0 to ifb4eth0 and restore the DSCP
# from connmark
tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \
	match u32 0 0 flowid 1:1 action \
	conndscp mask 0xfc000000 statemask 0x01000000 mode set \
	mirred egress redirect dev ifb4eth0

# iptables rules using the statemask flag so classification is only done once
iptables -t mangle -N QOS_MARK_eth0
iptables -t mangle -A QOS_MARK_eth0 -m set --match-set Bulk4 dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
# add more rules similar to the above as required

# send unmarked packets to the marking chain - conndscp will set the
# statemask bit if not already set.
iptables -t mangle -A POSTROUTING -o eth0 -m connmark --mark 0x00000000/0x01000000 -g QOS_MARK_eth0

conndscp (almost) shamelessly copies code from connmark and therefore
contains the same limitations.

I am not a full-time programmer and conndscp represents something on the
order of a 2 week struggle.  My C is awful and my kernel & network knowledge
worse, though I like to think both are improving.
There are no doubt issues with this patch/feature but I hope constructive
feedback, quite possibly in very short words for my simple brain, will
knock it into shape.

Thanks for your time.

Kevin Darbyshire-Bryant (1):
  net: sched: Introduce conndscp action

 include/net/tc_act/tc_conndscp.h          |  19 ++
 include/uapi/linux/tc_act/tc_conndscp.h   |  33 +++
 net/sched/Kconfig                         |  13 +
 net/sched/Makefile                        |   1 +
 net/sched/act_conndscp.c                  | 333 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 6 files changed, 400 insertions(+)
 create mode 100644 include/net/tc_act/tc_conndscp.h
 create mode 100644 include/uapi/linux/tc_act/tc_conndscp.h
 create mode 100644 net/sched/act_conndscp.c
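
For anyone wanting to check the mark layout by hand, the mask and statemask
handling described in the parameter list can be sketched in plain userspace
C.  This is an illustration only and not code taken from the patch: the
function names (mask_shift, conndscp_get, conndscp_set) are hypothetical,
and it assumes that get clears the masked bits before storing and sets the
statemask flag afterwards, as the comments in the example suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Count the unset low-order bits of mask; this is the left shift
 * applied to the DSCP.  Hypothetical helper, mask must be non-zero. */
unsigned int mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while ((mask & 1u) == 0) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* "get": store the 6 bit DSCP into the mark, but only if the state
 * flag is not already set.  Returns the (possibly updated) mark.
 * Assumption: the flag is set once the DSCP has been stored. */
uint32_t conndscp_get(uint32_t mark, uint8_t dscp,
		      uint32_t mask, uint32_t statemask)
{
	if (statemask && (mark & statemask))
		return mark;		/* already classified, leave alone */

	mark &= ~mask;			/* assumption: clear old DSCP bits */
	mark |= ((uint32_t)(dscp & 0x3f) << mask_shift(mask)) & mask;
	return mark | statemask;
}

/* "set": recover the DSCP from the mark, but only if the state flag
 * is set.  Returns 1 and writes *dscp when a restore should happen,
 * 0 otherwise.  A statemask of zero makes the restore unconditional. */
int conndscp_set(uint32_t mark, uint32_t mask, uint32_t statemask,
		 uint8_t *dscp)
{
	if (statemask && !(mark & statemask))
		return 0;

	*dscp = (uint8_t)(((mark & mask) >> mask_shift(mask)) & 0x3f);
	return 1;
}
```

With the masks from the example (mask 0xfc000000, statemask 0x01000000) the
shift is 26, so a get of CS1 (DSCP 8) into an empty mark yields 0x21000000.
A later get with a different DSCP leaves that mark unchanged because the
flag is already set, and set recovers DSCP 8 from it.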