From patchwork Thu Jul 28 10:12:28 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhang Chen X-Patchwork-Id: 653674 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3s0SlQ5Tqpz9sBM for ; Thu, 28 Jul 2016 20:29:22 +1000 (AEST) Received: from localhost ([::1]:52216 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bSiZ6-0001Qy-Bc for incoming@patchwork.ozlabs.org; Thu, 28 Jul 2016 06:29:20 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59663) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bSiJM-0002ym-7A for qemu-devel@nongnu.org; Thu, 28 Jul 2016 06:13:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bSiJJ-0000Of-NY for qemu-devel@nongnu.org; Thu, 28 Jul 2016 06:13:03 -0400 Received: from [59.151.112.132] (port=46342 helo=heian.cn.fujitsu.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bSiJI-0000Jx-GS for qemu-devel@nongnu.org; Thu, 28 Jul 2016 06:13:01 -0400 X-IronPort-AV: E=Sophos;i="5.22,518,1449504000"; d="scan'208";a="9251087" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 28 Jul 2016 18:12:48 +0800 Received: from G08CNEXCHPEKD02.g08.fujitsu.local (unknown [10.167.33.83]) by cn.fujitsu.com (Postfix) with ESMTP id 7138B42B5482; Thu, 28 Jul 2016 18:12:49 +0800 (CST) Received: from G08FNSTD140215.g08.fujitsu.local (10.167.226.56) by G08CNEXCHPEKD02.g08.fujitsu.local (10.167.33.89) with Microsoft SMTP Server (TLS) id 14.3.279.2; Thu, 28 Jul 2016 18:12:49 +0800 From: Zhang Chen To: qemu devel , Jason Wang Date: Thu, 28 Jul 2016 18:12:28 +0800 Message-ID: <1469700748-19754-10-git-send-email-zhangchen.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1469700748-19754-1-git-send-email-zhangchen.fnst@cn.fujitsu.com> References: <1469700748-19754-1-git-send-email-zhangchen.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-Originating-IP: [10.167.226.56] X-yoursite-MailScanner-ID: 7138B42B5482.AD9AE X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: zhangchen.fnst@cn.fujitsu.com X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 59.151.112.132 Subject: [Qemu-devel] [PATCH V11 9/9] filter-rewriter: rewrite tcp packet to keep secondary connection X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Zhijian , "eddie . dong" , "Dr . David Alan Gilbert" , Zhang Chen , zhanghailiang Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" We will rewrite tcp packet secondary received and sent. When colo guest is a tcp server. Firstly, client start a tcp handshake. the packet's seq=client_seq, ack=0,flag=SYN. COLO primary guest get this pkt and mirror(filter-mirror) to secondary guest, secondary get it use filter-redirector. Then,primary guest response pkt (seq=primary_seq,ack=client_seq+1,flag=ACK|SYN). secondary guest response pkt (seq=secondary_seq,ack=client_seq+1,flag=ACK|SYN). In here,we use filter-rewriter save the secondary_seq to it's tcp connection. Finally handshake,client send pkt (seq=client_seq+1,ack=primary_seq+1,flag=ACK). Here,filter-rewriter can get primary_seq, and rewrite ack from primary_seq+1 to secondary_seq+1, recalculate checksum. So the secondary tcp connection kept good. When we send/recv packet. client send pkt(seq=client_seq+1+data_len,ack=primary_seq+1,flag=ACK|PSH). filter-rewriter rewrite ack and send to secondary guest. primary guest response pkt (seq=primary_seq+1,ack=client_seq+1+data_len,flag=ACK) secondary guest response pkt (seq=secondary_seq+1,ack=client_seq+1+data_len,flag=ACK) we rewrite secondary guest seq from secondary_seq+1 to primary_seq+1. So tcp connection kept good. In code We use offset( = secondary_seq - primary_seq ) to rewrite seq or ack. handle_primary_tcp_pkt: tcp_pkt->th_ack += offset; handle_secondary_tcp_pkt: tcp_pkt->th_seq -= offset; Signed-off-by: Zhang Chen Signed-off-by: Li Zhijian Signed-off-by: Wen Congyang --- net/colo-base.c | 2 + net/colo-base.h | 7 ++++ net/filter-rewriter.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++- trace-events | 5 +++ 4 files changed, 124 insertions(+), 2 deletions(-) diff --git a/net/colo-base.c b/net/colo-base.c index 20797b5..88811a9 100644 --- a/net/colo-base.c +++ b/net/colo-base.c @@ -123,6 +123,8 @@ Connection *connection_new(ConnectionKey *key) conn->ip_proto = key->ip_proto; conn->processing = false; + conn->offset = 0; + conn->syn_flag = 0; g_queue_init(&conn->primary_list); g_queue_init(&conn->secondary_list); diff --git a/net/colo-base.h b/net/colo-base.h index 8d402a3..e4288cb 100644 --- a/net/colo-base.h +++ b/net/colo-base.h @@ -50,6 +50,13 @@ typedef struct Connection { /* flag to enqueue unprocessed_connections */ bool processing; uint8_t ip_proto; + /* offset = secondary_seq - primary_seq */ + tcp_seq offset; + /* + * we use this flag update offset func + * run once in independent tcp connection + */ + int syn_flag; } Connection; uint32_t connection_key_hash(const void *opaque); diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index 6350080..6a4294d 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -21,6 +21,7 @@ #include "qemu/main-loop.h" #include "qemu/iov.h" #include "net/checksum.h" +#include "trace.h" #define FILTER_COLO_REWRITER(obj) \ OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER) @@ -65,6 +66,93 @@ static int is_tcp_packet(Packet *pkt) } } +/* handle tcp packet from primary guest */ +static int handle_primary_tcp_pkt(NetFilterState *nf, + Connection *conn, + Packet *pkt) +{ + struct tcphdr *tcp_pkt; + + tcp_pkt = (struct tcphdr *)pkt->transport_layer; + if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) { + char *sdebug, *ddebug; + sdebug = strdup(inet_ntoa(pkt->ip->ip_src)); + ddebug = strdup(inet_ntoa(pkt->ip->ip_dst)); + trace_colo_filter_rewriter_pkt_info(__func__, sdebug, ddebug, + ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack), + tcp_pkt->th_flags); + trace_colo_filter_rewriter_conn_offset(conn->offset); + g_free(sdebug); + g_free(ddebug); + } + + if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) { + /* + * we use this flag update offset func + * run once in independent tcp connection + */ + conn->syn_flag = 1; + } + + if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) { + if (conn->syn_flag) { + /* + * offset = secondary_seq - primary seq + * ack packet sent by guest from primary node, + * so we use th_ack - 1 get primary_seq + */ + conn->offset -= (ntohl(tcp_pkt->th_ack) - 1); + conn->syn_flag = 0; + } + /* handle packets to the secondary from the primary */ + tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset); + + net_checksum_calculate((uint8_t *)pkt->data, pkt->size); + } + + return 0; +} + +/* handle tcp packet from secondary guest */ +static int handle_secondary_tcp_pkt(NetFilterState *nf, + Connection *conn, + Packet *pkt) +{ + struct tcphdr *tcp_pkt; + + tcp_pkt = (struct tcphdr *)pkt->transport_layer; + + if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) { + char *sdebug, *ddebug; + sdebug = strdup(inet_ntoa(pkt->ip->ip_src)); + ddebug = strdup(inet_ntoa(pkt->ip->ip_dst)); + trace_colo_filter_rewriter_pkt_info(__func__, sdebug, ddebug, + ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack), + tcp_pkt->th_flags); + trace_colo_filter_rewriter_conn_offset(conn->offset); + g_free(sdebug); + g_free(ddebug); + } + + if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) { + /* + * save offset = secondary_seq and then + * in handle_primary_tcp_pkt make offset + * = secondary_seq - primary_seq + */ + conn->offset = ntohl(tcp_pkt->th_seq); + } + + if ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK) { + /* handle packets to the primary from the secondary*/ + tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) - conn->offset); + + net_checksum_calculate((uint8_t *)pkt->data, pkt->size); + } + + return 0; +} + static ssize_t colo_rewriter_receive_iov(NetFilterState *nf, NetClientState *sender, unsigned flags, @@ -104,10 +192,30 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf, if (sender == nf->netdev) { /* NET_FILTER_DIRECTION_TX */ - /* handle_primary_tcp_pkt */ + if (!handle_primary_tcp_pkt(nf, conn, pkt)) { + qemu_net_queue_send(s->incoming_queue, sender, 0, + (const uint8_t *)pkt->data, pkt->size, NULL); + packet_destroy(pkt, NULL); + pkt = NULL; + /* + * We block the packet here,after rewrite pkt + * and will send it + */ + return 1; + } } else { /* NET_FILTER_DIRECTION_RX */ - /* handle_secondary_tcp_pkt */ + if (!handle_secondary_tcp_pkt(nf, conn, pkt)) { + qemu_net_queue_send(s->incoming_queue, sender, 0, + (const uint8_t *)pkt->data, pkt->size, NULL); + packet_destroy(pkt, NULL); + pkt = NULL; + /* + * We block the packet here,after rewrite pkt + * and will send it + */ + return 1; + } } } diff --git a/trace-events b/trace-events index ab22eb2..a12279c 100644 --- a/trace-events +++ b/trace-events @@ -1925,3 +1925,8 @@ colo_compare_icmp_miscompare(const char *sta, int size) ": %s = %d" colo_compare_ip_info(int psize, const char *sta, const char *stb, int ssize, const char *stc, const char *std) "ppkt size = %d, ip_src = %s, ip_dst = %s, spkt size = %d, ip_src = %s, ip_dst = %s" colo_old_packet_check_found(int64_t old_time) "%" PRId64 colo_compare_miscompare(void) "" + +# net/filter-rewriter.c +colo_filter_rewriter_debug(void) "" +colo_filter_rewriter_pkt_info(const char *func, const char *src, const char *dst, uint32_t seq, uint32_t ack, uint32_t flag) "%s: src/dst: %s/%s p: seq/ack=%u/%u flags=%x\n" +colo_filter_rewriter_conn_offset(uint32_t offset) ": offset=%u\n"