From patchwork Wed Nov 11 17:28:59 2009
X-Patchwork-Submitter: Anthony Liguori
X-Patchwork-Id: 38161
From: Anthony Liguori
To: qemu-devel@nongnu.org
Cc: Anthony Liguori, Luiz Capitulino
Date: Wed, 11 Nov 2009 11:28:59 -0600
Message-Id: <1257960543-26373-7-git-send-email-aliguori@us.ibm.com>
In-Reply-To: <1257960543-26373-1-git-send-email-aliguori@us.ibm.com>
References: <1257960543-26373-1-git-send-email-aliguori@us.ibm.com>
Subject: [Qemu-devel] [PATCH 07/11] Add a lexer for JSON

Our JSON parser is a three stage parser. The first stage tokenizes the
stream into a set of lexical tokens. Since the lexical grammar is regular,
we can use a finite state machine to model it. The state machine will emit
tokens as they are identified.

Signed-off-by: Anthony Liguori
---
 Makefile     |    2 +-
 json-lexer.c |  333 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 json-lexer.h |   50 +++++++++
 3 files changed, 384 insertions(+), 1 deletions(-)
 create mode 100644 json-lexer.c
 create mode 100644 json-lexer.h

diff --git a/Makefile b/Makefile
index 116cd70..e5ab879 100644
--- a/Makefile
+++ b/Makefile
@@ -135,7 +135,7 @@ obj-y += buffered_file.o migration.o migration-tcp.o qemu-sockets.o
 obj-y += qemu-char.o aio.o savevm.o
 obj-y += msmouse.o ps2.o
 obj-y += qdev.o qdev-properties.o
-obj-y += qint.o qstring.o qdict.o qlist.o qfloat.o qbool.o
+obj-y += qint.o qstring.o qdict.o qlist.o qfloat.o qbool.o json-lexer.o
 obj-y += qemu-config.o
 
 obj-$(CONFIG_BRLAPI) += baum.o
diff --git a/json-lexer.c b/json-lexer.c
new file mode 100644
index 0000000..53697c5
--- /dev/null
+++ b/json-lexer.c
@@ -0,0 +1,333 @@
+/*
+ * JSON lexer
+ *
+ * Copyright IBM, Corp. 2009
+ *
+ * Authors:
+ *  Anthony Liguori
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qstring.h"
+#include "qlist.h"
+#include "qdict.h"
+#include "qint.h"
+#include "qemu-common.h"
+#include "json-lexer.h"
+
+/*
+ * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
+ * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*'
+ * 0|([1-9][0-9]*(.[0-9]+)?([eE]([-+])?[0-9]+))
+ * [{}\[\],:]
+ * [a-z]+
+ *
+ */
+
+enum json_lexer_state {
+    ERROR = 0,
+    IN_DONE_STRING,
+    IN_DQ_UCODE3,
+    IN_DQ_UCODE2,
+    IN_DQ_UCODE1,
+    IN_DQ_UCODE0,
+    IN_DQ_STRING_ESCAPE,
+    IN_DQ_STRING,
+    IN_SQ_UCODE3,
+    IN_SQ_UCODE2,
+    IN_SQ_UCODE1,
+    IN_SQ_UCODE0,
+    IN_SQ_STRING_ESCAPE,
+    IN_SQ_STRING,
+    IN_ZERO,
+    IN_DIGITS,
+    IN_DIGIT,
+    IN_EXP_E,
+    IN_MANTISSA,
+    IN_MANTISSA_DIGITS,
+    IN_NONZERO_NUMBER,
+    IN_NEG_NONZERO_NUMBER,
+    IN_KEYWORD,
+    IN_ESCAPE,
+    IN_ESCAPE_L,
+    IN_ESCAPE_LL,
+    IN_ESCAPE_DONE,
+    IN_WHITESPACE,
+    IN_OPERATOR_DONE,
+    IN_START,
+};
+
+#define TERMINAL(state) [0 ... 0x7F] = (state)
+
+static const uint8_t json_lexer[][256] = {
+    [IN_DONE_STRING] = {
+        TERMINAL(JSON_STRING),
+    },
+
+    /* double quote string */
+    [IN_DQ_UCODE3] = {
+        ['0' ... '9'] = IN_DQ_STRING,
+        ['a' ... 'f'] = IN_DQ_STRING,
+        ['A' ... 'F'] = IN_DQ_STRING,
+    },
+    [IN_DQ_UCODE2] = {
+        ['0' ... '9'] = IN_DQ_UCODE3,
+        ['a' ... 'f'] = IN_DQ_UCODE3,
+        ['A' ... 'F'] = IN_DQ_UCODE3,
+    },
+    [IN_DQ_UCODE1] = {
+        ['0' ... '9'] = IN_DQ_UCODE2,
+        ['a' ... 'f'] = IN_DQ_UCODE2,
+        ['A' ... 'F'] = IN_DQ_UCODE2,
+    },
+    [IN_DQ_UCODE0] = {
+        ['0' ... '9'] = IN_DQ_UCODE1,
+        ['a' ... 'f'] = IN_DQ_UCODE1,
+        ['A' ... 'F'] = IN_DQ_UCODE1,
+    },
+    [IN_DQ_STRING_ESCAPE] = {
+        ['b'] = IN_DQ_STRING,
+        ['f'] = IN_DQ_STRING,
+        ['n'] = IN_DQ_STRING,
+        ['r'] = IN_DQ_STRING,
+        ['t'] = IN_DQ_STRING,
+        ['/'] = IN_DQ_STRING,
+        ['\\'] = IN_DQ_STRING,
+        ['\''] = IN_DQ_STRING,
+        ['\"'] = IN_DQ_STRING,
+        ['u'] = IN_DQ_UCODE0,
+    },
+    [IN_DQ_STRING] = {
+        [1 ... 0xFF] = IN_DQ_STRING,
+        ['\\'] = IN_DQ_STRING_ESCAPE,
+        ['"'] = IN_DONE_STRING,
+    },
+
+    /* single quote string */
+    [IN_SQ_UCODE3] = {
+        ['0' ... '9'] = IN_SQ_STRING,
+        ['a' ... 'f'] = IN_SQ_STRING,
+        ['A' ... 'F'] = IN_SQ_STRING,
+    },
+    [IN_SQ_UCODE2] = {
+        ['0' ... '9'] = IN_SQ_UCODE3,
+        ['a' ... 'f'] = IN_SQ_UCODE3,
+        ['A' ... 'F'] = IN_SQ_UCODE3,
+    },
+    [IN_SQ_UCODE1] = {
+        ['0' ... '9'] = IN_SQ_UCODE2,
+        ['a' ... 'f'] = IN_SQ_UCODE2,
+        ['A' ... 'F'] = IN_SQ_UCODE2,
+    },
+    [IN_SQ_UCODE0] = {
+        ['0' ... '9'] = IN_SQ_UCODE1,
+        ['a' ... 'f'] = IN_SQ_UCODE1,
+        ['A' ... 'F'] = IN_SQ_UCODE1,
+    },
+    [IN_SQ_STRING_ESCAPE] = {
+        ['b'] = IN_SQ_STRING,
+        ['f'] = IN_SQ_STRING,
+        ['n'] = IN_SQ_STRING,
+        ['r'] = IN_SQ_STRING,
+        ['t'] = IN_SQ_STRING,
+        ['/'] = IN_SQ_STRING,
+        ['\\'] = IN_SQ_STRING,
+        ['\''] = IN_SQ_STRING,
+        ['\"'] = IN_SQ_STRING,
+        ['u'] = IN_SQ_UCODE0,
+    },
+    [IN_SQ_STRING] = {
+        [1 ... 0xFF] = IN_SQ_STRING,
+        ['\\'] = IN_SQ_STRING_ESCAPE,
+        ['\''] = IN_DONE_STRING,
+    },
+
+    /* Zero */
+    [IN_ZERO] = {
+        TERMINAL(JSON_INTEGER),
+        ['0' ... '9'] = ERROR,
+        ['.'] = IN_MANTISSA,
+    },
+
+    /* Float */
+    [IN_DIGITS] = {
+        TERMINAL(JSON_FLOAT),
+        ['0' ... '9'] = IN_DIGITS,
+    },
+
+    [IN_DIGIT] = {
+        ['0' ... '9'] = IN_DIGITS,
+    },
+
+    [IN_EXP_E] = {
+        ['-'] = IN_DIGIT,
+        ['+'] = IN_DIGIT,
+        ['0' ... '9'] = IN_DIGITS,
+    },
+
+    [IN_MANTISSA_DIGITS] = {
+        TERMINAL(JSON_FLOAT),
+        ['0' ... '9'] = IN_MANTISSA_DIGITS,
+        ['e'] = IN_EXP_E,
+        ['E'] = IN_EXP_E,
+    },
+
+    [IN_MANTISSA] = {
+        ['0' ... '9'] = IN_MANTISSA_DIGITS,
+    },
+
+    /* Number */
+    [IN_NONZERO_NUMBER] = {
+        TERMINAL(JSON_INTEGER),
+        ['0' ... '9'] = IN_NONZERO_NUMBER,
+        ['e'] = IN_EXP_E,
+        ['E'] = IN_EXP_E,
+        ['.'] = IN_MANTISSA,
+    },
+
+    [IN_NEG_NONZERO_NUMBER] = {
+        ['0'] = IN_ZERO,
+        ['1' ... '9'] = IN_NONZERO_NUMBER,
+    },
+
+    /* keywords */
+    [IN_KEYWORD] = {
+        TERMINAL(JSON_KEYWORD),
+        ['a' ... 'z'] = IN_KEYWORD,
+    },
+
+    /* whitespace */
+    [IN_WHITESPACE] = {
+        TERMINAL(JSON_SKIP),
+        [' '] = IN_WHITESPACE,
+        ['\t'] = IN_WHITESPACE,
+        ['\r'] = IN_WHITESPACE,
+        ['\n'] = IN_WHITESPACE,
+    },
+
+    /* operator */
+    [IN_OPERATOR_DONE] = {
+        TERMINAL(JSON_OPERATOR),
+    },
+
+    /* escape */
+    [IN_ESCAPE_DONE] = {
+        TERMINAL(JSON_ESCAPE),
+    },
+
+    [IN_ESCAPE_LL] = {
+        ['d'] = IN_ESCAPE_DONE,
+    },
+
+    [IN_ESCAPE_L] = {
+        ['d'] = IN_ESCAPE_DONE,
+        ['l'] = IN_ESCAPE_LL,
+    },
+
+    [IN_ESCAPE] = {
+        ['d'] = IN_ESCAPE_DONE,
+        ['i'] = IN_ESCAPE_DONE,
+        ['p'] = IN_ESCAPE_DONE,
+        ['s'] = IN_ESCAPE_DONE,
+        ['f'] = IN_ESCAPE_DONE,
+        ['l'] = IN_ESCAPE_L,
+    },
+
+    /* top level rule */
+    [IN_START] = {
+        ['"'] = IN_DQ_STRING,
+        ['\''] = IN_SQ_STRING,
+        ['0'] = IN_ZERO,
+        ['1' ... '9'] = IN_NONZERO_NUMBER,
+        ['-'] = IN_NEG_NONZERO_NUMBER,
+        ['{'] = IN_OPERATOR_DONE,
+        ['}'] = IN_OPERATOR_DONE,
+        ['['] = IN_OPERATOR_DONE,
+        [']'] = IN_OPERATOR_DONE,
+        [','] = IN_OPERATOR_DONE,
+        [':'] = IN_OPERATOR_DONE,
+        ['a' ... 'z'] = IN_KEYWORD,
+        ['%'] = IN_ESCAPE,
+        [' '] = IN_WHITESPACE,
+        ['\t'] = IN_WHITESPACE,
+        ['\r'] = IN_WHITESPACE,
+        ['\n'] = IN_WHITESPACE,
+    },
+};
+
+void json_lexer_init(JSONLexer *lexer, JSONLexerEmitter func)
+{
+    lexer->emit = func;
+    lexer->state = IN_START;
+    lexer->token = qstring_new();
+    lexer->x = lexer->y = 0;
+}
+
+static int json_lexer_feed_char(JSONLexer *lexer, char ch)
+{
+    char buf[2];
+
+    lexer->x++;
+    if (ch == '\n') {
+        lexer->x = 0;
+        lexer->y++;
+    }
+
+    lexer->state = json_lexer[lexer->state][(uint8_t)ch];
+
+    switch (lexer->state) {
+    case JSON_OPERATOR:
+    case JSON_ESCAPE:
+    case JSON_INTEGER:
+    case JSON_FLOAT:
+    case JSON_KEYWORD:
+    case JSON_STRING:
+        lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y);
+        /* fall through: restart the machine on the lookahead character */
+    case JSON_SKIP:
+        lexer->state = json_lexer[IN_START][(uint8_t)ch];
+        QDECREF(lexer->token);
+        lexer->token = qstring_new();
+        break;
+    case ERROR:
+        return -EINVAL;
+    default:
+        break;
+    }
+
+    buf[0] = ch;
+    buf[1] = 0;
+
+    qstring_append(lexer->token, buf);
+
+    return 0;
+}
+
+int json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size)
+{
+    size_t i;
+
+    for (i = 0; i < size; i++) {
+        int err;
+
+        err = json_lexer_feed_char(lexer, buffer[i]);
+        if (err < 0) {
+            return err;
+        }
+    }
+
+    return 0;
+}
+
+int json_lexer_flush(JSONLexer *lexer)
+{
+    return json_lexer_feed_char(lexer, 0);
+}
+
+void json_lexer_destroy(JSONLexer *lexer)
+{
+    QDECREF(lexer->token);
+}
diff --git a/json-lexer.h b/json-lexer.h
new file mode 100644
index 0000000..3b50c46
--- /dev/null
+++ b/json-lexer.h
@@ -0,0 +1,50 @@
+/*
+ * JSON lexer
+ *
+ * Copyright IBM, Corp. 2009
+ *
+ * Authors:
+ *  Anthony Liguori
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_JSON_LEXER_H
+#define QEMU_JSON_LEXER_H
+
+#include "qstring.h"
+#include "qlist.h"
+
+typedef enum json_token_type {
+    JSON_OPERATOR = 100,
+    JSON_INTEGER,
+    JSON_FLOAT,
+    JSON_KEYWORD,
+    JSON_STRING,
+    JSON_ESCAPE,
+    JSON_SKIP,
+} JSONTokenType;
+
+typedef struct JSONLexer JSONLexer;
+
+typedef void (JSONLexerEmitter)(JSONLexer *, QString *, JSONTokenType, int x, int y);
+
+struct JSONLexer
+{
+    JSONLexerEmitter *emit;
+    int state;
+    QString *token;
+    int x, y;
+};
+
+void json_lexer_init(JSONLexer *lexer, JSONLexerEmitter func);
+
+int json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size);
+
+int json_lexer_flush(JSONLexer *lexer);
+
+void json_lexer_destroy(JSONLexer *lexer);
+
+#endif
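The commit message describes the first stage as a finite state machine that
emits tokens as they are identified. As a rough illustration of that
technique, here is a much smaller, standalone sketch. It is not the code from
this patch: MiniLexer, mini_lexer_*, collect and the S_*/T_* constants are
hypothetical names, the grammar is cut down to integers, lowercase words,
single-character operators and spaces, and the table is filled at run time
rather than with GCC range designators. What it keeps is the core idea: one
table cell holds either a next state or a token code, and a token code means
"the character just read ended the previous token, so emit it and restart
from the start state on that same character".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* States are small integers and token codes start at 100, so both kinds of
 * value can share one uint8_t table cell. All names here are hypothetical. */
enum { S_ERROR = 0, S_START, S_INT, S_WORD, S_OP_DONE, S_SPACE, N_STATES };
enum { T_INT = 100, T_WORD, T_OP, T_SKIP };

typedef void (*MiniEmit)(void *opaque, int type, const char *text, size_t len);

typedef struct {
    uint8_t table[N_STATES][256];
    int state;
    char tok[64];                        /* text of the token being built */
    size_t len;
    MiniEmit emit;
    void *opaque;
} MiniLexer;

static void set_range(uint8_t *row, int lo, int hi, uint8_t next)
{
    for (int c = lo; c <= hi; c++) {
        row[c] = next;
    }
}

static void mini_lexer_init(MiniLexer *lx, MiniEmit emit, void *opaque)
{
    assert(lx != NULL && emit != NULL);
    memset(lx, 0, sizeof(*lx));          /* every unset cell means S_ERROR */
    lx->state = S_START;
    lx->emit = emit;
    lx->opaque = opaque;

    /* Accepting states: any character ends the token, except continuations. */
    set_range(lx->table[S_INT], 0, 255, T_INT);
    set_range(lx->table[S_INT], '0', '9', S_INT);
    set_range(lx->table[S_WORD], 0, 255, T_WORD);
    set_range(lx->table[S_WORD], 'a', 'z', S_WORD);
    set_range(lx->table[S_OP_DONE], 0, 255, T_OP);
    set_range(lx->table[S_SPACE], 0, 255, T_SKIP);
    lx->table[S_SPACE][' '] = S_SPACE;

    /* Start state: dispatch on the first character of a token. */
    set_range(lx->table[S_START], '0', '9', S_INT);
    set_range(lx->table[S_START], 'a', 'z', S_WORD);
    for (const char *p = "{}[],:"; *p != '\0'; p++) {
        lx->table[S_START][(uint8_t)*p] = S_OP_DONE;
    }
    lx->table[S_START][' '] = S_SPACE;
}

/* Feed one character; ch == 0 marks end of input. Returns 0 or -1. */
static int mini_lexer_feed_char(MiniLexer *lx, uint8_t ch)
{
    int next = lx->table[lx->state][ch];

    if (next >= T_INT) {                 /* the lookahead ended a token */
        if (next != T_SKIP) {
            lx->emit(lx->opaque, next, lx->tok, lx->len);
        }
        lx->len = 0;
        if (ch == 0) {                   /* end of input: ready for reuse */
            lx->state = S_START;
            return 0;
        }
        next = lx->table[S_START][ch];   /* reprocess ch as a token start */
    }
    if (next == S_ERROR || lx->len == sizeof(lx->tok)) {
        return -1;
    }
    lx->tok[lx->len++] = (char)ch;
    lx->state = next;
    return 0;
}

static int mini_lexer_feed(MiniLexer *lx, const char *buf, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (mini_lexer_feed_char(lx, (uint8_t)buf[i]) < 0) {
            return -1;
        }
    }
    return 0;
}

static int mini_lexer_flush(MiniLexer *lx)
{
    return lx->state == S_START ? 0 : mini_lexer_feed_char(lx, 0);
}

/* Example emitter: appends "type:text " to a string the caller has sized. */
static void collect(void *opaque, int type, const char *text, size_t len)
{
    static const char *const names[] = { "int", "word", "op" };
    char *out = opaque;

    sprintf(out + strlen(out), "%s:%.*s ", names[type - T_INT], (int)len, text);
}
```

Initialising a MiniLexer with collect and an empty 128-byte buffer, feeding
the eight characters of "[12, ab]" and then flushing leaves
"op:[ int:12 op:, word:ab op:] " in the buffer. Because a token is only
recognised once its lookahead character arrives, the closing bracket is not
emitted until the flush, which is the same reason the patch provides
json_lexer_flush().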