Message ID | 4AD87424.3010000@redhat.com |
---|---|
State | New |
Paolo Bonzini wrote:
> It's 36k, and pulling it in gives the opportunity to customize it.
> For example, the attached patch allows parsing a "%BLAH" extension to
> JSON that is passed to the callback (since the parsing is done
> character-by-character, the callback can consume whatever it wants
> after the % sign). Asprintf+parse JSON unfortunately isn't enough
> because you'd need to escape all strings.

What's the state of this library's upstream? Should we be pushing these changes there and then attempting to package it?

I'd rather pull this in as a submodule, try to get it packaged properly, and then eventually drop the submodule. I don't want us to fork the library unless we have to.

Regards,

Anthony Liguori
On 10/16/2009 03:45 PM, Anthony Liguori wrote:
> Paolo Bonzini wrote:
>> It's 36k, and pulling it in gives the opportunity to customize it. For
>> example, the attached patch allows parsing a "%BLAH" extension to
>> JSON that is passed to the callback (since the parsing is done
>> character-by-character, the callback can consume whatever it wants
>> after the % sign). Asprintf+parse JSON unfortunately isn't enough
>> because you'd need to escape all strings.
>
> What's the state of this library's upstream? Should we be pushing these
> changes there and then attempting to package it?

There's no repository, there's no mention of it in the author's blog, and it has seen six changes in two years according to the file's heading. The only reference on da Internet is at http://tech.groups.yahoo.com/group/json/message/928.

On the other hand, it's down to the point (it has no object model of its own), and it is fully asynchronous since it works character-by-character, which makes it easier to extend as in my patch above.

Paolo
Paolo Bonzini wrote:
> On 10/16/2009 03:45 PM, Anthony Liguori wrote:
>> Paolo Bonzini wrote:
>>> It's 36k, and pulling it in gives the opportunity to customize it. For
>>> example, the attached patch allows parsing a "%BLAH" extension to
>>> JSON that is passed to the callback (since the parsing is done
>>> character-by-character, the callback can consume whatever it wants
>>> after the % sign). Asprintf+parse JSON unfortunately isn't enough
>>> because you'd need to escape all strings.
>>
>> What's the state of this library's upstream? Should we be pushing these
>> changes there and then attempting to package it?
>
> There's no repository, there's no mention of it in the author's blog,
> it has seen six changes in two years according to the file's heading.
> The only reference on da Internet is at
> http://tech.groups.yahoo.com/group/json/message/928.
>
> On the other hand, it's down to the point (it has no object model of
> its own), and it is fully asynchronous since it works
> character-by-character which makes it easier to extend as in my patch
> above.

Ugh! I hate people trying to be clever. The copyright is:

/*
Copyright (c) 2005 JSON.org

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

The Software shall be used for Good, not Evil.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/

"The Software shall be used for Good, not Evil." is added as part of the licensing text. That screws up the otherwise X11 license and is highly unlikely to be GPL compatible.

We can't pull this into the tree or even link against it as a library. Try contacting the others and see about getting that silliness removed.

Regards,

Anthony Liguori
On 10/16/2009 07:38 PM, Anthony Liguori wrote:
>
> Ugh! I hate people trying to be clever.

Grrr, good catch. I wrote this to the guy via Yahoo! but I cannot see his email address, so I've no clue if the email will actually reach him.

> Hi, the QEMU project is discussing using your JSON parser. However,
> the sentence "The Software shall be used for Good, not Evil" that
> appears in the file is too clever and (even though the humorous
> intent is obvious) it could be considered GPL-incompatible (or
> any-other-license-incompatible for that matter).
>
> Would you consider removing that sentence from
> http://fara.cs.uni-potsdam.de/~jsg/json_parser/JSON_parser.c? If you
> cannot, you can send it to me and CC anthony@codemonkey.ws.
>
> Thanks in advance.
>
> Paolo Bonzini

There can always be a plan B---Dan Berrange found a parser with a similar interface and if the weather doesn't improve I may even give a shot at writing one over the weekend.

Paolo
Paolo Bonzini wrote:
>
> > Thanks in advance.
> >
> > Paolo Bonzini
>
> There can always be a plan B---Dan Berrange found a parser with a
> similar interface and if the weather doesn't improve I may even give a
> shot at writing one over the weekend.

I already am :-) Stay tuned, I should have a patch later this afternoon.

I'd like to move all of the QObject/json code to a shared library too so that other tools like libvirt can just use that code. Ideally, we would also provide a higher level monitor API too.

> Paolo

Regards,

Anthony Liguori
On 10/16/2009 11:37 PM, Anthony Liguori wrote:
>
> I already am :-) Stay tuned, I should have a patch later this afternoon.

Was it a race? (Seriously, sorry I didn't notice a couple of hours ago).

This one is ~5% slower than the "Evil" one, but half the size. Tested against the comments.json file from the "Evil" parser and with valgrind too. Does all the funky Unicode stuff too.

Paolo

/*
 * An event-based, asynchronous JSON parser.
 *
 * Copyright (C) 2009 Red Hat Inc.
 *
 * Authors:
 *  Paolo Bonzini <pbonzini@redhat.com>
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

#include "json.h"
#include <string.h>
#include <stdlib.h>

/* Common character classes.  */

#define CASE_XDIGIT \
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': \
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F'

#define CASE_DIGIT \
    case '0': case '1': case '2': case '3': case '4': \
    case '5': case '6': case '7': case '8': case '9'

/* Helper function to go from \uXXXX-encoded UTF-16 to UTF-8.  */

static bool hex_to_utf8 (char *buf, char **dest, char *src)
{
    int i, n;
    uint8_t *p;

    for (i = n = 0; i < 4; i++) {
        n <<= 4;
        switch (src[i])
        {
        CASE_DIGIT: n |= src[i] - '0'; break;
        CASE_XDIGIT: n |= (src[i] & ~32) - 'A' + 10; break;
        default: return false;
        }
    }

    p = (uint8_t *)*dest;
    if (n < 128) {
        *p++ = n;
    } else if (n < 2048) {
        *p++ = 0xC0 | (n >> 6);
        *p++ = 0x80 | (n & 63);
    } else if (n < 0xDC00 || n > 0xDFFF) {
        *p++ = 0xE0 | (n >> 12);
        *p++ = 0x80 | ((n >> 6) & 63);
        *p++ = 0x80 | (n & 63);
    } else {
        /* Merge with preceding high surrogate.  */
        if (p - (uint8_t *)buf < 3
            || p[-3] != 0xED
            || p[-2] < 0xA0 || p[-2] > 0xAF) /* 0xD800..0xDBFF */
            return false;

        n += 0x10000 - 0xDC00;
        /* Add (not OR) the high surrogate bits: bit 16 is already set.  */
        n += ((p[-2] & 15) << 16) | ((p[-1] & 63) << 10);

        /* Overwrite high surrogate.  */
        p[-3] = 0xF0 | (n >> 18);
        p[-2] = 0x80 | ((n >> 12) & 63);
        p[-1] = 0x80 | ((n >> 6) & 63);
        *p++ = 0x80 | (n & 63);
    }
    *dest = (char *)p;
    return true;
}

struct json_parser {
    struct json_parser_config c;
    size_t n, alloc;
    char *buf;
    size_t sp;
    uint32_t state, stack[128];
    char start_buffer[4];
};

/* Managing the state stack.  */

static inline uint32_t *push_state (struct json_parser *p)
{
    p->stack[p->sp++] = p->state;
    return &p->state;
}

static inline void pop_state (struct json_parser *p)
{
    p->state = p->stack[--p->sp];
}

/* Managing the string/number buffer.  */

static inline void clear_buffer (struct json_parser *p)
{
    p->n = 0;
}

static inline void push_buffer (struct json_parser *p, char c)
{
    if (p->n == p->alloc) {
        size_t new_alloc = p->alloc * 2;
        if (p->buf == p->start_buffer) {
            p->buf = malloc (new_alloc);
            memcpy (p->buf, p->start_buffer, p->alloc);
        } else {
            p->buf = realloc (p->buf, new_alloc);
        }
        p->alloc = new_alloc;
    }
    p->buf[p->n++] = c;
}

/*
 * Parser states are organized like this:
 *   bit 0-7:   enum parser_state
 *   bit 8-15:  for IN_KEYWORD, index in keyword table
 *   bit 16-31: additional substate (enum parser_cookies)
 */

enum parser_state {
    START_PARSE,          /* at start of parsing */
    IN_KEYWORD,           /* parsing keyword (match exactly) */
    START_KEY,            /* expecting key */
    END_KEY,              /* expecting colon */
    START_VALUE,          /* expecting value */
    END_VALUE,            /* expecting comma or closing parenthesis */
    IN_NUMBER,            /* parsing number (up to whitespace) */
    IN_STRING,            /* parsing string */
    IN_STRING_BACKSLASH,  /* parsing string, copy one char verbatim */
    IN_COMMENT,           /* comment mini-scanner */
};

enum parser_cookies {
    IN_UNUSED,

    IN_TRUE,              /* for IN_KEYWORD */
    IN_FALSE,
    IN_NULL,

    IN_ARRAY,             /* for {START,END}_{KEY,VALUE} */
    IN_DICT,

    IN_KEY,               /* for IN_STRING */
    IN_VALUE,
};

#define STATE(state, cookie) \
    (((cookie) << 16) | (state))
#define STATE_KEYWORD(n, cookie) \
    (((cookie) << 16) | ((n) << 8) | IN_KEYWORD)

static const char keyword_table[] = "rue\0alse\0ull";
enum keyword_indices {
    KW_TRUE = 0,
    KW_FALSE = 4,
    KW_NULL = 9,
};

/* Parser actions.  These transfer to the appropriate state,
 * and invoke the callbacks.
 *
 * If there is a begin/end pair, begin pushes a state
 * and end pops it.  */

static inline bool array_begin (struct json_parser *p)
{
    *push_state (p) = STATE (START_VALUE, IN_ARRAY);
    return !p->c.array_begin || p->c.array_begin (p->c.data);
}

static inline bool array_end (struct json_parser *p)
{
    int state_cookie = (p->state >> 16);
    if (state_cookie != IN_ARRAY)
        return false;
    pop_state (p);
    return !p->c.array_end || p->c.array_end (p->c.data);
}

static inline bool object_begin (struct json_parser *p)
{
    *push_state (p) = STATE (START_KEY, IN_DICT);
    return !p->c.object_begin || p->c.object_begin (p->c.data);
}

static inline bool object_end (struct json_parser *p)
{
    int state_cookie = (p->state >> 16);
    if (state_cookie != IN_DICT)
        return false;
    pop_state (p);
    return !p->c.object_end || p->c.object_end (p->c.data);
}

static inline bool key_user (struct json_parser *p)
{
    /* Check the callback that is actually invoked (key, not value_user).  */
    return p->c.key && p->c.key (p->c.data, NULL, 0);
}

static inline bool number_begin (struct json_parser *p, char ch)
{
    *push_state (p) = IN_NUMBER;
    push_buffer (p, ch);
    return true;
}

static inline bool number_end (struct json_parser *p)
{
    char *end;
    bool result;
    long long ll;
    double d;

    pop_state (p);
    push_buffer (p, 0);
    ll = strtoll (p->buf, &end, 0);
    if (!*end)
        result = (!p->c.value_integer || p->c.value_integer (p->c.data, ll));
    else {
        d = strtod (p->buf, &end);
        result = (!*end
                  && (!p->c.value_float || p->c.value_float (p->c.data, d)));
    }

    clear_buffer(p);
    return result;
}

static inline bool value_null (struct json_parser *p)
{
    return !p->c.value_null || p->c.value_null (p->c.data);
}

static inline bool value_boolean (struct json_parser *p, int n)
{
    return !p->c.value_boolean || p->c.value_boolean (p->c.data, n);
}

static inline bool string_begin (struct json_parser *p, int cookie)
{
    *push_state (p) = STATE (IN_STRING, cookie);
    return true;
}

static inline bool string_end (struct json_parser *p, int cookie)
{
    bool result;
    char *buf, *src, *dest;
    size_t n;

    pop_state (p);
    push_buffer (p, 0);

    /* Unescape in place.  */
    for (n = p->n, buf = src = dest = p->buf; n > 0; n--) {
        if (*src != '\\') {
            *dest++ = *src++;
            continue;
        }

        if (n < 2)
            return false;

        src++;
        n--;
        switch (*src++) {
        case 'b': *dest++ = '\b'; continue;
        case 'f': *dest++ = '\f'; continue;
        case 'n': *dest++ = '\n'; continue;
        case 'r': *dest++ = '\r'; continue;
        case 't': *dest++ = '\t'; continue;
        case 'U':
        case 'u':
            /* The [uU] has not been removed from n yet, hence subtract 5.  */
            if (n < 5 || !hex_to_utf8 (buf, &dest, src))
                return false;
            src += 4;
            n -= 4;
            continue;

        default: *dest++ = src[-1]; continue;
        }
    }

    buf = p->buf;
    n = dest - buf;
    if (cookie == IN_KEY)
        result = !p->c.key || p->c.key (p->c.data, buf, n);
    else
        result = !p->c.value_string || p->c.value_string (p->c.data, buf, n);

    clear_buffer(p);
    return result;
}

static inline bool value_user (struct json_parser *p)
{
    return p->c.value_user && p->c.value_user (p->c.data);
}

static inline bool comment (struct json_parser *p)
{
    return !p->c.comment || p->c.comment (p->c.data, p->buf, p->n);
}

bool json_parser_char(struct json_parser *p, char ch)
{
    for (;;) {
        int state = p->state & 255;
        int state_data = (p->state >> 8) & 255;
        int state_cookie = (p->state >> 16);

        // printf ("%d %d | %d %d\n", state, ch, state_cookie, p->sp);

        /* The big ugly parser.  Each case will always return or
         * continue, and we want to check this at link time if
         * possible.  */
#ifndef __OPTIMIZE__
#define link_error abort
#endif
        extern void link_error (void);

        switch (state) {
            /* First, however, a helpful definition...  */
#define SKIP_WHITE \
            switch (ch) { \
            case '/': goto do_start_comment; \
            case ' ': case '\t': case '\n': case '\r': case '\f': \
                return true; \
            default: \
                break; \
            }

        /* Unlike START_VALUE, this only accepts compound values.  */
        case START_PARSE:
            SKIP_WHITE;
            p->state = STATE (END_VALUE, state_cookie);
            switch (ch) {
            case '[': return array_begin (p);
            case '{': return object_begin (p);
            default: return false;
            }
            link_error ();

        /* Only strings and user values are accepted here.  */
        case START_KEY:
            SKIP_WHITE;
            p->state = STATE (END_KEY, IN_DICT);
            switch (ch) {
            case '"': return string_begin (p, IN_KEY);
            case '%': return key_user (p);
            case '}': return object_end (p);
            default: return false;
            }
            link_error ();

        /* Accept any Javascript literal.  Checking p->sp ensures that
         * something like "[] []" is rejected (the first array is parsed
         * from START_PARSE).  */
        case START_VALUE:
            SKIP_WHITE;
            if (p->sp == 0)
                return false;

            p->state = STATE (END_VALUE, state_cookie);
            switch (ch) {
            case 't':
                *push_state (p) = STATE_KEYWORD(KW_TRUE, IN_TRUE);
                return true;
            case 'f':
                *push_state (p) = STATE_KEYWORD(KW_FALSE, IN_FALSE);
                return true;
            case 'n':
                *push_state (p) = STATE_KEYWORD(KW_NULL, IN_NULL);
                return true;
            case '"': return string_begin (p, IN_VALUE);
            case '-': CASE_DIGIT: return number_begin (p, ch);
            case '[': return array_begin (p);
            case '{': return object_begin (p);
            case '%': return value_user (p);
            case ']': return array_end (p);
            default: return false;
            }
            link_error ();

        /* End of a key, look for a colon.  */
        case END_KEY:
            SKIP_WHITE;
            p->state = STATE (START_VALUE, IN_DICT);
            return (ch == ':');

        /* End of a value, look for a comma or closing parenthesis.  */
        case END_VALUE:
            SKIP_WHITE;
            p->state = STATE (state_cookie == IN_DICT ? START_KEY : START_VALUE,
                              state_cookie);
            switch (ch) {
            case ',': return true;
            case '}': return object_end (p);
            case ']': return array_end (p);
            default: return false;
            }
            link_error ();

        /* Table-driven keyword scanner.  Advance until mismatch or end
         * of keyword.  */
        case IN_KEYWORD:
            if (ch != keyword_table[state_data])
                return false;

            if (keyword_table[state_data + 1] != 0) {
                p->state = STATE_KEYWORD(state_data + 1, state_cookie);
                return true;
            }

            pop_state (p);
            switch (state_cookie) {
            case IN_TRUE: return value_boolean (p, 1);
            case IN_FALSE: return value_boolean (p, 0);
            case IN_NULL: return value_null (p);
            default: abort ();
            }
            link_error ();

        /* Eat until closing quote (special-casing \").  */
        case IN_STRING:
            switch (ch) {
            case '"':
                return string_end (p, state_cookie);
            case '\\':
                p->state = STATE (IN_STRING_BACKSLASH, state_cookie);
                /* fall through: the backslash itself is buffered too */
            default:
                push_buffer (p, ch);
                return true;
            }
            link_error ();

        /* Eat any character */
        case IN_STRING_BACKSLASH:
            push_buffer (p, ch);
            p->state = STATE (IN_STRING, state_cookie);
            return true;

        /* Eat until a "bad" character is found, then we refine with
         * strtod/strtoll.  The character we end on is reprocessed in
         * the new state!  */
        case IN_NUMBER:
            switch (ch) {
            case '+': case '-': case '.':
            CASE_DIGIT:
            CASE_XDIGIT:
                push_buffer (p, ch);
                return true;

            default:
                if (!number_end (p))
                    return false;
                continue;
            }
            link_error ();

        /* Parse until '*' '/', then convert the whole comment to a
         * single blank and rescan.  */
        do_start_comment:
            *push_state(p) = IN_COMMENT;
            if (p->c.comment)
                push_buffer(p, ch);
            return true;

        case IN_COMMENT:
            if (p->c.comment)
                push_buffer(p, ch);
            if (state_cookie == 0 && ch != '*')
                return false;
            else if (state_cookie == 0 )
                state_cookie = 1;
            else if (state_cookie == 1 && ch == '*')
                state_cookie = 2;
            else if (state_cookie == 2 && ch == '*')
                state_cookie = 2;
            else if (state_cookie == 2 && ch == '/')
                state_cookie = 3;
            else
                state_cookie = 1;

            if (state_cookie < 3) {
                p->state = STATE(state, state_cookie);
                return true;
            } else {
                comment (p);
                /* Drop the comment text so it cannot leak into the next
                 * string or number.  */
                clear_buffer (p);
                pop_state (p);
                ch = ' ';
                continue;
            }
            link_error ();

        default:
            abort ();
        }
        link_error ();
    }
}

bool json_parser_string(struct json_parser *p, char *s, size_t n)
{
    while (n--)
        if (!json_parser_char(p, *s++))
            return false;
    return true;
}

struct json_parser *json_parser_new(struct json_parser_config *config)
{
    struct json_parser *p;
    p = malloc (sizeof *p);
    memcpy (&p->c, config, sizeof *config);
    p->n = 0;
    p->alloc = sizeof p->start_buffer;
    p->state = START_PARSE;
    p->buf = p->start_buffer;
    p->sp = 0;
    return p;
}

bool json_parser_destroy(struct json_parser *p)
{
    bool result = (p->state == END_VALUE) && (p->sp == 0);
    if (p->buf != p->start_buffer)
        free (p->buf);
    free (p);
    return result;
}

/*
 * An event-based, asynchronous JSON parser.
 *
 * Copyright (C) 2009 Red Hat Inc.
 *
 * Authors:
 *  Paolo Bonzini <pbonzini@redhat.com>
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

#ifndef JSON_H
#define JSON_H

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

struct json_parser_config {
    bool (*array_begin) (void *);
    bool (*array_end) (void *);
    bool (*object_begin) (void *);
    bool (*object_end) (void *);
    bool (*key) (void *, const char *, size_t);
    bool (*value_integer) (void *, long long);
    bool (*value_float) (void *, double);
    bool (*value_null) (void *);
    bool (*value_boolean) (void *, int);
    bool (*value_string) (void *, const char *, size_t);
    bool (*value_user) (void *);
    bool (*comment) (void *, const char *, size_t);
    void *data;
};

struct json_parser;

struct json_parser *json_parser_new(struct json_parser_config *config);
bool json_parser_destroy(struct json_parser *p);
bool json_parser_char(struct json_parser *p, char ch);
bool json_parser_string(struct json_parser *p, char *buf, size_t n);

#endif /* JSON_H */

/* main.c */

/*
    This program demonstrates a simple application of JSON_parser. It reads
    a JSON text from STDIN, producing an error message if the text is rejected.

        % JSON_parser <test/pass1.json
*/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <locale.h>

#include "json.h"

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

static int level = 0;
static int got_key = 0;

static void print_indent()
{
    printf ("%*s", 2 * level, "");
}

static bool array_begin (void *data)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("[\n");
    ++level;
    return true;
}

static bool array_end (void *data)
{
    --level;
    print_indent ();
    printf ("]\n");
    return true;
}

static bool object_begin (void *data)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("{\n");
    ++level;
    return true;
}

static bool object_end (void *data)
{
    --level;
    print_indent ();
    printf ("}\n");
    return true;
}

static bool key (void *data, const char *buf, size_t n)
{
    got_key = 1;
    print_indent ();
    if (buf)
        printf ("key = '%s', value = ", buf);
    else
        printf ("user key = %%%c, value = ", getchar());
    return true;
}

static bool value_integer (void *data, long long ll)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("integer: %lld\n", ll);
    return true;
}

static bool value_float (void *data, double d)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("float: %f\n", d);
    return true;
}

static bool value_null (void *data)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("null\n");
    return true;
}

static bool value_boolean (void *data, int val)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("%s\n", val ? "true" : "false");
    return true;
}

static bool value_string (void *data, const char *buf, size_t n)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("string: '%s'\n", buf);
    return true;
}

static bool value_user (void *data)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("user: %%%c\n", getchar());
    return true;
}

int main(int argc, char* argv[])
{
    static struct json_parser_config parser_config = {
        .array_begin = array_begin,
        .array_end = array_end,
        .object_begin = object_begin,
        .object_end = object_end,
        .key = key,
        .value_integer = value_integer,
        .value_float = value_float,
        .value_null = value_null,
        .value_boolean = value_boolean,
        .value_string = value_string,
        .value_user = value_user,
    };

    struct json_parser *p = json_parser_new(&parser_config);
    int count = 0;
    int ch;

    while ((ch = getchar ()) != EOF && json_parser_char (p, ch))
        count++;

    if (ch != EOF) {
        fprintf (stderr, "error at character %d\n", count);
        exit (1);
    }
    if (!json_parser_destroy (p)) {
        fprintf (stderr, "error at end of file\n");
        exit (1);
    }

    exit (0);
}
On Sat, 17 Oct 2009, Paolo Bonzini wrote:

> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
> >
> > I already am :-) Stay tuned, I should have a patch later this afternoon.
>
> Was it a race? (Seriously, sorry I didn't notice a couple of hours ago).
>
> This one is ~5% slower than the "Evil" one, but half the size. Tested against
> the comments.json file from the "Evil" parser and with valgrind too. Does all
> the funky Unicode stuff too.
>

Just from cursory glance:

a. allocation can fail
b. strtod is locale dependent
On 10/17/2009 02:38 AM, malc wrote:
> a. allocation can fail

s/malloc/qemu_malloc/ etc. when it is time to merge.

> b. strtod is locale dependent

Right, but qemu probably would prefer to always do setlocale (LC_NUMERIC, "C"), or add a c_strtod function like http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob_plain;f=lib/c-strtod.c

Thanks for the review, any additional pair of eyes can only help.

Paolo
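To make the two review points concrete, here is a minimal sketch in plain C. It assumes nothing from QEMU: `force_c_numeric_locale`, `parse_json_double` and `grow_buffer` are hypothetical helper names invented for this illustration, and `qemu_malloc` and gnulib's `c_strtod` are deliberately not used. It shows pinning the numeric locale once so that `strtod` always expects `.` as the decimal separator, and a doubling buffer that reports allocation failure instead of dereferencing NULL.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: pin the numeric locale to "C" once at startup so
 * that strtod() always treats '.' as the decimal separator. */
static bool force_c_numeric_locale(void)
{
    return setlocale(LC_NUMERIC, "C") != NULL;
}

/* Hypothetical helper: parse a complete NUL-terminated number.  Returns
 * false if nothing was parsed or if trailing characters remain. */
static bool parse_json_double(const char *text, double *out)
{
    char *end;

    assert(text != NULL && out != NULL);
    *out = strtod(text, &end);
    return end != text && *end == '\0';
}

/* Hypothetical helper: double a heap buffer, reporting failure to the
 * caller.  On failure *buf and *alloc are left untouched. */
static bool grow_buffer(char **buf, size_t *alloc)
{
    size_t new_alloc = *alloc ? *alloc * 2 : 16;
    char *tmp;

    assert(buf != NULL && alloc != NULL);
    if (new_alloc < *alloc)
        return false;               /* size_t overflow */
    tmp = realloc(*buf, new_alloc);
    if (tmp == NULL)
        return false;               /* old buffer is still valid */
    *buf = tmp;
    *alloc = new_alloc;
    return true;
}
```

Note that `setlocale` changes process-wide state, which is one reason a dedicated locale-independent conversion function may be preferable in a larger program.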
Paolo Bonzini wrote:
> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
>>
>> I already am :-) Stay tuned, I should have a patch later this
>> afternoon.
>
> Was it a race? (Seriously, sorry I didn't notice a couple of hours ago).
>
> This one is ~5% slower than the "Evil" one, but half the size. Tested
> against the comments.json file from the "Evil" parser and with
> valgrind too. Does all the funky Unicode stuff too.
>
> Paolo
> /*
>  * An event-based, asynchronous JSON parser.
>  *
>  * Copyright (C) 2009 Red Hat Inc.
>  *
>  * Authors:
>  *  Paolo Bonzini <pbonzini@redhat.com>
>  *
>  * Permission is hereby granted, free of charge, to any person obtaining a copy
>  * of this software and associated documentation files (the "Software"), to deal
>  * in the Software without restriction, including without limitation the rights
>  * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>  * copies of the Software, and to permit persons to whom the Software is
>  * furnished to do so, subject to the following conditions:
>  *
>  * The above copyright notice and this permission notice shall be included in
>  * all copies or substantial portions of the Software.
>  *
>  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>  * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>  * SOFTWARE.
>  */
>
>
> #include "json.h"
> #include <string.h>
> #include <stdlib.h>
>
> /* Common character classes.  */
>
> #define CASE_XDIGIT \
>     case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': \
>     case 'A': case 'B': case 'C': case 'D': case 'E': case 'F'
>
> #define CASE_DIGIT \
>     case '0': case '1': case '2': case '3': case '4': \
>     case '5': case '6': case '7': case '8': case '9'
>
> /* Helper function to go from \uXXXX-encoded UTF-16 to UTF-8.  */
>
> static bool hex_to_utf8 (char *buf, char **dest, char *src)
> {
>     int i, n;
>     uint8_t *p;
>
>     for (i = n = 0; i < 4; i++) {
>         n <<= 4;
>         switch (src[i])
>         {
>         CASE_DIGIT: n |= src[i] - '0'; break;
>         CASE_XDIGIT: n |= (src[i] & ~32) - 'A' + 10; break;
>         default: return false;
>         }
>     }
>
>     p = (uint8_t *)*dest;
>     if (n < 128) {
>         *p++ = n;
>     } else if (n < 2048) {
>         *p++ = 0xC0 | (n >> 6);
>         *p++ = 0x80 | (n & 63);
>     } else if (n < 0xDC00 || n > 0xDFFF) {
>         *p++ = 0xE0 | (n >> 12);
>         *p++ = 0x80 | ((n >> 6) & 63);
>         *p++ = 0x80 | (n & 63);
>     } else {
>         /* Merge with preceding high surrogate.  */
>         if (p - (uint8_t *)buf < 3
>             || p[-3] != 0xED
>             || p[-2] < 0xA0 || p[-2] > 0xAF) /* 0xD800..0xDBFF */
>             return false;
>
>         n += 0x10000 - 0xDC00;
>         n += ((p[-2] & 15) << 16) | ((p[-1] & 63) << 10);
>
>         /* Overwrite high surrogate.  */
>         p[-3] = 0xF0 | (n >> 18);
>         p[-2] = 0x80 | ((n >> 12) & 63);
>         p[-1] = 0x80 | ((n >> 6) & 63);
>         *p++ = 0x80 | (n & 63);
>     }
>     *dest = (char *)p;
>     return true;
> }
>
> struct json_parser {
>     struct json_parser_config c;
>     size_t n, alloc;
>     char *buf;
>     size_t sp;
>     uint32_t state, stack[128];
>     char start_buffer[4];
> };
>

Having an explicit stack is unnecessary I think. You can use a very simple scheme to detect the end of messages by simply counting {}, [], and being aware of the lexical rules.

Regards,

Anthony Liguori
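The counting scheme suggested here can be sketched roughly as follows. This is an illustration only, not code from either parser in the thread: `struct boundary_scanner`, `boundary_feed` and `boundary_find` are hypothetical names. "Being aware of the lexical rules" is reduced to tracking whether the scanner is inside a string and whether the previous character was a backslash, so that brackets inside string literals are not counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scanner state for finding the end of one JSON message. */
struct boundary_scanner {
    int depth;       /* number of currently open '{' and '[' */
    bool in_string;  /* inside a "..." literal */
    bool escaped;    /* previous character was a backslash in a string */
};

/* Feed one character.  Returns true when this character closes the
 * outermost container, i.e. a complete message has been seen. */
static bool boundary_feed(struct boundary_scanner *s, char ch)
{
    assert(s != NULL);
    if (s->in_string) {
        if (s->escaped)
            s->escaped = false;
        else if (ch == '\\')
            s->escaped = true;
        else if (ch == '"')
            s->in_string = false;
        return false;
    }

    switch (ch) {
    case '"':
        s->in_string = true;
        return false;
    case '{': case '[':
        s->depth++;
        return false;
    case '}': case ']':
        return s->depth > 0 && --s->depth == 0;
    default:
        return false;
    }
}

/* Returns the length of the first complete message in buf, or 0 if the
 * buffer does not yet contain one. */
static size_t boundary_find(const char *buf, size_t n)
{
    struct boundary_scanner s = { 0, false, false };
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < n; i++)
        if (boundary_feed(&s, buf[i]))
            return i + 1;
    return 0;
}
```

A scanner like this only finds where a message ends; it does not check that each closing bracket matches its opener. For a dialect that also accepts `/* ... */` comments, as the posted draft does, comment skipping would have to be added as well.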
Paolo Bonzini wrote:
> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
>>
>> I already am :-) Stay tuned, I should have a patch later this
>> afternoon.
>
> Was it a race? (Seriously, sorry I didn't notice a couple of hours ago).
>
> This one is ~5% slower than the "Evil" one, but half the size. Tested
> against the comments.json file from the "Evil" parser and with
> valgrind too. Does all the funky Unicode stuff too.

I haven't benchmarked mine. While yours came out an hour earlier, I included a full test suite, output QObjects, and support vararg parsing so I think I win :-)

Regards,

Anthony Liguori
On 10/17/2009 03:50 AM, Anthony Liguori wrote:
> Paolo Bonzini wrote:
>> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
>>>
>>> I already am :-) Stay tuned, I should have a patch later this afternoon.
>>
>> Was it a race? (Seriously, sorry I didn't notice a couple of hours ago).
>>
>> This one is ~5% slower than the "Evil" one, but half the size. Tested
>> against the comments.json file from the "Evil" parser and with
>> valgrind too. Does all the funky Unicode stuff too.
>
> I haven't benchmarked mine. While yours came out an hour earlier, I
> included a full test suite, output QObjects, and support vararg parsing
> so I think I win :-)

Heh, Luiz and I had talked offlist and he'd take care of the rest (except the test suite) :-).

> Having an explicit stack is unnecessary I think.

I'm curious to see yours now---the stack is used to detect things like [{"a":"b"},"c":"d"]. You could do that in the event handlers of course, but that kind of breaks the interface between the parser and event handlers.

Paolo
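To illustrate why a stack helps with an input like `[{"a":"b"},"c":"d"]`, here is a small sketch, separate from the posted parser. `nesting_ok` and `MAX_DEPTH` are hypothetical names, and the check is deliberately partial: it only verifies that brackets match and that a `:` outside a string appears directly inside an object. Knowing the kind of the innermost open container is exactly what a bare depth counter cannot provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 128   /* assumed nesting limit for this sketch */

/* Returns true if every '{' / '[' is closed by the matching bracket and
 * every ':' outside a string sits directly inside an object. */
static bool nesting_ok(const char *s)
{
    char stack[MAX_DEPTH];          /* kinds of the open containers */
    size_t sp = 0;
    bool in_string = false, escaped = false;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        char ch = *s;

        if (in_string) {
            if (escaped)
                escaped = false;
            else if (ch == '\\')
                escaped = true;
            else if (ch == '"')
                in_string = false;
            continue;
        }

        switch (ch) {
        case '"':
            in_string = true;
            break;
        case '{': case '[':
            if (sp == MAX_DEPTH)
                return false;       /* too deeply nested */
            stack[sp++] = ch;
            break;
        case '}':
            if (sp == 0 || stack[--sp] != '{')
                return false;
            break;
        case ']':
            if (sp == 0 || stack[--sp] != '[')
                return false;
            break;
        case ':':
            /* A key/value separator is only legal inside an object. */
            if (sp == 0 || stack[sp - 1] != '{')
                return false;
            break;
        default:
            break;
        }
    }
    return sp == 0 && !in_string;
}
```

With this check the example from the message is rejected at the second `:`, because the innermost open container at that point is the array.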
Anthony Liguori wrote:
> Paolo Bonzini wrote:
>> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
>>>
>>> I already am :-) Stay tuned, I should have a patch later this
>>> afternoon.
>>
>> Was it a race? (Seriously, sorry I didn't notice a couple of hours
>> ago).
>>
>> This one is ~5% slower than the "Evil" one, but half the size.
>> Tested against the comments.json file from the "Evil" parser and with
>> valgrind too. Does all the funky Unicode stuff too.
>
> I haven't benchmarked mine. While yours came out an hour earlier, I
> included a full test suite, output QObjects, and support vararg
> parsing so I think I win :-)

ar.. got mine too, i've been doing for the last 3 weeks slowly;

it got a raw/pretty printer, an interruptible parser (on the same idea as JSON_parser.c), it's faster than JSON_parser.c [1], it's completely generic (more like a library than an embedded thing), fully JSON compliant (got a test suite too), support user supplied alloc functions, and callback for integer/float doesn't have their data converted automatically which means that the user of the library can use whatever it want to support the non-limited size JSON number (or just return errors for user that want the limit).

the library by itself is 39K with -g last time i've looked.

also the library comes with a jsonlint binary that's equivalent to xmllint (well formatting and verification).

I'll package things up and post a link to it on Monday.
On Sat, 17 Oct 2009 11:01:33 +0100
Vincent Hanquez <vincent@snarc.org> wrote:

> Anthony Liguori wrote:
> > Paolo Bonzini wrote:
> >> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
> >>>
> >>> I already am :-) Stay tuned, I should have a patch later this
> >>> afternoon.
> >>
> >> Was it a race? (Seriously, sorry I didn't notice a couple of hours
> >> ago).
> >>
> >> This one is ~5% slower than the "Evil" one, but half the size.
> >> Tested against the comments.json file from the "Evil" parser and with
> >> valgrind too. Does all the funky Unicode stuff too.
> >
> > I haven't benchmarked mine. While yours came out an hour earlier, I
> > included a full test suite, output QObjects, and support vararg
> > parsing so I think I win :-)
> ar.. got mine too, i've been doing for the last 3 weeks slowly;

Very nice to see all these contributions.

> it got a raw/pretty printer, an interruptible parser (on the same idea
> as JSON_parser.c), it's faster than JSON_parser.c [1],
> it's completely generic (more like a library than an embedded thing),
> fully JSON compliant (got a test suite too), support
> user supplied alloc functions, and callback for integer/float doesn't
> have their data converted automatically which means
> that the user of the library can use whatever it want to support the
> non-limited size JSON number (or just return errors for user that want
> the limit).
>
> the library by itself is 39K with -g last time i've looked.

Integration with QObjects is a killer feature, I think it's the stronger argument against grabbing one from the internet.
On 10/18/2009 04:06 PM, Luiz Capitulino wrote: > Integration with QObjects is a killer feature, I think it's the > stronger argument against grabbing one from the internet. Yeah, I'd say let's go with Anthony's stuff. I'll rebase the encoder on top of it soonish (I still think it's best if JSON encoding lives in QObject, like a kind of toString). If we need asynchronous parsing later, we can easily replace it with mine or Vincent's. Paolo
Paolo Bonzini wrote: > On 10/18/2009 04:06 PM, Luiz Capitulino wrote: >> Integration with QObjects is a killer feature, I think it's the >> stronger argument against grabbing one from the internet. > > Yeah, I'd say let's go with Anthony's stuff. I'll rebase the encoder > on top of it soonish (I still think it's best if JSON encoding lies in > QObject like a kind of toString). If we'll need the asynchronous > parsing later, we can easily replace it with mine or Vincent's. One thing I want to add as a feature to the 0.12 release is a nice client API. To have this, we'll need message boundary identification and a JSON encoder. I'll focus on the message boundary identification today. I'd strongly suggest making the JSON encoder live outside of QObject. There are many possible ways to represent a QObject. Think of JSON as a view of the QObject model. The human monitor mode representation is a different view. Regards, Anthony Liguori > Paolo
Luiz Capitulino wrote: >> it got a raw/pretty printer, an interruptible parser (on the same idea >> as JSON_parser.c), it's faster than JSON_parser.c [1], >> it's completely generic (more like a library than an embedded thing), >> fully JSON compliant (got a test suite too), support >> user supplied alloc functions, and callback for integer/float doesn't >> have their data converted automatically which means >> that the user of the library can use whatever it want to support the >> non-limited size JSON number (or just return errors for user that want >> the limit). >> >> the library by itself is 39K with -g last time i've looked. >> > > Integration with QObjects is a killer feature, I think it's the > stronger argument against grabbing one from the internet. > I can't think of any reason why integration with QObject would take more than 50 lines of C on the user side of the library. Since the API is completely SAX-like (I call it SAJ for obvious reasons), you get callbacks on entering and leaving each object or array, and a callback for every value (string, int, float, null, true, false) as a char * plus length. For exactly the same reason, integration with glib would take the same 50-line "effort". Note that, FTR, I'd obviously like to have my library used, but I'm happy as long as a library that is *fully* JSON compliant is used (no extensions, however, since you obviously lose the benefit of using JSON if you create extensions).
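For readers unfamiliar with the SAX-style approach described above, here is a minimal sketch of what the user side of such an API might look like. The event names, the `saj_callback` signature, and the `stats` structure are hypothetical illustrations, not the actual interface of the library being discussed. The callback only tracks nesting depth and counts scalar values, but an integration layer would build its own objects at the same points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; a real library's names may differ. */
enum saj_event {
    SAJ_OBJECT_BEGIN, SAJ_OBJECT_END,
    SAJ_ARRAY_BEGIN, SAJ_ARRAY_END,
    SAJ_STRING, SAJ_INT, SAJ_FLOAT, SAJ_NULL, SAJ_TRUE, SAJ_FALSE
};

/* Assumed callback shape: scalars arrive as unconverted text plus length. */
typedef int (*saj_callback)(void *ctx, enum saj_event ev,
                            const char *data, size_t len);

struct stats {
    int depth;      /* current nesting level */
    int max_depth;  /* deepest level seen so far */
    int scalars;    /* number of scalar values (keys included) */
};

/* Returns 0 to continue, -1 to abort on an unbalanced close event. */
static int count_cb(void *ctx, enum saj_event ev, const char *data, size_t len)
{
    struct stats *s = ctx;

    assert(s != NULL);
    (void)data;
    (void)len;
    switch (ev) {
    case SAJ_OBJECT_BEGIN:
    case SAJ_ARRAY_BEGIN:
        if (++s->depth > s->max_depth) {
            s->max_depth = s->depth;
        }
        break;
    case SAJ_OBJECT_END:
    case SAJ_ARRAY_END:
        if (s->depth == 0) {
            return -1;
        }
        s->depth--;
        break;
    default:
        s->scalars++;
        break;
    }
    return 0;
}
```

A parser with this kind of interface would be handed `count_cb` (as a `saj_callback`) together with a `struct stats *` context. Because numbers arrive as text, the callback is free to convert them with whatever precision it needs.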
On Sun, 18 Oct 2009 09:49:55 -0500 Anthony Liguori <anthony@codemonkey.ws> wrote: > Paolo Bonzini wrote: > > On 10/18/2009 04:06 PM, Luiz Capitulino wrote: > >> Integration with QObjects is a killer feature, I think it's the > >> stronger argument against grabbing one from the internet. > > > > Yeah, I'd say let's go with Anthony's stuff. I'll rebase the encoder > > on top of it soonish (I still think it's best if JSON encoding lies in > > QObject like a kind of toString). If we'll need the asynchronous > > parsing later, we can easily replace it with mine or Vincent's. > > One thing I want to add as a feature to the 0.12 release is a nice > client API. To have this, we'll need message boundary identification > and a JSON encoder. I'll focus on the message boundary identification > today. > > I'd strongly suggest making the JSON encoder live outside of QObject. > There are many possible ways to represent a QObject. Think of JSON as a > view of the QObject model. The human monitor mode representation is a > different view. I agree. QObject's methods should only be used/needed by the object layer itself, if the problem at hand handles high level data types (QInt, QDict, etc) then we need a new type. The right way to have what Paolo is suggesting, would be to have a toString() method in the object layer and allow it to be overridden.
>> I'd strongly suggest making the JSON encoder live outside of QObject. >> There are many possible ways to represent a QObject. Think of JSON as a >> view of the QObject model. The human monitor mode representation is a >> different view. My rationale was that since QObject is tailored to JSON, we might as well declare JSON to be "the" preferred view of the QObject model. The human monitor representation would be provided by qstring_format in my patches (and a QError method would call qstring_format in the appropriate way, returning a C string with the result). I think the difference of opinion is also due to different backgrounds; mine is in Smalltalk, where class extensions (aka monkeypatching) are done in a different style than, for example, in Python. Adding a "write as escaped JSON" method to QString would be akin to monkeypatching. > I agree. > > QObject's methods should only be used/needed by the object layer itself, > if the problem at hand handles high level data types (QInt, QDict, etc) > then we need a new type. > > The right way to have what Paolo is suggesting, would be to have a > toString() method in the object layer and allow it to be overridden. That's exactly what I did in my patches, except I called it encode_json rather than toString. Paolo
On Sun, 18 Oct 2009 16:06:29 +0100 Vincent Hanquez <vincent@snarc.org> wrote: > Luiz Capitulino wrote: > >> it got a raw/pretty printer, an interruptible parser (on the same idea > >> as JSON_parser.c), it's faster than JSON_parser.c [1], > >> it's completely generic (more like a library than an embedded thing), > >> fully JSON compliant (got a test suite too), support > >> user supplied alloc functions, and callback for integer/float doesn't > >> have their data converted automatically which means > >> that the user of the library can use whatever it want to support the > >> non-limited size JSON number (or just return errors for user that want > >> the limit). > >> > >> the library by itself is 39K with -g last time i've looked. > >> > > > > Integration with QObjects is a killer feature, I think it's the > > stronger argument against grabbing one from the internet. > > > I can't think of any reason why integration with qobject would take more > than 50 lines of C on the user side of the library. > since the API is completely SAX like (i call it SAJ for obvious reason), > you get callback entering/leaving object/array > and callback for every values (string, int, float, null, true, false) as > a char * + length. for exactly the same reason, integration with glib > would take the same 50 lines "effort". No lines is a lot better than 50. :) The real problem, though, is that the parsers I looked at had their own "object model"; some of them are quite simple, others are more sophisticated than QObject. Making no use of any intermediate representation like this is a feature, as things get simpler. Also, don't get me wrong, but if we were to consider your parser we would have to consider the other two or three that are listed on json.org and have a compatible license.
> note that FTR, obviously i'ld like to have my library used, but i'm > happy that any library that is *fully* JSON compliant is used (no > extensions however since you're obviously loosing the benefit of using > JSON if you create extensions). This is already settled, I hope.
On 10/18/2009 05:35 PM, Luiz Capitulino wrote: >> (no >> extensions however since you're obviously loosing the benefit of using >> JSON if you create extensions). > This is already settled, I hope. I think he's referring to extensions such as single-quoted strings or % escapes for formatting. I have no qualms with those as long as what goes on the wire is 100% JSON. Paolo
On Sun, 18 Oct 2009 17:25:47 +0200 Paolo Bonzini <bonzini@gnu.org> wrote: > > >> I'd strongly suggest making the JSON encoder live outside of QObject. > >> There are many possible ways to represent a QObject. Think of JSON as a > >> view of the QObject model. The human monitor mode representation is a > >> different view. > > My rationale was that since QObject is tailored over JSON, we might as > well declare JSON to be "the" preferred view of the QObject model. Maybe this makes sense today as the Monitor is the only heavy user of QObjects, but I don't think we should count on that. As things evolve, I believe more subsystems will start using QObjects and any "particular" view of it will make little sense. To be honest I don't know if this is good, I fear we will end up enhancing QObjects to the extreme to do OOP in QEMU... > The human monitor representation would be provided by qstring_format in > my patches (and a QError method would call qstring_format in the > appropriate way, returning a C string with the result). > > I think the different opinions is also due to different background; mine > is in Smalltalk where class extensions---aka monkeypatching---are done > in a different style than for example in Python. Adding a "write as > escaped JSON" method to QString would be akin to monkeypatching. True. > > I agree. > > > > QObject's methods should only be used/needed by the object layer itself, > > if the problem at hand handles high level data types (QInt, QDict, etc) > > then we need a new type. > > > > The right way to have what Paolo is suggesting, would be to have a > > toString() method in the object layer and allow it to be overridden. > > That's exactly what I did in my patches, except I called it encode_json > rather than toString. Okay, I just took a quick look at them and am looking at Anthony's right now. Anyway, my brainstorm on this would be to have to_string() and have default methods on all types to return a simple string representation. 
The QJson type could override to_string() if needed; this way the JSON-specific bits stay inside the json module. But I see that Anthony has already added a qjson type..
Anthony Liguori wrote: > Paolo Bonzini wrote: >> On 10/18/2009 04:06 PM, Luiz Capitulino wrote: >>> Integration with QObjects is a killer feature, I think it's the >>> stronger argument against grabbing one from the internet. >> >> Yeah, I'd say let's go with Anthony's stuff. I'll rebase the encoder >> on top of it soonish (I still think it's best if JSON encoding lies >> in QObject like a kind of toString). If we'll need the asynchronous >> parsing later, we can easily replace it with mine or Vincent's. > > One thing I want to add as a feature to the 0.12 release is a nice > client API. To have this, we'll need message boundary identification > and a JSON encoder. I'll focus on the message boundary identification > today. Here's a first pass. I'll clean up this afternoon and post a proper patch. It turned out to work pretty well. Regards, Anthony Liguori
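As an illustration of what "message boundary identification" can mean on a byte stream, here is a minimal sketch. It is not the lexer attached to this message; it is a simplified, hypothetical framer that assumes strict JSON on the wire (double-quoted strings only) and that every message is a top-level object or array. It tracks bracket depth, ignores brackets inside strings, and reports when a top-level value closes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical framing state; names are illustrative only. */
struct framer {
    int depth;      /* current {} / [] nesting level */
    int in_string;  /* inside a double-quoted string */
    int escaped;    /* previous character was a backslash inside a string */
};

/* Feed one character; returns 1 when it closes a top-level object or array. */
int framer_feed(struct framer *f, char c)
{
    assert(f != NULL);
    if (f->in_string) {
        if (f->escaped) {
            f->escaped = 0;          /* escaped character: never a terminator */
        } else if (c == '\\') {
            f->escaped = 1;
        } else if (c == '"') {
            f->in_string = 0;
        }
        return 0;
    }
    switch (c) {
    case '"':
        f->in_string = 1;
        break;
    case '{':
    case '[':
        f->depth++;
        break;
    case '}':
    case ']':
        if (f->depth > 0 && --f->depth == 0) {
            return 1;                /* a complete message just ended */
        }
        break;
    default:
        break;
    }
    return 0;
}

/* Count the complete top-level messages contained in a NUL-terminated buffer. */
int count_messages(const char *buf)
{
    struct framer f = {0, 0, 0};
    int n = 0;

    for (; *buf != '\0'; buf++) {
        n += framer_feed(&f, *buf);
    }
    return n;
}
```

Because the state lives in `struct framer`, input can be fed one character at a time as it arrives from a socket, and a partially received message simply leaves `depth` above zero until the rest shows up.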
Vincent Hanquez wrote: > I can't think of any reason why integration with qobject would take > more than 50 lines of C on the user side of the library. > since the API is completely SAX like (i call it SAJ for obvious > reason), you get callback entering/leaving object/array > and callback for every values (string, int, float, null, true, false) > as a char * + length. for exactly the same reason, integration with > glib would take the same 50 lines "effort". > > note that FTR, obviously i'ld like to have my library used, but i'm > happy that any library that is *fully* JSON compliant is used (no > extensions however since you're obviously loosing the benefit of using > JSON if you create extensions). We need two sets of extensions for use within qemu. Single quoted strings and varargs support. While single quoted strings would be easy to add to any library, vararg support is a bit more tricky as you need to carefully consider where you pop from the varargs list. A simple sprintf() isn't sufficient for embedding QObjects. When generating on-the-wire response traffic, we shouldn't use any of the extensions so it will be 100% json.org compliant. I'm pretty sure if you tried to duplicate the functionality of my patches, it would be much more than 50 lines. That's not saying it's a better json parser, just that we're looking for very particular features from it. Regards, Anthony Liguori
Luiz Capitulino wrote: > Okay, I just took a quick look at them and am looking at Anthony's > right now. > > Anyway, my brainstorm on this would be to have to_string() and have > default methods on all types to return a simple string representation. > What's the value of integrating into the objects versus having a separate function that can be applied to the objects? Prototype languages are very different and it's not typically a good idea to mix styles like this. Regards, Anthony Liguori
Anthony Liguori wrote: > Vincent Hanquez wrote: >> I can't think of any reason why integration with qobject would take >> more than 50 lines of C on the user side of the library. >> since the API is completely SAX like (i call it SAJ for obvious >> reason), you get callback entering/leaving object/array >> and callback for every values (string, int, float, null, true, false) >> as a char * + length. for exactly the same reason, integration with >> glib would take the same 50 lines "effort". >> >> note that FTR, obviously i'ld like to have my library used, but i'm >> happy that any library that is *fully* JSON compliant is used (no >> extensions however since you're obviously loosing the benefit of >> using JSON if you create extensions). > > We need two sets of extensions for use within qemu. Single quoted > strings and varargs support. While single quoted strings would be > easy to add to any library, vararg support is a bit more tricky as you > need to carefully consider where you pop from the varargs list. A > simple sprintf() isn't sufficient for embedding QObjects. Care to explain what single-quoted strings and varargs support mean in your context? (Maybe just a simple example?) > When generating on-the-wire response traffic, we shouldn't use any of > the extensions so it will be 100% json.org compliant. Great. > I'm pretty sure if you tried to duplicate the functionality of my > patches, it would be much more than 50 lines. That's not saying it's > a better json parser, just that we're looking for very particular > features from it. Since it doesn't appear to be tied to JSON in particular, I don't understand why it's a feature of the parser, though. Couldn't any parser grow the support you need on top of it?
Luiz Capitulino wrote: >> I can't think of any reason why integration with qobject would take more >> than 50 lines of C on the user side of the library. >> since the API is completely SAX like (i call it SAJ for obvious reason), >> you get callback entering/leaving object/array >> and callback for every values (string, int, float, null, true, false) as >> a char * + length. for exactly the same reason, integration with glib >> would take the same 50 lines "effort". >> > > No lines is a lot better than 50. :) > Well, it all depends on how you see things. Whilst I'm happy to help with all sorts of integration (QEMU in this case), my library was made to integrate with absolutely any object model. So 50 lines seems like a win to me, because I could do the same thing in a project that uses glib, or some Qt model, using exactly the same engine. Hence the reason why I'm packaging it as a .a/.so library (not that I particularly object to an embedded use case either). I think it's a win in the end when people can just reuse wheels instead of designing new ones to cater for special needs. > The real problem though is that the parsers I looked at had their own > "object model", some of them are quite simple others are more sophisticated > than QObject. Making no use of any kind of intermediate representation like > this is a feature, as things get simpler. > > Also, don't get me wrong, but if we would consider your parser we > would have to consider the others two or three that are listed in > json.org and have a compatible license. > Most of the parsers there are either weirdly licensed, have an object model integrated with them, are not interruptible, or are quite complex for no apparent reason. I carefully read all of them before choosing to reimplement one from scratch.
Anthony Liguori wrote: > Here's a first pass. I'll clean up this afternoon and post a proper > patch. It turned out to work pretty well. It doesn't seem to validate anything. Or is it just a lexer? You're also including ' as a string escape value (is that the single-quote thing you were talking about?), which strikes me as invalid JSON.
On 10/18/2009 06:46 PM, Vincent Hanquez wrote: > care to explain what's a single quoted string and varargs support means > in your context ? (just a simple example you do maybe ?) single-quoted string: being able to parse 'name' in addition to "name", which is convenient because in a C string literal the latter would have to be written \"name\". varargs: being able to call some external function when a %+letter sequence is found, which would fetch the key or value from an external source (for example a varargs list, so that you can write a printf-style QObject factory function whose template is itself written in JSON-like syntax). The important thing anyway is that the encoder is conservative (i.e. it emits 100% valid JSON). This is something everybody totally agrees on. Paolo
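To make the two extensions concrete, here is a minimal sketch of a printf-style helper. It is a hypothetical illustration, not the code from any of the patches in this thread: the function name `json_template` and its behavior are assumptions, and instead of building QObjects it simply writes strict JSON text. The template may use single-quoted strings and `%d` / `%s` placeholders, and the output uses only double quotes, with arguments popped from the varargs list in the order the placeholders appear.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append one byte if room remains; *n always counts the untruncated length. */
static void put(char *out, size_t cap, size_t *n, char c)
{
    if (*n + 1 < cap) {
        out[*n] = c;
    }
    (*n)++;
}

/*
 * Append s as a double-quoted JSON string. Only '"' and '\\' are escaped
 * here; a complete encoder must also escape control characters.
 */
static void put_json_string(char *out, size_t cap, size_t *n, const char *s)
{
    put(out, cap, n, '"');
    for (; *s != '\0'; s++) {
        if (*s == '"' || *s == '\\') {
            put(out, cap, n, '\\');
        }
        put(out, cap, n, *s);
    }
    put(out, cap, n, '"');
}

/*
 * Hypothetical template expander: 'x' becomes "x", %d pops an int and
 * %s pops a const char *. Returns the length the full output needs,
 * like snprintf(); the result is always NUL-terminated.
 */
size_t json_template(char *out, size_t cap, const char *tmpl, ...)
{
    va_list ap;
    size_t n = 0;
    int in_string = 0;
    char num[32];

    assert(cap > 0);
    va_start(ap, tmpl);
    for (const char *p = tmpl; *p != '\0'; p++) {
        if (*p == '\'') {
            in_string = !in_string;
            put(out, cap, &n, '"');
        } else if (in_string && *p == '"') {
            put(out, cap, &n, '\\');    /* keep the output valid JSON */
            put(out, cap, &n, '"');
        } else if (!in_string && *p == '%' && p[1] == 'd') {
            snprintf(num, sizeof num, "%d", va_arg(ap, int));
            for (size_t k = 0; k < strlen(num); k++) {
                put(out, cap, &n, num[k]);
            }
            p++;                        /* skip the 'd' */
        } else if (!in_string && *p == '%' && p[1] == 's') {
            put_json_string(out, cap, &n, va_arg(ap, const char *));
            p++;                        /* skip the 's' */
        } else {
            put(out, cap, &n, *p);
        }
    }
    va_end(ap);
    out[n < cap ? n : cap - 1] = '\0';
    return n;
}
```

With this sketch, a call such as `json_template(buf, sizeof buf, "{'name': %s, 'id': %d}", name, 7)` needs no backslashes in the C source, and `name` is escaped at the point where its placeholder is consumed. That ordering is why a plain asprintf() followed by a parse is not enough: the string would have to be escaped by the caller beforehand.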
On 10/18/2009 06:32 PM, Anthony Liguori wrote: > > What's the value of integrating into the objects verses having a > separate function that can apply it to the objects? That's just a different style. Of course you could do a switch(qobject_type(qobject)) instead of using polymorphism. It would be nicer in some ways, and uglier in others. toString, however, seems pervasive enough that it could deserve a place as a QObject virtual method. Anyway, I probably won't have much code in QEMU in the end, so there's no value in arguing when a very nice design is emerging anyway. It looks like Anthony has most of the JSON plumbing in his head, so it's better if he keeps the flow going. Feel free to steal my code. Once your stuff is settled I'll see what's missing and rebase/resend. Paolo, at one point tempted to s/encode_json/to_string/ and resubmit :-)
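The two styles being compared can be sketched side by side. The `struct obj` type below is a hypothetical stand-in, deliberately not QEMU's QObject API: style 1 is a separate function that switches on a type tag, and style 2 stores a per-type `to_string` function pointer in the object, which is roughly what a "virtual method" means in C.

```c
#include <assert.h>
#include <stdio.h>

enum obj_type { OBJ_INT, OBJ_STR };

/* Hypothetical object type; field names are illustrative only. */
struct obj {
    enum obj_type type;
    /* Style 2 hook: per-type method stored in the object. */
    int (*to_string)(const struct obj *o, char *buf, size_t cap);
    long i;          /* valid when type == OBJ_INT */
    const char *s;   /* valid when type == OBJ_STR */
};

/* Style 1: one external function that switches on the type tag. */
int obj_format(const struct obj *o, char *buf, size_t cap)
{
    assert(o != NULL);
    switch (o->type) {
    case OBJ_INT:
        return snprintf(buf, cap, "%ld", o->i);
    case OBJ_STR:
        return snprintf(buf, cap, "%s", o->s);
    }
    return -1;
}

/* Style 2: each type supplies its own implementation. */
static int int_to_string(const struct obj *o, char *buf, size_t cap)
{
    return snprintf(buf, cap, "%ld", o->i);
}

static int str_to_string(const struct obj *o, char *buf, size_t cap)
{
    return snprintf(buf, cap, "%s", o->s);
}

struct obj obj_from_int(long i)
{
    struct obj o = { OBJ_INT, int_to_string, i, NULL };
    return o;
}

struct obj obj_from_str(const char *s)
{
    struct obj o = { OBJ_STR, str_to_string, 0, s };
    return o;
}
```

With style 1, a second view (say, a human-readable monitor format) is just another function with its own switch. With style 2, a caller can swap `to_string` on a given object before passing it down, at the cost of every type carrying the hook.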
On Sun, Oct 18, 2009 at 12:32 PM, Vincent Hanquez <vincent@snarc.org> wrote: > Anthony Liguori wrote: >> >> Here's a first pass. I'll clean up this afternoon and post a proper >> patch. It turned out to work pretty well. > > It doesn't seems to validate anything ?? or is it just a lexer ? That's just a lexer. I posted a parser earlier. However, now I'm thinking I should update the parser to use the lexer. > you're also including ' as a string escape value (is that the single quote > thing you were talking about ?) which strikes me as invalid JSON. It's a compatible extension. We accept strings with those escapes but our encoder won't generate them. > -- > Vincent >
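A "compatible extension" in this sense means the reader is lenient while the writer stays strict. Here is a minimal sketch of the lenient side, using a hypothetical `unquote` helper that is not taken from the posted lexer: it accepts a string token delimited by either quote character and accepts `\'` as an escape, in addition to a few of the standard JSON escapes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Decode one quoted string token starting at src[0] into out.
 * Accepts "..." (strict JSON) and '...' (the lenient extension).
 * Returns the number of input bytes consumed, or -1 on error.
 * Only a subset of escapes is handled; \u, \b, \f and \r are
 * omitted for brevity and are reported as errors.
 */
int unquote(const char *src, char *out, size_t cap)
{
    char quote = src[0];
    size_t n = 0;
    int i = 1;

    assert(cap > 0);
    if (quote != '"' && quote != '\'') {
        return -1;
    }
    for (; src[i] != '\0' && src[i] != quote; i++) {
        char c = src[i];

        if (c == '\\') {
            c = src[++i];
            switch (c) {
            case '"':
            case '\'':       /* the non-standard escape accepted on input */
            case '\\':
            case '/':
                break;
            case 'n':
                c = '\n';
                break;
            case 't':
                c = '\t';
                break;
            default:         /* unsupported escape, or backslash at end of input */
                return -1;
            }
        }
        if (n + 1 >= cap) {
            return -1;       /* output buffer too small */
        }
        out[n++] = c;
    }
    if (src[i] != quote) {
        return -1;           /* unterminated string */
    }
    out[n] = '\0';
    return i + 1;
}
```

Any strict JSON string using the escapes handled here decodes the same way as before, so existing clients are unaffected; only input that strict JSON would reject gains a meaning. An encoder paired with this reader would still emit double quotes only and never produce `\'`.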
On Sun, 18 Oct 2009 11:32:11 -0500 Anthony Liguori <anthony@codemonkey.ws> wrote: > Luiz Capitulino wrote: > > Okay, I just took a quick look at them and am looking at Anthony's > > right now. > > > > Anyway, my brainstorm on this would be to have to_string() and have > > default methods on all types to return a simple string representation. > > > > What's the value of integrating into the objects verses having a > separate function that can apply it to the objects? Right now it doesn't have any real value, besides being a different style which seems to fit well with the QObject design. In the future it might be needed though, common code might want to change certain methods' behavior before passing QObjects down a call stack.
diff --git a/JSON_parser.c b/JSON_parser.c index 93e98c8..4c360be 100644 --- a/JSON_parser.c +++ b/JSON_parser.c @@ -151,6 +151,7 @@ enum classes { C_E, /* E */ C_ETC, /* everything else */ C_STAR, /* * */ + C_PCT, /* % - user escape */ NR_CLASSES }; @@ -165,7 +166,7 @@ static int ascii_class[128] = { __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, - C_SPACE, C_ETC, C_QUOTE, C_ETC, C_ETC, C_ETC, C_ETC, C_ETC, + C_SPACE, C_ETC, C_QUOTE, C_ETC, C_ETC, C_PCT, C_ETC, C_ETC, C_ETC, C_ETC, C_STAR, C_PLUS, C_COMMA, C_MINUS, C_POINT, C_SLASH, C_ZERO, C_DIGIT, C_DIGIT, C_DIGIT, C_DIGIT, C_DIGIT, C_DIGIT, C_DIGIT, C_DIGIT, C_DIGIT, C_COLON, C_ETC, C_ETC, C_ETC, C_ETC, C_ETC, @@ -239,7 +240,8 @@ enum actions ZX = -19, /* integer detected by zero */ IX = -20, /* integer detected by 1-9 */ EX = -21, /* next char is escaped */ - UC = -22 /* Unicode character read */ + UC = -22, /* Unicode character read */ + XC = -23, /* Escape to callback */ }; @@ -251,43 +253,43 @@ static int state_transition_table[NR_STATES][NR_CLASSES] = { state is OK and if the mode is MODE_DONE. white 1-9 ABCDF etc - space | { } [ ] : , " \ / + - . 
0 | a b c d e f l n r s t u | E | * */ -/*start GO*/ {GO,GO,-6,__,-5,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*ok OK*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*object OB*/ {OB,OB,__,-9,__,__,__,__,SB,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*key KE*/ {KE,KE,__,__,__,__,__,__,SB,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*colon CO*/ {CO,CO,__,__,__,__,-2,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*value VA*/ {VA,VA,-6,__,-5,__,__,__,SB,__,CB,__,MX,__,ZX,IX,__,__,__,__,__,FA,__,NU,__,__,TR,__,__,__,__,__}, -/*array AR*/ {AR,AR,-6,__,-5,-7,__,__,SB,__,CB,__,MX,__,ZX,IX,__,__,__,__,__,FA,__,NU,__,__,TR,__,__,__,__,__}, -/*string ST*/ {ST,__,ST,ST,ST,ST,ST,ST,-4,EX,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST}, -/*escape ES*/ {__,__,__,__,__,__,__,__,ST,ST,ST,__,__,__,__,__,__,ST,__,__,__,ST,__,ST,ST,__,ST,U1,__,__,__,__}, -/*u1 U1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U2,U2,U2,U2,U2,U2,U2,U2,__,__,__,__,__,__,U2,U2,__,__}, -/*u2 U2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U3,U3,U3,U3,U3,U3,U3,U3,__,__,__,__,__,__,U3,U3,__,__}, -/*u3 U3*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U4,U4,U4,U4,U4,U4,U4,U4,__,__,__,__,__,__,U4,U4,__,__}, -/*u4 U4*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,UC,UC,UC,UC,UC,UC,UC,UC,__,__,__,__,__,__,UC,UC,__,__}, -/*minus MI*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,ZE,IT,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*zero ZE*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,DF,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*int IT*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,DF,IT,IT,__,__,__,__,DE,__,__,__,__,__,__,__,__,DE,__,__}, -/*frac FR*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,__,FR,FR,__,__,__,__,E1,__,__,__,__,__,__,__,__,E1,__,__}, -/*e E1*/ 
{__,__,__,__,__,__,__,__,__,__,__,E2,E2,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*ex E2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*exp E3*/ {OK,OK,__,-8,__,-7,__,-3,__,__,__,__,__,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*tr T1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,T2,__,__,__,__,__,__,__}, -/*tru T2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,T3,__,__,__,__}, -/*true T3*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__,__,__}, -/*fa F1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F2,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*fal F2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F3,__,__,__,__,__,__,__,__,__}, -/*fals F3*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F4,__,__,__,__,__,__}, -/*false F4*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__,__,__}, -/*nu N1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,N2,__,__,__,__}, -/*nul N2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,N3,__,__,__,__,__,__,__,__,__}, -/*null N3*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__}, -/*/ C1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,C2}, -/*/* C2*/ {C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C3}, -/** C3*/ {C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,CE,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C3}, -/*_. 
FX*/ {OK,OK,__,-8,__,-7,__,-3,__,__,__,__,__,__,FR,FR,__,__,__,__,E1,__,__,__,__,__,__,__,__,E1,__,__}, -/*\ D1*/ {__,__,__,__,__,__,__,__,__,D2,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, -/*\ D2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,U1,__,__,__,__}, + space | { } [ ] : , " \ / + - . 0 | a b c d e f l n r s t u | E | * % */ +/*start GO*/ {GO,GO,-6,__,-5,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,XC}, +/*ok OK*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, +/*object OB*/ {OB,OB,__,-9,__,__,__,__,SB,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,XC}, +/*key KE*/ {KE,KE,__,__,__,__,__,__,SB,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,XC}, +/*colon CO*/ {CO,CO,__,__,__,__,-2,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, +/*value VA*/ {VA,VA,-6,__,-5,__,__,__,SB,__,CB,__,MX,__,ZX,IX,__,__,__,__,__,FA,__,NU,__,__,TR,__,__,__,__,__,XC}, +/*array AR*/ {AR,AR,-6,__,-5,-7,__,__,SB,__,CB,__,MX,__,ZX,IX,__,__,__,__,__,FA,__,NU,__,__,TR,__,__,__,__,__,XC}, +/*string ST*/ {ST,__,ST,ST,ST,ST,ST,ST,-4,EX,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST}, +/*escape ES*/ {__,__,__,__,__,__,__,__,ST,ST,ST,__,__,__,__,__,__,ST,__,__,__,ST,__,ST,ST,__,ST,U1,__,__,__,__,__}, +/*u1 U1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U2,U2,U2,U2,U2,U2,U2,U2,__,__,__,__,__,__,U2,U2,__,__,__}, +/*u2 U2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U3,U3,U3,U3,U3,U3,U3,U3,__,__,__,__,__,__,U3,U3,__,__,__}, +/*u3 U3*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U4,U4,U4,U4,U4,U4,U4,U4,__,__,__,__,__,__,U4,U4,__,__,__}, +/*u4 U4*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,UC,UC,UC,UC,UC,UC,UC,UC,__,__,__,__,__,__,UC,UC,__,__,__}, +/*minus MI*/ 
{__,__,__,__,__,__,__,__,__,__,__,__,__,__,ZE,IT,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, +/*zero ZE*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,DF,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, +/*int IT*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,DF,IT,IT,__,__,__,__,DE,__,__,__,__,__,__,__,__,DE,__,__,__}, +/*frac FR*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,__,FR,FR,__,__,__,__,E1,__,__,__,__,__,__,__,__,E1,__,__,__}, +/*e E1*/ {__,__,__,__,__,__,__,__,__,__,__,E2,E2,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, +/*ex E2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, +/*exp E3*/ {OK,OK,__,-8,__,-7,__,-3,__,__,__,__,__,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, +/*tr T1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,T2,__,__,__,__,__,__,__,__}, +/*tru T2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,T3,__,__,__,__,__}, +/*true T3*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__,__,__,__}, +/*fa F1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F2,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, +/*fal F2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F3,__,__,__,__,__,__,__,__,__,__}, +/*fals F3*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F4,__,__,__,__,__,__,__}, +/*false F4*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__,__,__,__}, +/*nu N1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,N2,__,__,__,__,__}, +/*nul N2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,N3,__,__,__,__,__,__,__,__,__,__}, +/*null N3*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__,__}, +/*/ C1*/ 
{__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,C2,__}, +/*/* C2*/ {C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C3,C2}, +/** C3*/ {C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,CE,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C3,C2}, +/*_. FX*/ {OK,OK,__,-8,__,-7,__,-3,__,__,__,__,__,__,FR,FR,__,__,__,__,E1,__,__,__,__,__,__,__,__,E1,__,__,__}, +/*\ D1*/ {__,__,__,__,__,__,__,__,__,D2,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__}, +/*\ D2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,U1,__,__,__,__,__}, }; @@ -680,7 +682,6 @@ JSON_parser_char(JSON_parser jc, int next_char) return false; } } - add_char_to_parse_buffer(jc, next_char, next_class); /* @@ -818,6 +819,29 @@ JSON_parser_char(JSON_parser jc, int next_char) jc->state = C1; jc->comment = 1; break; + +/* external callback */ + case XC: + parse_buffer_pop_back_char(jc); + switch (jc->stack[jc->top]) + { + case MODE_KEY: + jc->type = JSON_T_NONE; + jc->state = CO; + if (!(*jc->callback)(jc->ctx, JSON_T_USER_KEY, NULL)) { + return false; + } + break; + default: + jc->state = OK; + jc->type = JSON_T_NONE; + if (!(*jc->callback)(jc->ctx, JSON_T_USER, NULL)) { + return false; + } + break; + } + break; + /* empty } */ case -9: parse_buffer_clear(jc); diff --git a/JSON_parser.h b/JSON_parser.h index 3780aae..50cec2d 100644 --- a/JSON_parser.h +++ b/JSON_parser.h @@ -47,6 +47,8 @@ typedef enum JSON_T_FALSE, JSON_T_STRING, JSON_T_KEY, + JSON_T_USER, + JSON_T_USER_KEY, JSON_T_MAX } JSON_type; diff --git a/comments.json b/comments.json index 244f5e3..ad79ab0 100644 --- a/comments.json +++ b/comments.json @@ -113,4 +113,8 @@ 0.1e1, 1e-1, 1e00,2e+00,2e-00 -,"rosebud", "\u005C"]/** ******/ \ No newline at end of file +,"rosebud", "\u005C",/** %%% *%%%***%**/ +[%s], +{%d:%s}, +{"name":%s}, +%s]/** ******/ diff --git a/main.c b/main.c index 6651e12..226b125 100644 --- 
a/main.c +++ b/main.c @@ -29,6 +29,7 @@ int main(int argc, char* argv[]) { config.depth = 20; config.callback = &print; + config.callback_ctx = &input; config.allow_comments = 1; config.handle_floats_manually = 0; @@ -142,6 +143,16 @@ static int print(void* ctx, int type, const JSON_value* value) s_IsKey = 0; printf("string: '%s'\n", value->vu.str.value); break; + case JSON_T_USER_KEY: + s_IsKey = 1; + print_indention(); + printf("user key = %%%c, value = ", fgetc(*(FILE**) ctx)); + break; + case JSON_T_USER: + if (!s_IsKey) print_indention(); + s_IsKey = 0; + printf("user: %%%c\n", fgetc(*(FILE**) ctx)); + break; default: assert(0); break;