Wednesday, August 21, 2013

JSON parsing in JavaScript

Recently I've had some blocking issues with code I wrote a while ago. A colleague tried to reuse that code with a different back-end, and he kept getting the exception:
JSON.parse: unexpected character
Using Firebug, I promptly inspected the JSON content retrieved from the new service and pasted it into http://jsonlint.com/, where it seemed to be valid JSON. However, the exception persisted and clearly pointed to a problem with the format of the incoming content. I therefore inspected the code of both client and server, and it turned out that (1) the old JavaScript snippet was using the eval function and (2) the new back-end, mimicking what my old testing code did, was generating test JSON in a servlet simply by concatenating strings and writing out the result as JSON.

The service's output included some '\n' (newline) characters that, in the old setup, made the JSON content easier to read, apparently without disrupting the eval function. The serialized content also included some dates in the format 'MM/dd/yyyy HH:mm:ss Z'.
The eval function invokes the JavaScript compiler. Since JSON is a proper subset of JavaScript, the compiler will correctly parse the text and produce an object structure (read more on json.org).
Strangely, the same code that worked fine in my configuration was failing in my colleague's, apparently while interpreting the format of the dates.

Having had a similar problem months ago, I decided to move away from the eval function:
    In the old GWT code, making use of JSNI:

    public static native JavaScriptObject parseJson(String jsonStr) /*-{
        return eval('(' + jsonStr + ')');
    }-*/;

    The equivalent in pure JavaScript would be (remember the parentheses:
    they force the text to be parsed as an expression that returns a value,
    rather than as a block of code to run):

    function parseJson(jsonStr) {
         return eval('(' + jsonStr + ')');
    }
in favor of the more recent JSON.parse, which validates the JSON content, unlike eval, which may be faster in some older engines but allows the string being parsed to contain absolutely anything, including function calls.
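To make the difference concrete, here is a small sketch, runnable in any engine with native JSON support, showing that eval executes an embedded function call, that JSON.parse rejects the same text, and why the wrapping parentheses matter with eval. The sample strings are made up for illustration.

```javascript
// Made-up text that is valid JavaScript but not valid JSON:
// the value is produced by an immediately invoked function.
const untrusted = '{"name": (function () { return "side effect ran"; })()}';

// eval compiles and runs the text, so the embedded call executes.
// The wrapping parentheses make "{" start an object literal.
const viaEval = eval("(" + untrusted + ")");
console.log(viaEval.name); // side effect ran

// JSON.parse accepts only the JSON grammar and throws a SyntaxError.
let rejected = false;
try {
  JSON.parse(untrusted);
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true

// Without the parentheses, eval reads "{" as the start of a block
// statement, and a quoted key followed by ":" is then a syntax error.
let needsParens = false;
try {
  eval('{"name": "Paolo"}');
} catch (e) {
  needsParens = e instanceof SyntaxError;
}
const wrapped = eval('({"name": "Paolo"})');
console.log(needsParens, wrapped.name); // true Paolo
```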
Native JSON support (the JSON object, including JSON.parse) is part of the ECMAScript 5 standard and is available in newer browsers. Similar functionality was already available in JavaScript libraries such as jQuery (http://api.jquery.com/jQuery.parseJSON/).
See http://www.w3schools.com/json/json_eval.asp for browser and software support.
When using JSON.parse, however, control characters inside string values must be escaped.
According to section 2.5 of the JSON spec at ietf.org/rfc/rfc4627.txt: "All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)."
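A quick way to see this rule in action, as a sketch using only the standard JSON object: the same value is parsed once with a raw newline inside the string and once with the newline escaped as the two characters '\' and 'n'.

```javascript
// A raw newline (U+000A) placed directly inside a JSON string value.
const rawNewline = '[{"name":"Paolo Ciccarese \n"}]';

// The same value with the newline escaped as the two characters "\" and "n".
const escapedNewline = '[{"name":"Paolo Ciccarese \\n"}]';

let rawRejected = false;
try {
  JSON.parse(rawNewline);
} catch (e) {
  // Unescaped control character inside a string literal.
  rawRejected = e instanceof SyntaxError;
}

// The escape sequence is decoded back into a real newline in the data.
const parsed = JSON.parse(escapedNewline);
console.log(rawRejected, parsed[0].name.endsWith("\n")); // true true
```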
In that specific test case, the newline character could appear both inside JSON string values and between JSON elements.
    // 1) Example of JSON with a raw newline (shown as \n) inside a value
    [{"name":"Paolo Ciccarese \n"}]

    // 2) Example of JSON with a raw newline (shown as \n) between elements
    [\n{"name":"Paolo Ciccarese"}]
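Case 1) can be addressed by escaping the content (for instance, replacing '\n' with '\\n'). Case 2) is actually legal JSON, since RFC 4627 allows insignificant whitespace (space, tab, newline and carriage return) around structural characters; it breaks only when a blanket replacement escapes every newline, including those outside strings, because '\\n' is not valid outside a string.
In my case, the servlets were generating the newlines because they used out.println instead of the harmless out.print.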
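To check how JSON.parse itself treats case 2, here is a small sketch comparing the raw text with the result of a blanket newline replacement (the replacement mirrors the '\n' step of the escaping chain; the variable names are only illustrative).

```javascript
// Case 2: a raw newline between "[" and "{", i.e. outside any string.
const betweenElements = '[\n{"name":"Paolo Ciccarese"}]';

// JSON treats the newline as insignificant whitespace, so this parses.
const ok = JSON.parse(betweenElements);
console.log(ok[0].name); // Paolo Ciccarese

// A blanket replacement rewrites the newline as the two characters
// "\" and "n", which are not valid outside a string literal.
const blanketEscaped = betweenElements.replace(/\n/g, "\\n");
let broken = false;
try {
  JSON.parse(blanketEscaped);
} catch (e) {
  broken = e instanceof SyntaxError;
}
console.log(broken); // true
```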
So the parseJson function became:
    In the GWT code, making use of JSNI:

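    public static native JavaScriptObject parseJson(String jsonStr) /*-{
        try {
            // Escape raw characters that JSON forbids inside strings.
            // Assumes the input contains no escape sequences yet and that
            // these characters occur only inside string values.
            var escaped = jsonStr
                .replace(/[\\]/g, '\\\\')   // reverse solidus (must come first)
                .replace(/[\/]/g, '\\/')    // solidus: optional, but legal
                .replace(/[\b]/g, '\\b')    // backspace
                .replace(/[\f]/g, '\\f')    // form feed
                .replace(/[\n]/g, '\\n')    // newline
                .replace(/[\r]/g, '\\r')    // carriage return
                .replace(/[\t]/g, '\\t');   // tab
            return JSON.parse(escaped);
        } catch (e) {
            alert("Error while parsing the JSON message: " + e);
            return null;
        }
    }-*/;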

    In pure JavaScript, this would be:

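    function parseJson(jsonStr) {
        try {
            // Escape raw characters that JSON forbids inside strings.
            // Assumes the input contains no escape sequences yet and that
            // these characters occur only inside string values.
            var escaped = jsonStr
                .replace(/[\\]/g, '\\\\')   // reverse solidus (must come first)
                .replace(/[\/]/g, '\\/')    // solidus: optional, but legal
                .replace(/[\b]/g, '\\b')    // backspace
                .replace(/[\f]/g, '\\f')    // form feed
                .replace(/[\n]/g, '\\n')    // newline
                .replace(/[\r]/g, '\\r')    // carriage return
                .replace(/[\t]/g, '\\t');   // tab
            return JSON.parse(escaped);
        } catch (e) {
            alert("Error while parsing the JSON message: " + e);
            return null;
        }
    }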
  
Therefore in the case:
    [{"name":"Paolo Ciccarese \n"}]
the raw newline becomes the escape sequence '\n', so it is no longer a control character in the JSON source but a newline in the JSON data, which is perfectly fine.
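The replace chain in parseJson assumes that the input contains no escape sequences yet (otherwise doubling every backslash turns an already escaped \" into \\", which ends the string early) and that control characters occur only inside strings. As an alternative, here is a minimal sketch, not taken from the original code, with the hypothetical names escapeControlCharsInStrings and parseJsonLenient: it walks the text once, tracks whether it is inside a string literal, and escapes U+0000 through U+001F only there.

```javascript
// Hypothetical helper: escape raw control characters (U+0000 to U+001F)
// only inside JSON string literals, leaving existing escape sequences
// and whitespace between tokens untouched.
function escapeControlCharsInStrings(text) {
  const shortEscapes = { "\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t" };
  let out = "";
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") {
        // Copy an existing escape sequence (backslash plus next char) as-is.
        out += ch + (i + 1 < text.length ? text[++i] : "");
        continue;
      }
      if (ch === '"') {
        inString = false; // closing quotation mark
      } else if (ch < " ") {
        // Control character: use a short escape if one exists, else \u00XX.
        out += shortEscapes[ch] ||
          "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0");
        continue;
      }
    } else if (ch === '"') {
      inString = true; // opening quotation mark
    }
    out += ch;
  }
  return out;
}

// Hypothetical wrapper: escape, then let JSON.parse validate the result.
function parseJsonLenient(jsonStr) {
  return JSON.parse(escapeControlCharsInStrings(jsonStr));
}
```

Unlike the version above, this sketch lets the SyntaxError propagate so the caller can decide how to report it. It still rejects genuinely malformed JSON, and it does not try to repair unescaped quotation marks, which cannot be told apart from string delimiters.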

In conclusion, I am still not sure why the date interpretation was failing with eval, but with the JSON.parse approach the problem is gone.

I also found 'A fast and secure JSON parser in JavaScript'. I haven't had time to check it out yet, but it looks promising: according to its description, it 'does not attempt to validate the JSON, so may return a surprising result given a syntactically invalid input, but it does not use eval so is deterministic and is guaranteed not to modify any object other than its return value'.