Bug Tracker

Opened 11 years ago

Closed 10 years ago

#4168 closed bug (worksforme)

U+2028 or \u2028 chokes jQuery JSON parser

Reported by: miyagawa
Owned by:
Priority: major
Milestone: 1.4
Component: ajax
Version: 1.3.2
Keywords:
Cc:
Blocked by:
Blocking:

Description

Steps to reproduce:

echo '{"foo":"ba\u2028c"}' > /path/to/foo.json

<script>
$.ajax({
  url: "/path/to/foo.json",
  // st is the status string jQuery passes to the error callback
  error: function (xhr, st, err) { alert(st); }
});
</script>

Expected: No error

Actual: gives "parseerror" error

My server-side app generates JSON dynamically from data in the database. Some items have the Unicode character U+2028 in their text, so the generated JSON looks like this (simplified):

{"foo":"ba\u2028r"}

jQuery's JSON parser uses the built-in eval() function, and Safari and Firefox can't parse this; they throw a SyntaxError. Other JSON libraries, such as Crockford's json_parse.js, parse this JSON successfully.

Note that double escaping \u to \\u works around this issue, but that feels wrong and I don't think I should do this on the server side, since other JSON libraries would then see \u double-escaped.
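
To illustrate why double escaping is not a real fix, here is a minimal sketch using the standard JSON.parse. The variable names are made up for illustration. A compliant parser decodes the double-escaped payload to the six literal characters \u2028 rather than to the single character U+2028.

```javascript
// Payload with a proper JSON escape sequence for U+2028.
var escaped = '{"foo":"ba\\u2028r"}';
// Double-escaped workaround: the backslash itself is escaped.
var doubleEscaped = '{"foo":"ba\\\\u2028r"}';

var a = JSON.parse(escaped).foo;       // "ba" + U+2028 + "r"
var b = JSON.parse(doubleEscaped).foo; // "ba" + the six characters \u2028 + "r"

console.log(a.length); // 4
console.log(b.length); // 9
```

Any consumer that parses the double-escaped form correctly ends up with different text than the database holds.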

Change History (11)

comment:1 in reply to:  description Changed 11 years ago by miyagawa

Replying to miyagawa:

Steps to reproduce:

echo '{"foo":"ba\u2028c"}' > /path/to/foo.json

Sorry I was a little confused.

The JSON in my app actually contains E2 80 A8, the UTF-8 representation of U+2028. By escaping this character to \u2028 on the server side this problem goes away. I will talk to JSON::XS author to escape it.

Close this bug. Thanks :)

comment:2 Changed 11 years ago by dmethvin

Resolution: invalid
Status: new → closed

comment:3 Changed 11 years ago by pcg

Resolution: invalid
Status: closed → reopened

please investigate - the original poster was confused about where the bug is and thought the bug was in his generator, but in fact, it is in jquery.

verbatim U+2028 characters are valid in JSON strings, and if jquery doesn't parse such json texts correctly, it is simply buggy (or does not support JSON).

Note that if the parser cannot be fixed, and only fails for U+2028 and other line terminators valid in JSON but not in ecmascript, then a trivial pre-pass that replaces all those characters by their \u-escaped form before eval'ing it will fix the json parser. This works because U+2028 and other such characters are not valid syntax in JSON except as verbatim string characters, so it is always safe to blindly replace them.
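
The pre-pass described above could look roughly like the sketch below. The helper name escapeLineSeparators is hypothetical and not part of jQuery, and it assumes U+2028 and U+2029 are the only characters that need this treatment. eval appears only because that is the parsing strategy under discussion.

```javascript
// Hypothetical helper: replace raw U+2028 / U+2029 with their \u-escaped form.
// Both code points are four hex digits long, so no zero padding is needed.
function escapeLineSeparators(jsonText) {
  return jsonText.replace(/[\u2028\u2029]/g, function (ch) {
    return "\\u" + ch.charCodeAt(0).toString(16);
  });
}

var raw = '{"foo":"ba\u2028r"}';      // JS escape: the string holds a raw U+2028
var safe = escapeLineSeparators(raw); // now holds the six characters \u2028
var value = eval("(" + safe + ")");   // same eval-based parse, now on escaped text

console.log(safe.indexOf("\u2028"));                // -1
console.log(value.foo.charCodeAt(2).toString(16));  // "2028"
```

The decoded value is unchanged; only the text handed to eval differs.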

If there is a real parser, then fixing that one is likely more efficient.

comment:4 Changed 11 years ago by miyagawa

The original steps to reproduce were incorrect. \u2028 in JSON text strings is safe, but a raw U+2028 (or its UTF-8 representation: 0xE2 0x80 0xA8) is not.

  1. perl -e 'print qq({"foo":"ba\xe2\x80\xa8r"})' > /path/to/foo.json
  2. <script> $.ajax({ url: "/path/to/foo.json", error: function(xhr, st, err) { alert(st) } }) </script>

Expected: No error

Actual: gives "parseerror" error
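
As a quick check of what step 1 actually writes, this sketch decodes the same three bytes with the standard TextDecoder API. That API is an assumption about the environment (a modern browser or Node.js) and is unrelated to jQuery 1.3.2.

```javascript
// The three bytes the perl one-liner places inside the JSON string.
var bytes = new Uint8Array([0xe2, 0x80, 0xa8]);
var decoded = new TextDecoder("utf-8").decode(bytes);

console.log(decoded.length);                     // 1
console.log(decoded.charCodeAt(0).toString(16)); // "2028"
```

So the file contains a single raw U+2028 character, not the six-character escape sequence.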

comment:5 Changed 11 years ago by pcg

As an addition: hacking your own parser around eval is a bad idea - while the above regex solution works for ecmascript compliant implementations, real-world browsers often have issues with many other characters.

Best throw away home-grown eval hacks and use a proper json parser, for example, Douglas Crockford's json2.js, which escapes many problematic characters and makes security checks on the JSON text before passing it to eval.

comment:6 Changed 11 years ago by dmethvin

I'm still fuzzy on the problem here. What is the Content-type reported back when /path/to/foo.json is retrieved? Is it specified to be in UTF-8?

comment:7 Changed 11 years ago by miyagawa

Yes, the Content-Type should be "application/json; charset=utf-8"

comment:8 Changed 11 years ago by pcg

the bug most likely is simply that jquery eval's json texts, but json is not a subset of javascript syntax (a raw U+2028 is valid inside a json string but not inside a javascript string literal), so using eval will not work in general.

maybe jquery doesn't support json, and instead users are required to send in valid javascript fragments instead of json. then the behaviour is correct.

but if jquery officially supports json, and only uses eval on it, it's simply buggy, as a json parser (such as json2.js) is required to parse json.

comment:9 Changed 11 years ago by pcg

and regarding the content-type: the default encoding for json is utf-8.

the problem itself has nothing to do with the content-type, however (it is immaterial whether the json document is transferred as utf-16 or utf-8, for example)

comment:11 Changed 10 years ago by dmethvin

Component: unfilled → ajax

comment:12 Changed 10 years ago by john

Milestone: 1.3.2 → 1.4
Resolution: worksforme
Status: reopened → closed
Version: 1.3.1 → 1.3.2

Two quick points: I don't think that this is something that we're going to fix directly in jQuery core. There are a lot of weird little edge cases around JSON (especially if you want to validate it) and it's probably best to use a proper parser (like json2.js).

Thankfully in jQuery 1.4 we now use the native JSON parser (or the parser in json2.js, if it exists). All you have to do is include json2.js in your page and jQuery will use it automatically. Hope this helps.
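
For readers who want to see the idea in isolation, here is a minimal sketch of preferring a global JSON object, whether native or defined by json2.js. This is not jQuery's actual implementation, and the function name parseJsonText is hypothetical.

```javascript
// Hypothetical helper: use JSON.parse when a JSON object is available
// (native in newer browsers, or defined by json2.js in older ones).
function parseJsonText(text) {
  if (typeof JSON === "object" && JSON !== null && typeof JSON.parse === "function") {
    return JSON.parse(text);
  }
  throw new Error("No JSON parser available; include json2.js first");
}

// The JS escape below puts a raw U+2028 inside the JSON string value.
var result = parseJsonText('{"foo":"ba\u2028r"}');
console.log(result.foo.charCodeAt(2).toString(16)); // "2028"
```

A real JSON parser accepts the raw character because the JSON grammar allows it inside strings.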
