Opened 14 years ago
Closed 13 years ago
#4168 closed bug (worksforme)
U+2028 or \u2028 chokes jQuery JSON parser
Reported by: | miyagawa | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 1.4 |
Component: | ajax | Version: | 1.3.2 |
Keywords: | | Cc: | |
Blocked by: | | Blocking: | |
Description
Steps to reproduce:
echo '{"foo":"ba\u2028c"}' > /path/to/foo.json
<script> $.ajax({ url: "/path/to/foo.json", error: function(xhr, st, err) { alert(st) } }) </script>
Expected: No error
Actual: gives "parsererror" error
My server-side app generates JSON dynamically from data in the database. Some items contain the Unicode character U+2028 in their text, and the generated JSON looks like this (simplified):
{"foo":"ba\u2028r"}
jQuery's JSON parser uses the built-in eval() function, and Safari and Firefox can't parse this; they throw a SyntaxError. Other JSON libraries, such as Crockford's json_parse.js, parse this JSON successfully.
Note that double-escaping \u to \\u works around this issue, but that feels wrong, and I don't think I should do it on the server side, since other JSON libraries would then receive the \u double-escaped.
Change History (11)
comment:1 Changed 14 years ago by
comment:2 Changed 14 years ago by
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:3 Changed 14 years ago by
Resolution: | invalid |
---|---|
Status: | closed → reopened |
Please investigate: the original poster was confused about where the bug is and thought it was in his generator, but in fact it is in jQuery.
Verbatim U+2028 characters are valid in JSON strings, and if jQuery doesn't parse such JSON texts correctly, it is simply buggy (or does not support JSON).
Note that if the parser cannot be fixed, and it only fails for U+2028 and U+2029 (the line terminators that are valid inside JSON strings but not inside ECMAScript string literals), then a trivial pre-pass that replaces those characters with their \u-escaped form before eval'ing the text will fix the JSON parser. This works because U+2028 and U+2029 are not valid JSON syntax anywhere except as verbatim string characters, so it is always safe to replace them blindly.
If there is a real parser, then fixing that one is likely more efficient.
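A minimal sketch of the pre-pass proposed in this comment, in JavaScript. The helper names `escapeLineSeparators` and `evalJson` are hypothetical (not part of jQuery), and eval is used only to mirror the eval-based parsing this ticket describes; the sketch does no validation, so it should not be fed untrusted input.

```javascript
// Hypothetical helper, not a jQuery API.
// U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) may appear raw in
// JSON strings, but pre-ES2019 engines treat them as line terminators, so a
// string literal containing one is a SyntaxError under eval.
function escapeLineSeparators(text) {
  // Outside strings these characters are invalid JSON anyway, so replacing
  // every occurrence with its \u escape never changes the meaning of valid JSON.
  return text.replace(/\u2028/g, "\\u2028").replace(/\u2029/g, "\\u2029");
}

// eval-based parsing in the style this ticket describes; no validation is done.
function evalJson(text) {
  // Parentheses make eval read "{...}" as an object literal rather than a block.
  return eval("(" + escapeLineSeparators(text) + ")");
}

// The \u2028 escape in this JavaScript literal puts a raw U+2028 into the JSON text.
const raw = '{"foo":"ba\u2028r"}';
console.log(escapeLineSeparators(raw)); // {"foo":"ba\u2028r"}
console.log(evalJson(raw).foo === "ba\u2028r"); // true
```

Only the two characters that differ between JSON and pre-ES2019 JavaScript are touched; comment:5 below explains why a hand-rolled eval wrapper is still risky in practice.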
comment:4 Changed 14 years ago by
The original steps to reproduce were incorrect. An escaped \u2028 in a JSON string is safe, but a raw U+2028 character (UTF-8 representation: 0xE2 0x80 0xA8) is not.
- perl -e 'print qq({"foo":"ba\xe2\x80\xa8r"})' > /path/to/foo.json
- <script> $.ajax({ url: "/path/to/foo.json", error: function(xhr, st, err) { alert(st) } }) </script>
Expected: No error
Actual: gives "parsererror" error
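As a quick check of the byte sequence given in this comment, the sketch below encodes U+2028 by hand with the three-byte UTF-8 pattern (`1110xxxx 10xxxxxx 10xxxxxx`, which covers U+0800 to U+FFFF), rebuilds the file the perl one-liner writes, and decodes it with the standard `TextDecoder`. `utf8ThreeBytes` is a hypothetical helper written only for this illustration.

```javascript
// Hypothetical helper: encode a code point in U+0800..U+FFFF as three UTF-8
// bytes using the pattern 1110xxxx 10xxxxxx 10xxxxxx.
function utf8ThreeBytes(codePoint) {
  return [
    0xe0 | (codePoint >> 12),         // top 4 bits
    0x80 | ((codePoint >> 6) & 0x3f), // middle 6 bits
    0x80 | (codePoint & 0x3f),        // low 6 bits
  ];
}

const lineSeparatorBytes = utf8ThreeBytes(0x2028);
console.log(lineSeparatorBytes.map((b) => b.toString(16)).join(" ")); // e2 80 a8

// Rebuild the bytes written by the perl one-liner: {"foo":"ba<E2 80 A8>r"}
const fileBytes = Uint8Array.from([
  ...new TextEncoder().encode('{"foo":"ba'),
  ...lineSeparatorBytes,
  ...new TextEncoder().encode('r"}'),
]);
const fileText = new TextDecoder("utf-8").decode(fileBytes);

// A real JSON parser accepts the raw character inside the string.
console.log(JSON.parse(fileText).foo === "ba\u2028r"); // true
```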
comment:5 Changed 14 years ago by
In addition: hacking your own parser around eval is a bad idea. While the regex pre-pass above works for ECMAScript-compliant implementations, real-world browsers often have problems with many other characters.
It is best to throw away home-grown eval hacks and use a proper JSON parser, for example Douglas Crockford's json2.js, which escapes many problematic characters and runs security checks on the JSON text before passing it to eval.
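The two steps this comment attributes to json2.js (escape problematic characters, then check the text before handing it to eval) can be sketched as follows. This is a simplified illustration modeled on that approach, not json2.js's actual source; the character set in `RISKY_CHARS` and all helper names are assumptions. In engines with a native `JSON.parse`, none of this is needed.

```javascript
// Assumed set of characters some engines mishandle in eval'd source,
// including U+2028/U+2029; illustrative, not exhaustive.
const RISKY_CHARS = /[\u00ad\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff]/g;

function escapeRiskyChars(text) {
  // Replace each risky character with its 4-digit \uXXXX escape.
  return text.replace(RISKY_CHARS, (ch) =>
    "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4));
}

// Security check: collapse escapes, then string/number/literal tokens, then
// opening brackets. If anything besides JSON punctuation and whitespace is
// left, the text might contain executable code and must not reach eval.
function looksLikeJson(text) {
  const reduced = text
    .replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, "@")
    .replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g, "]")
    .replace(/(?:^|:|,)(?:\s*\[)+/g, "");
  return /^[\],:{}\s]*$/.test(reduced);
}

// Hypothetical parser combining both steps before eval.
function checkedEvalJson(text) {
  const escaped = escapeRiskyChars(text);
  if (!looksLikeJson(escaped)) {
    throw new SyntaxError("Refusing to eval: text is not JSON");
  }
  return eval("(" + escaped + ")");
}

console.log(checkedEvalJson('{"foo":"ba\u2028r"}').foo === "ba\u2028r"); // true
```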
comment:6 Changed 14 years ago by
I'm still fuzzy on the problem here. What is the Content-type reported back when /path/to/foo.json is retrieved? Is it specified to be in UTF-8?
comment:8 Changed 14 years ago by
The bug is most likely simply that jQuery evals JSON texts, but JSON is not a subset of JavaScript syntax, so using eval will not work in general.
Maybe jQuery doesn't support JSON and instead requires users to send valid JavaScript fragments rather than JSON; in that case the behaviour is correct.
But if jQuery officially supports JSON and only uses eval on it, it is simply buggy, as a JSON parser (such as json2.js) is required to parse JSON.
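Whether eval chokes on a raw U+2028 depends on the engine: JavaScript only began allowing U+2028 and U+2029 in string literals with ES2019, which made JSON a syntactic subset of JavaScript. The probe below (`evalAcceptsRawLineSeparator` is a hypothetical name) contrasts eval with `JSON.parse`, which is specified to accept the raw character inside JSON strings.

```javascript
// Hypothetical probe: does this engine's eval accept a raw U+2028 inside a
// string literal? True on ES2019+ engines, false where it is a SyntaxError.
function evalAcceptsRawLineSeparator() {
  // The \u2028 escape is resolved by this file's own string literal, so the
  // source handed to eval contains the raw character between the quotes.
  const source = '"\u2028"';
  try {
    return eval(source) === "\u2028";
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// JSON.parse accepts the raw character inside JSON strings.
console.log(JSON.parse('"\u2028"') === "\u2028"); // true
console.log(evalAcceptsRawLineSeparator()); // engine-dependent
```

On the browsers discussed in this ticket the probe would return false, which is exactly the eval failure reported above.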
comment:9 Changed 14 years ago by
As for the content type: JSON is encoded as UTF-8 by default.
The problem itself has nothing to do with the content type, however (it does not matter whether the JSON document is transferred as UTF-16 or UTF-8, for example).
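To illustrate why the transfer encoding is irrelevant: once the bytes are decoded, UTF-8 and UTF-16 deliver the same text, raw U+2028 included, and that decoded text is what the parser sees. The sketch uses the standard `TextEncoder`/`TextDecoder` APIs; the UTF-16LE bytes are assembled by hand because `TextEncoder` only produces UTF-8.

```javascript
const jsonText = '{"foo":"ba\u2028r"}'; // contains a raw U+2028

// UTF-8 bytes via the standard TextEncoder.
const utf8Bytes = new TextEncoder().encode(jsonText);

// UTF-16LE bytes built by hand: each code unit as low byte, then high byte.
const utf16Bytes = new Uint8Array(jsonText.length * 2);
for (let i = 0; i < jsonText.length; i++) {
  const unit = jsonText.charCodeAt(i);
  utf16Bytes[2 * i] = unit & 0xff;
  utf16Bytes[2 * i + 1] = unit >> 8;
}

const fromUtf8 = new TextDecoder("utf-8").decode(utf8Bytes);
const fromUtf16 = new TextDecoder("utf-16le").decode(utf16Bytes);

console.log(fromUtf8 === fromUtf16); // true
console.log(fromUtf8.includes("\u2028")); // true
```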
comment:11 Changed 13 years ago by
Component: | unfilled → ajax |
---|---|
comment:12 Changed 13 years ago by
Milestone: | 1.3.2 → 1.4 |
---|---|
Resolution: | → worksforme |
Status: | reopened → closed |
Version: | 1.3.1 → 1.3.2 |
Two quick points: I don't think that this is something that we're going to fix directly in jQuery core. There are a lot of weird little edge cases around JSON (especially if you want to validate it) and it's probably best to use a proper parser (like json2.js).
Thankfully in jQuery 1.4 we now use the native JSON parser (or the parser in json2.js, if it exists). All you have to do is include json2.js in your page and jQuery will use it automatically. Hope this helps.
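The 1.4 behaviour described here amounts to preferring `JSON.parse` whenever it exists. A minimal sketch of that pattern follows; `parseJsonPreferNative` is a hypothetical name and this is not jQuery's actual source. It relies on json2.js defining `JSON.parse` only when the browser lacks a native one, so the same call works in both cases.

```javascript
// Hypothetical helper sketching the "use JSON.parse if available" pattern.
// json2.js installs JSON.parse only when no native implementation exists,
// so after including it this branch is taken either way.
function parseJsonPreferNative(text) {
  if (typeof JSON === "object" && typeof JSON.parse === "function") {
    // A conforming JSON.parse accepts a raw U+2028 inside strings.
    return JSON.parse(text);
  }
  throw new Error("No JSON.parse available; include json2.js first");
}

const parsed = parseJsonPreferNative('{"foo":"ba\u2028r"}');
console.log(parsed.foo === "ba\u2028r"); // true
```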
Replying to miyagawa:
Sorry I was a little confused.
The JSON in my app actually contains E2 80 A8, the UTF-8 representation of U+2028. Escaping this character to \u2028 on the server side makes the problem go away. I will ask the JSON::XS author to escape it.
Close this bug. Thanks :)