#4168 closed bug (worksforme)
Opened February 16, 2009 10:20PM UTC
Closed December 05, 2009 01:50AM UTC
U+2028 or \\u2028 chokes jQuery JSON parser
Reported by: | miyagawa | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 1.4 |
Component: | ajax | Version: | 1.3.2 |
Keywords: | Cc: | ||
Blocked by: | Blocking: |
Description
Steps to reproduce:
echo '{"foo":"ba\\u2028c"}' > /path/to/foo.json
<script>
$.ajax({ url: "/path/to/foo.json", error: function(xhr, st, err) { alert(st) } })
</script>
Expected: No error
Actual: gives "parseerror" error
My server side app generates JSON dynamically using the data in the database and some item has Unicode U+2028 character in its text, and the generated JSON would look like this (simplified):
{"foo":"ba\\u2028r"}
jQuery's JSON parser uses the built-in eval() function, and Safari and Firefox can't parse this; they throw a syntax error. Other JSON libraries like Crockford's json_parse.js can parse this JSON successfully.
Note that double escaping \\u to \\\\u works around this issue, but that feels wrong and I don't think I should do this on the server side, since other JSON libraries would then see \\u double-escaped.
Attachments (0)
Change History (11)
Changed February 16, 2009 10:36PM UTC by comment:1
Changed February 16, 2009 11:56PM UTC by comment:2
resolution: | → invalid |
---|---|
status: | new → closed |
Changed February 17, 2009 08:48PM UTC by comment:3
resolution: | invalid |
---|---|
status: | closed → reopened |
please investigate - the original poster was confused about where the bug is and thought the bug was in his generator, but in fact, it is in jquery.
verbatim U+2028 characters are valid in JSON strings, and if jquery doesn't parse such json texts correctly, it is simply buggy (or does not support JSON).
Note that if the parser cannot be fixed, and only fails for U+2028 and other line terminators valid in JSON but not in ecmascript, then a trivial pre-pass that replaces all those characters by their \\u-escaped form before eval'ing it will fix the json parser. This works because U+2028 and other such characters are not valid syntax in JSON except as verbatim string characters, so it is always safe to blindly replace them.
If there is a real parser, then fixing that one is likely more efficient.
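The pre-pass described above could be sketched roughly as follows. This is a minimal illustration, not jQuery code: the function names `escapeLineTerminators` and `evalJSON` are hypothetical, and it assumes the only characters that need handling are U+2028 and U+2029, the two characters that are legal verbatim inside JSON strings but were line terminators in ECMAScript source at the time.

```javascript
// Hypothetical helper (not part of jQuery): replace verbatim U+2028 / U+2029
// with their \u-escaped form. In valid JSON these characters can only occur
// inside strings, so a blind text replacement does not change the meaning.
function escapeLineTerminators(jsonText) {
  return jsonText.replace(/[\u2028\u2029]/g, function (ch) {
    return "\\u" + ch.charCodeAt(0).toString(16);
  });
}

// Hypothetical eval-based parser with the pre-pass applied first.
// eval is only tolerable here for trusted input; it performs no validation.
function evalJSON(jsonText) {
  return eval("(" + escapeLineTerminators(jsonText) + ")");
}

var raw = '{"foo":"ba\u2028r"}'; // JSON text containing a literal U+2028
var parsed = evalJSON(raw);
```

After the pre-pass the text handed to eval contains only the six-character escape sequence, so `parsed.foo` still holds the original U+2028 character.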
Changed February 17, 2009 08:57PM UTC by comment:4
The original steps to reproduce were incorrect. \\u2028 in JSON text strings is safe, but a literal U+2028 (or its UTF-8 representation: 0xE2 0x80 0xA8) is not.
1. perl -e 'print qq({"foo":"ba\xe2\x80\xa8r"})' > /path/to/foo.json
2. <script> $.ajax({ url: "/path/to/foo.json", error: function(xhr, st, err) { alert(st) } }) </script>
Expected: No error
Actual: gives "parseerror" error
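To make the distinction between the two reproductions concrete, here is a small sketch comparing the escaped and the literal form. It assumes a `JSON.parse` implementation is available (native, or supplied by json2.js); the variable names are illustrative only.

```javascript
// JSON text containing the six-character escape sequence \u2028.
var escaped = '{"foo":"ba\\u2028r"}';
// JSON text containing one literal U+2028 character (E2 80 A8 in UTF-8).
var literal = '{"foo":"ba\u2028r"}';

var fromEscaped = JSON.parse(escaped);
var fromLiteral = JSON.parse(literal);

// Both texts are valid JSON and decode to the same string value.
var sameValue = fromEscaped.foo === fromLiteral.foo;

// The third character of the decoded string is U+2028 in both cases.
var codePoint = fromLiteral.foo.charCodeAt(2).toString(16);
```

A conforming JSON parser treats the two texts as equivalent; only an eval-based parser on an engine that treats U+2028 as a line terminator distinguishes them.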
Changed February 18, 2009 12:05AM UTC by comment:5
As an addition: hacking your own parser around eval is a bad idea - while the above regex solution works for ecmascript compliant implementations, real-world browsers often have issues with many other characters.
Best throw away home-grown eval hacks and use a proper json parser, for example, Douglas Crockford's json2.js, which escapes many problematic characters and makes security checks on the JSON text before passing it to eval.
Changed February 18, 2009 12:11AM UTC by comment:6
I'm still fuzzy on the problem here. What is the Content-type reported back when /path/to/foo.json is retrieved? Is it specified to be in UTF-8?
Changed February 18, 2009 12:14AM UTC by comment:7
Yes, the Content-Type should be "application/json; charset=utf-8"
Changed February 21, 2009 04:13AM UTC by comment:8
the bug most likely is simply that jquery eval's json texts, but json is not a subset of javascript syntax, so using eval will not work in general.
maybe jquery doesn't support json, and instead users are required to send in valid javascript fragments instead of json. then the behaviour is correct.
but if jquery officially supports json, and only uses eval on it, it's simply buggy, as a json parser (such as json2.js) is required to parse json.
Changed February 21, 2009 05:36AM UTC by comment:9
and regarding the content-type: the default encoding for json is utf-8.
the problem itself has nothing to do with the content-type, however (it is immaterial whether the json document is transferred as utf-16 or utf-8, for example).
Changed October 14, 2009 03:01AM UTC by comment:10
component: | unfilled → ajax |
---|
Changed December 05, 2009 01:50AM UTC by comment:11
milestone: | 1.3.2 → 1.4 |
---|---|
resolution: | → worksforme |
status: | reopened → closed |
version: | 1.3.1 → 1.3.2 |
Two quick points: I don't think that this is something that we're going to fix directly in jQuery core. There are a lot of weird little edge cases around JSON (especially if you want to validate it) and it's probably best to use a proper parser (like json2.js).
Thankfully in jQuery 1.4 we now use the native JSON parser (or the parser in json2.js, if it exists). All you have to do is include json2.js in your page and jQuery will use it automatically. Hope this helps.
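The approach described in this comment, preferring a `JSON.parse` that is either native or provided by json2.js, might look roughly like the sketch below. This is not jQuery's actual source; `parseJSONText` is a hypothetical name. It relies on the fact that json2.js defines the global `JSON` object only when one is not already present, so a single check covers both cases.

```javascript
// Hypothetical sketch, not jQuery's implementation: use whichever JSON.parse
// is present (the native one, or the one json2.js installs on the global
// JSON object), and fail loudly instead of falling back to eval.
function parseJSONText(text) {
  if (typeof JSON === "object" && JSON !== null && typeof JSON.parse === "function") {
    return JSON.parse(text);
  }
  throw new Error("No JSON parser found; include json2.js before parsing");
}

// A JSON text with a literal U+2028 inside a string, as in this ticket.
var result = parseJSONText('{"foo":"ba\u2028r"}');
```

Because a real JSON parser never treats the input as script source, the literal U+2028 is accepted as an ordinary string character.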
Replying to [ticket:4168 miyagawa]:
Sorry I was a little confused.
The JSON in my app actually contains E2 80 A8, the UTF-8 representation of U+2028. Escaping this character to \\u2028 on the server side makes the problem go away. I will ask the JSON::XS author to escape it.
Close this bug. Thanks :)