Ticket #4168 (closed bug: worksforme)
U+2028 or \u2028 chokes jQuery JSON parser
| Reported by: | miyagawa | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 1.4 |
| Component: | ajax | Version: | 1.3.2 |
| Keywords: | | Cc: | |
| Blocking: | | Blocked by: | |
Description
Steps to reproduce:
echo '{"foo":"ba\u2028c"}' > /path/to/foo.json
<script> $.ajax({ url: "/path/to/foo.json", error: function(xhr, st, err) { alert(st) } }) </script>
Expected: No error
Actual: gives "parseerror" error
My server-side app generates JSON dynamically from data in the database, and some items have the Unicode U+2028 character in their text, so the generated JSON looks like this (simplified):
{"foo":"ba\u2028r"}
jQuery's JSON parser uses the built-in eval() function; Safari and Firefox can't parse this and throw a SyntaxError. Other JSON libraries like Crockford's json_parse.js can parse this JSON successfully.
Note that double-escaping \u to \\u works around this issue, but that feels wrong and I don't think I should do this on the server side, since other JSON libraries would then see a double-escaped \u.
Change History
comment:1 in reply to: ↑ description Changed 4 years ago by miyagawa
comment:2 Changed 4 years ago by dmethvin
- Status changed from new to closed
- Resolution set to invalid
comment:3 Changed 4 years ago by pcg
- Status changed from closed to reopened
- Resolution invalid deleted
please investigate - the original poster was confused about where the bug is and thought the bug was in his generator, but in fact, it is in jquery.
verbatim U+2028 characters are valid in JSON strings, and if jquery doesn't parse such json texts correctly, it is simply buggy (or does not support JSON).
Note that if the parser cannot be fixed, and only fails for U+2028 and other line terminators valid in JSON but not in ecmascript, then a trivial pre-pass that replaces all those characters by their \u-escaped form before eval'ing it will fix the json parser. This works because U+2028 and other such characters are not valid syntax in JSON except as verbatim string characters, so it is always safe to blindly replace them.
If there is a real parser, then fixing that one is likely more efficient.
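The pre-pass suggested in this comment can be sketched roughly as follows. This is only an illustration, not jQuery code: `escapeLineTerminators` is a hypothetical helper name, and it assumes U+2028 and U+2029 are the only characters that need handling.

```javascript
// Hypothetical helper, not part of jQuery: escape the two line terminators
// that JSON allows verbatim inside strings but older ECMAScript engines
// reject inside string literals.
function escapeLineTerminators(jsonText) {
  return jsonText
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

// JSON text with a verbatim U+2028 inside the string value.
var raw = '{"foo":"ba' + String.fromCharCode(0x2028) + 'r"}';

// After the pre-pass the text contains the six-character escape instead.
var safe = escapeLineTerminators(raw);

// eval-based parsing, wrapped in parentheses so the braces are read as an
// object literal rather than a block.
var data = eval("(" + safe + ")");
```

The replacement is safe to apply to the whole text because these characters can only appear inside JSON strings, where the escaped form means the same thing.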
comment:4 Changed 4 years ago by miyagawa
The original steps to reproduce were incorrect. \u2028 in JSON text strings is safe, but a verbatim U+2028 (or its UTF-8 representation: 0xE2 0x80 0xA8) is not.
- perl -e 'print qq({"foo":"ba\xe2\x80\xa8r"})' > /path/to/foo.json
- <script> $.ajax({ url: "/path/to/foo.json", error: function(xhr, st, err) { alert(st) } }) </script>
Expected: No error
Actual: gives "parseerror" error
comment:5 Changed 4 years ago by pcg
As an addition: hacking your own parser around eval is a bad idea - while the above regex solution works for ecmascript compliant implementations, real-world browsers often have issues with many other characters.
Best throw away home-grown eval hacks and use a proper json parser, for example, Douglas Crockford's json2.js, which escapes many problematic characters and makes security checks on the JSON text before passing it to eval.
comment:6 Changed 4 years ago by dmethvin
I'm still fuzzy on the problem here. What is the Content-type reported back when /path/to/foo.json is retrieved? Is it specified to be in UTF-8?
comment:7 Changed 4 years ago by miyagawa
Yes, the Content-Type should be "application/json; charset=utf-8"
comment:8 Changed 4 years ago by pcg
the bug most likely is simply that jquery eval's json texts, but json is not a subset of javascript syntax, so using eval will not work in general.
maybe jquery doesn't support json, and instead users are required to send in valid javascript fragments instead of json. then the behaviour is correct.
but if jquery officially supports json, and only uses eval on it, it's simply buggy, as a json parser (such as json2.js) is required to parse json.
comment:9 Changed 4 years ago by pcg
and regarding the content-type: the default encoding for json is utf-8.
the problem itself has nothing to do with the content-type, however (it is immaterial whether the json document is transferred as utf-16 or utf-8 for example)
comment:12 Changed 3 years ago by john
- Status changed from reopened to closed
- Version changed from 1.3.1 to 1.3.2
- Resolution set to worksforme
- Milestone changed from 1.3.2 to 1.4
Two quick points: I don't think that this is something that we're going to fix directly in jQuery core. There are a lot of weird little edge cases around JSON (especially if you want to validate it) and it's probably best to use a proper parser (like json2.js).
Thankfully in jQuery 1.4 we now use the native JSON parser (or the parser in json2.js, if it exists). All you have to do is include json2.js in your page and jQuery will use it automatically. Hope this helps.
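As a rough illustration of why a real parser avoids the problem: `JSON.parse` (native, or supplied by json2.js) accepts a verbatim U+2028 inside a string. The wrapper below is a hypothetical sketch, not jQuery's actual source, and `parseJSONText` is an assumed name.

```javascript
// Hypothetical wrapper, not jQuery's implementation: use JSON.parse when a
// JSON object is available (native, or defined by including json2.js).
function parseJSONText(text) {
  if (typeof JSON !== "undefined" && typeof JSON.parse === "function") {
    return JSON.parse(text);
  }
  throw new Error("No JSON parser available; include json2.js");
}

// Same payload as the corrected steps to reproduce: a raw U+2028 in a string.
var raw = '{"foo":"ba' + String.fromCharCode(0x2028) + 'r"}';
var data = parseJSONText(raw);
```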

Replying to miyagawa:
Sorry I was a little confused.
The JSON in my app actually contains E2 80 A8, the UTF-8 representation of U+2028. By escaping this character to \u2028 on the server side this problem goes away. I will talk to JSON::XS author to escape it.
Close this bug. Thanks :)
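For anyone generating JSON elsewhere, the server-side escaping described here could look something like the sketch below. It is written in JavaScript purely for illustration (the app in this ticket uses Perl's JSON::XS), and `stringifyForEval` is a hypothetical name.

```javascript
// Hypothetical encoder wrapper: JSON.stringify leaves U+2028 and U+2029
// verbatim, so escape them afterwards so that eval-based consumers can
// still read the output.
function stringifyForEval(value) {
  return JSON.stringify(value).replace(/[\u2028\u2029]/g, function (ch) {
    // Both code points are four hex digits, so no zero-padding is needed.
    return "\\u" + ch.charCodeAt(0).toString(16);
  });
}

var original = { foo: "ba" + String.fromCharCode(0x2028) + "r" };
var encoded = stringifyForEval(original);

// Any conforming JSON parser decodes the escape back to the same string.
var roundTripped = JSON.parse(encoded);
```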