Bug Tracker

#3611 closed bug (patchwelcome)

Opened November 14, 2008 07:24PM UTC

Closed January 26, 2012 03:01PM UTC

Last modified March 14, 2012 06:07PM UTC

Wrong codepage in URL (and in post data) when not UTF-8

Reported by: samsol Owned by: samsol
Priority: major Milestone:
Component: ajax Version: 1.2.6
Keywords: charset,url,ajax,encodeURIComponent,ajaxrewrite Cc: samsol, jaubourg
Blocked by: Blocking:
Description

When an HTML page uses a charset other than UTF-8, the ajax functions send incorrectly encoded requests: QUERY_STRING parameters are encoded as UTF-8, whereas a plain form submission encodes them in the page's charset.

Test case.

file test.cgi

#!/usr/bin/perl

print "Content-Type: text/html; charset=Windows-1251\n";
print "Cache-control: private\n";
print "\n";

print << "EOL";
<html>
  <head>
    <script language='javascript' src='jquery-1.2.6.js'></script>
  </head>
  <body>
    <form>
      <input type='text' name='x' id='x'>
      <input type='submit'>
      <a href="javascript:putRussianFLetter()">Put Russian F letter</a>
    </form>
$ENV{'QUERY_STRING'}
    <button onclick='doClick()'>JQ</button>
    <script language='javascript'>
    function doClick() {
      \$.get('test.cgi', {x:\$('#x')[0].value, anticache:Math.random()},
          function( data ){\$('#out').html(data);});
    }
    function putRussianFLetter() {
      \$('#x')[0].value = '\\u0424';
    }
    </script>
    <div id='out'></div>
  </body>
</html>
EOL
  1. Navigate to http://your_address_here/test.cgi
  2. Ensure that the browser detected the charset as Cyrillic (Windows-1251)
  3. Click the "Put Russian F letter" link
  4. Ensure the text box contains a one-character string
  5. Click the "Submit Query" button
  6. Ensure the text x=%D4 appears on the page
  7. Click the "Put Russian F letter" link once again
  8. Ensure the text box contains one letter
  9. Click the "JQ" button
  10. Wait until a copy of the page appears at the bottom of the page
  11. The text x=%D0%A4 appears, whereas x=%D4 was expected
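
The mismatch between steps 6 and 11 can be reproduced without the CGI script. The following is a minimal sketch, not jQuery's code: it assumes a JavaScript runtime that provides TextEncoder (current browsers, Node.js), and percentEncodeBytes is a hypothetical helper introduced only for illustration.

```javascript
// U+0424 CYRILLIC CAPITAL LETTER EF, the letter inserted by putRussianFLetter().
const letter = '\u0424';

// encodeURIComponent is specified to percent-encode the UTF-8 bytes of the string.
const viaEncodeURIComponent = encodeURIComponent(letter);

// Hypothetical helper: percent-encode every byte of a byte array.
function percentEncodeBytes(bytes) {
  return Array.from(
    bytes,
    (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

// UTF-8 needs two bytes for this letter: 0xD0 0xA4.
const utf8Bytes = new TextEncoder().encode(letter);

// In Windows-1251 the same letter is the single byte 0xD4.
const cp1251Bytes = Uint8Array.of(0xd4);

console.log(viaEncodeURIComponent);            // %D0%A4 (what $.get sends)
console.log(percentEncodeBytes(utf8Bytes));    // %D0%A4
console.log(percentEncodeBytes(cp1251Bytes));  // %D4    (what the form submit sends)
```

The server sees two different byte sequences for the same typed character, which is why it cannot decode both with a single charset.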

Browsers tested

  • Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080208 Fedora/2.0.0.12-1.fc7 Firefox/2.0.0.12
  • Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.6 (like Gecko)
Attachments (1)
  • test.cgi (0.8 KB) - added by samsol November 14, 2008 07:25PM UTC.
Change History (14)

Changed November 24, 2008 05:46PM UTC by samsol comment:1

Workaround:

Because jQuery sets the X-Requested-With request header

xhr.setRequestHeader("X-Requested-With", "XMLHttpRequest");

we can use it to detect jQuery requests on the server side.

Example Java servlet:

if("XMLHttpRequest".equals(req.getHeader("X-Requested-With"))) {
    try {
        req.setCharacterEncoding("UTF-8");
    } catch(UnsupportedEncodingException e){
        throw new ServletException( e );
    }
}

In Perl you can use:

if($ENV{'HTTP_X_REQUESTED_WITH'} eq 'XMLHttpRequest'){
    # UTF-8 encoding
}else{
    # your default encoding
}
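
The same header check can be sketched in JavaScript, for example for a Node.js backend. This is only an illustration of the workaround: requestCharset and its parameters are hypothetical names, and the headers object is assumed to use lower-cased header names, as Node.js's http module does for req.headers.

```javascript
// Hypothetical helper: pick the charset to use when decoding request parameters.
// `headers` is assumed to be a plain object with lower-cased header names.
function requestCharset(headers, pageCharset) {
  const requestedWith = headers['x-requested-with'] || '';
  // jQuery marks its XHR requests with this header and encodes parameters as UTF-8.
  return requestedWith === 'XMLHttpRequest' ? 'UTF-8' : pageCharset;
}

console.log(requestCharset({ 'x-requested-with': 'XMLHttpRequest' }, 'windows-1251')); // UTF-8
console.log(requestCharset({}, 'windows-1251'));                                       // windows-1251
```

Any client can set or omit this header, so the result is best treated as a hint rather than a guarantee.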

Changed January 08, 2009 03:42AM UTC by dmethvin comment:2

cc: → samsol

http://code.google.com/p/browsersec/wiki/Part1#Unicode_in_URLs

The table there says that plain links are always encoded in UTF-8, but XMLHttpRequest should use the current page encoding. I wonder whether we might confuse issues more by trying to follow that. As it stands now, jQuery always uses UTF-8 and it's just a question of whether the string created is sent via POST or GET.

Note that the use of UTF-8 for the body on a POST is non-negotiable; I believe that some browsers like Firefox will force it back if you try to override it.

http://www.w3.org/TR/XMLHttpRequest/#send

"data is a DOMString: Encode data using UTF-8 for transmission."

What is the impact of using UTF-8 in the URL?

Changed January 09, 2009 02:12AM UTC by samsol comment:3

Try running the test case.

The problem I have found is this:

The server-side code (test.cgi) receives different HTTP_1_1_Requests* from the same HTML page.

The HTTP_1_1_Request made by submitting the HTML form is NOT EQUAL to the HTTP_1_1_Request made by calling the jQuery function $.get.

I'm talking only about the ''query'' part of the http URL (or the body of POST requests).

http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" ''query'' ]]

|| Request || Encoding of the query (or POST body) ||
||<form method="POST" enctype="application/x-www-form-urlencoded"> || page encoding ||
||$.post || UTF-8 encoding ||
||<form method="GET"> || page encoding ||
||$.get || UTF-8 encoding ||

Run the test case to see the difference!


*HTTP_1_1_Request - message sent from client to server. See chapter 5 in http://www.ietf.org/rfc/rfc2616.txt

Changed March 21, 2009 05:27PM UTC by dmethvin comment:4

More information:

http://xkr.us/articles/javascript/encode-compare/

To get the character encoding right, it *seems* like the right thing to do is use escape() for the URL and any data added to the URL (e.g., data in a GET request). However, escape() only produces single-byte escapes for ISO-8859-1 characters (anything else becomes %uXXXX) and leaves + unencoded, so it sounds like we'd need to do our own encoding.
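
For comparison, the output of the two built-ins can be checked directly. This sketch uses only the standard global functions escape() (deprecated, Annex B of ECMA-262) and encodeURIComponent(); the sample characters and the compare helper are arbitrary choices made for illustration.

```javascript
// Hypothetical helper: show both encodings of the same input side by side.
function compare(s) {
  return { escape: escape(s), encodeURIComponent: encodeURIComponent(s) };
}

console.log(compare('\u00e9')); // { escape: '%E9',    encodeURIComponent: '%C3%A9' }
console.log(compare('\u0424')); // { escape: '%u0424', encodeURIComponent: '%D0%A4' }
console.log(compare(' '));      // { escape: '%20',    encodeURIComponent: '%20' }
console.log(compare('+'));      // { escape: '+',      encodeURIComponent: '%2B' }
```

Neither function yields the Windows-1251 form %D4 for U+0424, and the %uXXXX form produced by escape() is not standard percent-encoding, so servers generally will not decode it.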

See also #4315 which is a duplicate of this bug.

Changed March 21, 2009 10:32PM UTC by samsol comment:5

This looks like a collision between standards.

As far as I know, before DOM Level 3 there was no way to detect the document's input encoding.

Since Level 3 it is possible: http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Document3-inputEncoding

On the other hand, http://www.ecma-international.org/publications/standards/Ecma-262.htm section 15.1.3 strictly defines UTF-8 for encodeURIComponent.

So it is possible to fix this bug using:

if( document.inputEncoding ) {
  // DOM Level 3 Only
  part = this.encodeURIComponentWithPropperEncoding(document.inputEncoding, raw );
} else {
  part = encodeURIComponent( raw );
}

Implementing encodeURIComponentWithPropperEncoding is still the hard part.
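
To give an idea of what such a function involves, here is a minimal sketch for a single charset. It is not a proposed patch: encodeComponentCp1251 is a hypothetical name, and the mapping covers only ASCII and the basic Russian alphabet of Windows-1251. A general solution would need a complete table for every supported charset, which is where the difficulty lies.

```javascript
// Hypothetical helper: percent-encode `str` as Windows-1251 bytes.
// Covers only ASCII plus the basic Russian alphabet; anything else throws.
function encodeComponentCp1251(str) {
  let out = '';
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    let byte;
    if (cp < 0x80) {
      byte = cp;                    // ASCII is identical in Windows-1251
    } else if (cp >= 0x0410 && cp <= 0x044f) {
      byte = cp - 0x0410 + 0xc0;    // U+0410..U+044F map to 0xC0..0xFF
    } else if (cp === 0x0401) {
      byte = 0xa8;                  // CYRILLIC CAPITAL LETTER IO
    } else if (cp === 0x0451) {
      byte = 0xb8;                  // CYRILLIC SMALL LETTER IO
    } else {
      throw new RangeError(
        'No mapping in this sketch for U+' +
          cp.toString(16).toUpperCase().padStart(4, '0')
      );
    }
    // Leave unescaped the same characters that encodeURIComponent leaves unescaped.
    out += /[A-Za-z0-9\-_.!~*'()]/.test(ch)
      ? ch
      : '%' + byte.toString(16).toUpperCase().padStart(2, '0');
  }
  return out;
}

console.log(encodeComponentCp1251('\u0424')); // %D4
console.log(encodeComponentCp1251('a b'));    // a%20b
```

In a browser, a function like this could be selected when document.characterSet (the current name; document.inputEncoding is a legacy alias) reports windows-1251, falling back to encodeURIComponent otherwise.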

Dear developers! If and when you fix this bug, please change

xhr.setRequestHeader("X-Requested-With", "XMLHttpRequest")

to something like

xhr.setRequestHeader("X-Requested-With", "XMLHttpRequest.v2")

Changed November 17, 2010 03:45AM UTC by dmethvin comment:6

status: new → open

Changed December 27, 2010 10:36PM UTC by rwaldron comment:7

keywords: charset url ajax encodeURIComponent → charset,url,ajax,encodeURIComponent,ajaxrewrite

Changed April 17, 2011 05:21PM UTC by timmywil comment:8

cc: samsol → samsol, jaubourg

Please confirm this bug still exists in jQuery 1.6b1.

Changed April 17, 2011 05:42PM UTC by john comment:9

milestone: 1.2 →
owner: → samsol
status: open → pending

Changed May 02, 2011 07:59AM UTC by trac-o-bot comment:10

resolution: → invalid
status: pending → closed

Because we get so many tickets, we often need to return them to the initial reporter for more information. If that person does not reply within 14 days, the ticket will automatically be closed, and that has happened in this case. If you still are interested in pursuing this issue, feel free to add a comment with the requested information and we will be happy to reopen the ticket if it is still valid. Thanks!

Changed May 27, 2011 12:10PM UTC by anonymous comment:11

I'm using jQuery JavaScript Library v1.6 (Date: Mon May 2 13:50:00 2011 -0400) and the bug still exists.

Changed January 25, 2012 10:08PM UTC by sam@digitalfusion.co.nz comment:12

This bug still exists in jQuery v1.7.1, as of Jan 26, 2012.

Changed January 26, 2012 03:00PM UTC by dmethvin comment:13

resolution: invalid →
status: closed → reopened

Changed January 26, 2012 03:01PM UTC by dmethvin comment:14

resolution: → patchwelcome
status: reopened → closed

This was closed invalid due to a lack of response from the OP, but in practical terms I don't think we can fix this due to the conflict in the standards involved. XHR wants UTF-8, period. If anyone has a solution we could consider it.