Bug Tracker

Opened 11 years ago

Closed 8 years ago

Last modified 7 years ago

#3611 closed bug (patchwelcome)

Wrong codepage in URL (and in post data) when not UTF-8

Reported by: samsol
Owned by: samsol
Priority: major
Milestone:
Component: ajax
Version: 1.2.6
Keywords: charset, url, ajax, encodeURIComponent, ajaxrewrite
Cc: samsol, jaubourg
Blocked by:
Blocking:

Description

When an HTML page has a charset other than UTF-8, the ajax functions send wrong requests. QUERY_STRING parameters are encoded in UTF-8, while a plain form submission encodes them in the page's charset.

Test case.

file test.cgi

#!/usr/bin/perl

print "Content-Type: text/html; charset=Windows-1251\n";
print "Cache-control: private\n";
print "\n";

print << "EOL";
<html>
  <head>
    <script language='javascript' src='jquery-1.2.6.js'></script>
  </head>
  <body>
    <form>
      <input type='text' name='x' id='x'>
      <input type='submit'>
      <a href="javascript:putRussianFLetter()">Put Russian F letter</a>
    </form>
$ENV{'QUERY_STRING'}
    <button onclick='doClick()'>JQ</button>
    <script language='javascript'>
    function doClick() {
      \$.get('test.cgi', {x:\$('#x')[0].value, anticache:Math.random()},
          function( data ){\$('#out').html(data);});
    }
    function putRussianFLetter() {
      \$('#x')[0].value = '\\u0424';
    }
    </script>
    <div id='out'></div>
  </body>
</html>
EOL

  1. Navigate to http://your_address_here/test.cgi
  2. Ensure that the browser detected the charset as Cyrillic (Windows-1251)
  3. Click the "Put Russian F letter" link
  4. Ensure the text box contains a 1-character string
  5. Click the "Submit Query" button
  6. Ensure the text x=%D4 appears on the page
  7. Click the "Put Russian F letter" link once again
  8. Ensure the text box contains 1 letter
  9. Click the "JQ" button
  10. Wait until a copy of the page appears at the bottom of the page
  11. The text x=%D0%A4 appears, while x=%D4 was expected
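
The difference between steps 6 and 11 can be reproduced without a browser. This is only a minimal sketch, not jQuery's code: the helper name toWindows1251Byte is hypothetical, and it handles nothing beyond the U+0410..U+044F range, which Windows-1251 places at bytes 0xC0..0xFF.

```javascript
// U+0424 (CYRILLIC CAPITAL LETTER EF), the character used in the test case.
var letter = '\u0424';

// encodeURIComponent always percent-encodes the UTF-8 bytes of the string.
var utf8Encoded = encodeURIComponent(letter);

// Hypothetical helper: map U+0410..U+044F to the Windows-1251 bytes 0xC0..0xFF.
function toWindows1251Byte(ch) {
  var code = ch.charCodeAt(0);
  if (code < 0x0410 || code > 0x044F) {
    throw new RangeError('Character is outside the range this sketch handles');
  }
  return code - 0x0410 + 0xC0;
}

var pageEncoded = '%' + toWindows1251Byte(letter).toString(16).toUpperCase();

console.log('x=' + utf8Encoded); // x=%D0%A4, what $.get sends
console.log('x=' + pageEncoded); // x=%D4, what the plain form sends
```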

Browsers tested

  • Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080208 Fedora/2.0.0.12-1.fc7 Firefox/2.0.0.12
  • Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.6 (like Gecko)

Attachments (1)

test.cgi (800 bytes) - added by samsol 11 years ago.


Change History (15)

Changed 11 years ago by samsol

Attachment: test.cgi added

comment:1 Changed 11 years ago by samsol

Workaround:

Because jQuery sets the X-Requested-With request header

xhr.setRequestHeader("X-Requested-With", "XMLHttpRequest");

we can use it to detect jQuery requests on the server side.

Example Java servlet:

if("XMLHttpRequest".equals(req.getHeader("X-Requested-With"))) {
    try {
        req.setCharacterEncoding("UTF-8");
    } catch(UnsupportedEncodingException e){
        throw new ServletException( e );
    }
}

In Perl you can use

if($ENV{'HTTP_X_REQUESTED_WITH'} eq 'XMLHttpRequest'){
    # UTF-8 encoding
}else{
    # your default encoding
}
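
The same idea can be taken one step further by actually decoding the parameter. The sketch below is an illustration only: queryCharset and decodeParam are hypothetical names, the headers object is assumed to use lower-cased names (as Node's http module provides), and the Windows-1251 branch handles only ASCII and the bytes 0xC0..0xFF. It also ignores "+" as a space.

```javascript
// Hypothetical helper: guess which charset the query string was encoded in.
function queryCharset(headers, pageCharset) {
  return headers['x-requested-with'] === 'XMLHttpRequest' ? 'UTF-8' : pageCharset;
}

// Hypothetical helper: decode one percent-encoded value in the given charset.
// Anything that is not 'UTF-8' is treated as Windows-1251 in this sketch.
function decodeParam(value, charset) {
  if (charset === 'UTF-8') {
    return decodeURIComponent(value);
  }
  return value.replace(/%([0-9A-Fa-f]{2})/g, function (match, hex) {
    var byte = parseInt(hex, 16);
    if (byte < 0x80) {
      return String.fromCharCode(byte); // plain ASCII
    }
    if (byte >= 0xC0) {
      return String.fromCharCode(byte - 0xC0 + 0x0410); // U+0410..U+044F
    }
    throw new RangeError('Byte 0x' + hex + ' is not handled by this sketch');
  });
}

// jQuery request: the value arrives as UTF-8.
var fromAjax = decodeParam('%D0%A4',
  queryCharset({ 'x-requested-with': 'XMLHttpRequest' }, 'Windows-1251'));

// Plain form submission: the value arrives in the page charset.
var fromForm = decodeParam('%D4', queryCharset({}, 'Windows-1251'));

console.log(fromAjax === fromForm); // both decode to U+0424
```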

comment:2 Changed 11 years ago by dmethvin

Cc: samsol added

http://code.google.com/p/browsersec/wiki/Part1#Unicode_in_URLs

The table there says that plain links are always encoded in UTF-8, but XMLHttpRequest should use the current page encoding. I wonder whether we might confuse issues more by trying to follow that. As it stands now, jQuery always uses UTF-8 and it's just a question of whether the string created is sent via POST or GET.

Note that the use of UTF-8 for the body on a POST is non-negotiable; I believe that some browsers like Firefox will force it back if you try to override it. http://www.w3.org/TR/XMLHttpRequest/#send "data is a DOMString: Encode data using UTF-8 for transmission."

What is the impact of using UTF-8 in the URL?

comment:3 Changed 11 years ago by samsol

Try running the test case.

The problem that I have found is...

The server-side code (test.cgi) receives different HTTP_1_1_Requests* made from the same HTML page.

The HTTP_1_1_Request made by submitting the HTML form is NOT EQUAL to the HTTP_1_1_Request made by calling the jQuery function $.get

I'm talking only about the query part of the http-url (or the body of POST requests).

http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

<form method="POST" enctype="application/x-www-form-urlencoded">: page encoding
$.post: UTF-8 encoding
<form method="GET">: page encoding
$.get: UTF-8 encoding

Run the test case to see this!


*HTTP_1_1_Request - message sent from client to server. See chapter 5 in http://www.ietf.org/rfc/rfc2616.txt

comment:4 Changed 10 years ago by dmethvin

More information:

http://xkr.us/articles/javascript/encode-compare/

To get the character encoding right, it *seems* like the right thing to do is use escape() for the URL and any data added to the URL (e.g., data in a GET request). However, escape() only handles ISO-8859-1 (anything above U+00FF comes out as %uXXXX) and leaves "+" unencoded, which form decoding on the server reads as a space, so it sounds like we'd need to do our own encoding.
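
For reference, here is a quick comparison of the two built-in functions on a few characters. It uses nothing beyond escape() and encodeURIComponent(). escape() is a legacy function, but engines still ship it.

```javascript
// Characters to compare: one in ISO-8859-1, one outside it, a space and a plus.
var samples = ['\u00E9', '\u0424', ' ', '+'];

var rows = samples.map(function (s) {
  return {
    input: s,
    escaped: escape(s),                 // Latin-1 bytes, or %uXXXX above U+00FF
    uriEncoded: encodeURIComponent(s)   // always UTF-8 bytes
  };
});

rows.forEach(function (row) {
  console.log(JSON.stringify(row.input), row.escaped, row.uriEncoded);
});
// "\u00e9" case: %E9 vs %C3%A9
// "\u0424" case: %u0424 vs %D0%A4
// " " case:      %20 vs %20
// "+" case:      + vs %2B
```

Neither function produces %D4 for U+0424, which is the byte a Windows-1251 page would send.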

See also #4315 which is a duplicate of this bug.

comment:5 Changed 10 years ago by samsol

This looks like a collision between standards. Before DOM Level 3 there was no way to detect a document's input encoding (as far as I know), but since Level 3 it is possible: http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Document3-inputEncoding On the other hand, section 15.1.3 of http://www.ecma-international.org/publications/standards/Ecma-262.htm strictly defines UTF-8 for encodeURIComponent.

So it is possible to fix this bug using:

if( document.inputEncoding ) {
  // DOM Level 3 Only
  part = this.encodeURIComponentWithPropperEncoding(document.inputEncoding, raw );
} else {
  part = encodeURIComponent( raw );
}

Writing encodeURIComponentWithPropperEncoding is still too hard, though.
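
To show where the difficulty lies, here is a rough sketch of such an encoder for a single charset. It is an assumption-laden illustration, not a proposed patch: encodeURIComponentWindows1251 is a hypothetical name, and the mapping covers only ASCII and the Russian alphabet. A real implementation would need a complete table for every supported charset.

```javascript
// Characters that encodeURIComponent leaves unencoded.
var UNRESERVED = /[A-Za-z0-9\-_.!~*'()]/;

// Format one byte as %XX.
function percent(byte) {
  return '%' + (byte < 16 ? '0' : '') + byte.toString(16).toUpperCase();
}

// Hypothetical helper: percent-encode using Windows-1251 bytes.
function encodeURIComponentWindows1251(raw) {
  var out = '';
  for (var i = 0; i < raw.length; i++) {
    var ch = raw.charAt(i);
    var code = raw.charCodeAt(i);
    if (UNRESERVED.test(ch)) {
      out += ch;
    } else if (code < 0x80) {
      out += percent(code);                 // remaining ASCII
    } else if (code >= 0x0410 && code <= 0x044F) {
      out += percent(code - 0x0410 + 0xC0); // U+0410..U+044F
    } else if (code === 0x0401) {
      out += percent(0xA8);                 // U+0401, capital IO
    } else if (code === 0x0451) {
      out += percent(0xB8);                 // U+0451, small io
    } else {
      throw new RangeError('No mapping in this sketch for U+' +
        code.toString(16).toUpperCase());
    }
  }
  return out;
}

console.log(encodeURIComponentWindows1251('\u0424'));      // %D4
console.log(encodeURIComponentWindows1251('a b&\u0451'));  // a%20b%26%B8
```

In a page, the charset name would come from document.inputEncoding and select the table to use.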

Dear developer! When you fix this bug (if you do), please change xhr.setRequestHeader("X-Requested-With", "XMLHttpRequest") to something like xhr.setRequestHeader("X-Requested-With", "XMLHttpRequest.v2").

comment:6 Changed 9 years ago by dmethvin

Status: new → open

comment:7 Changed 9 years ago by Rick Waldron

Keywords: ajaxrewrite added

comment:8 Changed 8 years ago by timmywil

Cc: jaubourg added

Please confirm this bug still exists in jQuery 1.6b1.

comment:9 Changed 8 years ago by john

Milestone: 1.2
Owner: set to samsol
Status: open → pending

comment:10 Changed 8 years ago by trac-o-bot

Resolution: invalid
Status: pending → closed

Because we get so many tickets, we often need to return them to the initial reporter for more information. If that person does not reply within 14 days, the ticket will automatically be closed, and that has happened in this case. If you still are interested in pursuing this issue, feel free to add a comment with the requested information and we will be happy to reopen the ticket if it is still valid. Thanks!

comment:11 Changed 8 years ago by anonymous

I'm using jQuery JavaScript Library v1.6 (Date: Mon May 2 13:50:00 2011 -0400) and the bug still exists.

comment:12 Changed 8 years ago by sam@…

This bug still exists in jQuery v1.7.1, as of Jan 26, 2012.

comment:13 Changed 8 years ago by dmethvin

Resolution: invalid
Status: closed → reopened

comment:14 Changed 8 years ago by dmethvin

Resolution: patchwelcome
Status: reopened → closed

This was closed invalid due to a lack of response from the OP, but in practical terms I don't think we can fix this due to the conflict in the standards involved. XHR wants UTF-8, period. If anyone has a solution we could consider it.
