Side navigation
#11773 closed bug (plugin)
Opened May 15, 2012 07:10PM UTC
Closed July 12, 2012 01:01AM UTC
Last modified June 09, 2014 04:30AM UTC
jQuery needs HTML escaping functionality
Reported by: | walters@verbum.org | Owned by: | |
---|---|---|---|
Priority: | undecided | Milestone: | None |
Component: | manipulation | Version: | 1.7.2 |
Keywords: | Cc: | ||
Blocked by: | Blocking: |
Description
Many of the jQuery example snippets are unsafe to use with untrusted data.
The one that caused me to file this issue is:
http://jquerymobile.com/demos/1.0.1/docs/pages/page-dynamic.html
These specific lines are unsafe:
markup = "<p>" + category.description + "</p><ul data-role='listview' data-inset='true'>",
markup += "<li>" + cItems[i].name + "</li>";
$header.find( "h1" ).html( category.name );
There's a lot of information on the Internet about this;
http://code.google.com/p/doctype-mirror/wiki/ArticleXSSInBodyText
is pretty good.
We need an implementation of that included in the core, and to fix the examples to use it.
Attachments (0)
Change History (22)
Changed May 15, 2012 07:12PM UTC by comment:1
Changed May 15, 2012 07:16PM UTC by comment:2
resolution: | → plugin |
---|---|
status: | new → closed |
There is a syntax issue - you have a comma at the end of this line
markup = "<p>" + category.description + "</p><ul data-role='listview' data-inset='true'>",
But that's irrelevant to you ticket. This can and should be a plugin.
Changed May 15, 2012 07:17PM UTC by comment:3
For the example you've reference in the Mobile docs, it seems perfectly safe since the input is trusted (it's in the code). I understand your point but there are several other solutions including DOM element creation and good templating systems that escape the HTML.
Note that the *whole point* of .html()
is to insert HTML (including scripts that execute) into the document, so we can't change that behavior.
There is some related discussion on #9521.
Changed May 15, 2012 07:26PM UTC by comment:4
Replying to [comment:3 dmethvin]:
For the example you've reference in the Mobile docs, it seems perfectly safe since the input is trusted (it's in the code). I understand your point but there are several other solutions including DOM element creation and good templating systems that escape the HTML.
Yes, but in general, people follow the path of least resistance. If the jQuery examples don't escape because the input is trusted, things will appear to work for someone who takes the example and uses it on untrusted data. Except their code will be exploitable, or at least unreliable.
Note that the *whole point* of .html()
is to insert HTML (including scripts that execute) into the document, so we can't change that behavior.
Right. I'm not suggesting removing or changing .html()/.append() or the variety of functions that accept HTML currently in jQuery. This bug is more about:
1) There should be an official in-jQuery API to escape HTML (both body text and attributes)
2) That function should be used in the examples, even if not strictly necessary
There is some related discussion on #9521.
Yeah, related, but this bug is about how jQuery *encourages* using .html() but doesn't supply an escaping function.
Changed May 15, 2012 07:40PM UTC by comment:5
Are you asking for a function that does $( "<a>" ).text( str ).html()
?
Changed May 15, 2012 07:53PM UTC by comment:6
Replying to [comment:5 scott.gonzalez]:
Are you asking for a function that does $( "<a>" ).text( str ).html()
?
Yes. I'd attach the patch I wrote, but I can't do it while the bug is closed, I guess.
Here's a page which benchmarks multiple implementations:
http://jsperf.com/htmlencoderegex
It points to the regex replace as being the fastest.
Changed May 15, 2012 08:03PM UTC by comment:7
_comment0: | A string replace is safer for arbitrary HTML since the `<div>` or other container element may not be able to accept any markup you give it, and it may mangle the white space as well. \ \ > to escape HTML (both body text and attributes) \ \ What do you mean by that? It's just a string and we're just replacing `< > % " '` without parsing anything. Or are you referring to its use in examples? \ → 1337112270131669 |
---|
A string replace is safer for arbitrary HTML since the <div>
or other container element may not be able to accept any markup you give it, and it may mangle the white space as well.
to escape HTML (both body text and attributes)
What do you mean by that? It's just a string and we're just replacing < > & " '
without parsing anything. Or are you referring to its use in examples?
Changed May 15, 2012 08:13PM UTC by comment:8
Replying to [comment:7 dmethvin]:
What do you mean by that? It's just a string and we're just replacing < > & " '
without parsing anything.
Right. What I was getting at is that ' and " don't need to be replaced in HTML body context, but do for HTML attributes.
This page explains it:
http://code.google.com/p/doctype-mirror/wiki/ArticleXSSInBodyText
Specifically: "It's not strictly necessary to escape the quotes in this context; however this will be necessary in other contexts, and it's easiest to use the same escaping function everywhere. "
Basically what I'm suggesting we DON'T do is use the $('<div/>').text(html).html(); trick because it's not safe to use for HTML attributes.
Say I'm generating <h1> elements and I want id attributes on them:
$("#mycontent").append("<h1 id=\\"entry-" + $.quoteHTML(attacker_controlled_string) + "\\">");
With the regex-replace implementation of quoteHTML, this is safe. With the .text().html() approach, it's not:
$('<div/>').text("foo'bar").html();
evaluates to:
"foo'bar"
Changed May 15, 2012 08:16PM UTC by comment:9
Since I can't attach here because the bug is closed, here's a pastebin of my proposed patch:
Changed May 15, 2012 08:22PM UTC by comment:10
Changed May 15, 2012 08:58PM UTC by comment:11
From a purely API surface/support perspective, I'm strongly opposed to adding this to core. Perhaps we could take it on as an officially supported plugin?
Changed May 15, 2012 09:03PM UTC by comment:12
Replying to [comment:11 rwaldron]:
From a purely API surface/support perspective, I'm strongly opposed to adding this to core.
Can you elaborate on your concerns regarding API surface and support?
Also, Mozilla has a decent page on the subject:
https://developer.mozilla.org/en/XUL_School/DOM_Building_and_HTML_Insertion
It doesn't mention the problem with attributes and the need to quote ' and " though.
Changed May 15, 2012 09:31PM UTC by comment:13
As in... adding a new function that we didn't vote on creates new API surface, which in turn requires support: in the form of Q & A for users, writing docs, triaging and patching inevitable bugs, etc.
Changed May 15, 2012 09:34PM UTC by comment:14
Replying to [comment:7 dmethvin]:
A string replace is safer for arbitrary HTML since the <div>
or other container element may not be able to accept any markup you give it, and it may mangle the white space as well.
I suppose whitespace may get mangled, but there are no context problems here since we're using text, not HTML. There's only one node to parse, and it's a text node, so we know that it's safe.
Changed May 15, 2012 09:37PM UTC by comment:15
Replying to [comment:10 dmethvin]:
The logic for including it would be similar to #9521 and #11617.
Actually, if he's asking for attribute escaping, then it's not the same. Do we know how common it is for developers to 1) build up strings of HTML, 2) not use a templating system that already solves this, and 3) be aware enough that they need to escape their strings?
Changed May 16, 2012 02:47AM UTC by comment:16
keywords: | → needsdocs |
---|
I come across it all the time. It's not a best practice by any means, but as the OP mentions it's kind of encouraged by even our examples that whack out some HTML strings for simplicity.
For example the NYTimes Skimmer app (http://nytimes.com/skimmer/) uses John's simple templating plugin which doesn't have any HTML encoding. The MSNBC site builds all sorts of content just by throwing together HTML strings, see this file for example.
It's possible that both of those sites have cleaned the data upstream or HTML-encoded it before sending it, but conceptually it seems like the encoding should be there and not upstream. If not, for example, any headline that contains a double-quote will spit out bad HTML.
The pages for APIs like .html()
don't mention this problem, in fact we don't even mention that scripts injected via .html()
are executed so I'm adding a needsdocs here. (Of course, just about any API that accepts HTML strings does this but it might surprise some devs.)
Changed June 12, 2012 12:25PM UTC by comment:17
status: | reopened → open |
---|
Changed July 12, 2012 01:01AM UTC by comment:18
component: | unfiled → manipulation |
---|---|
resolution: | → plugin |
status: | open → closed |
Before we consider including this, I'd like to see a plugin get some traction.
Changed October 15, 2012 07:30PM UTC by comment:19
keywords: | needsdocs |
---|
Changed January 10, 2014 05:14PM UTC by comment:20
This is all you need to do to escape html. It would be nice to have this in jQuery and I would argue that the concept is too simple for most people to bother installing a plugin for.
escapeHTML = function(s) { return String(s).replace(/&(?!\\w+;)/g, '&').replace(/</g, '<').replace(/>/g, '>').replace(/"/g, '"'); }
Changed March 23, 2014 01:39PM UTC by comment:21
i was looking for such a function, too.
Changed June 09, 2014 04:30AM UTC by comment:22
I agree with trengr. I don't expect anyone would go so far as to use a plugin for such a simple function, but its usefulness warrants inclusion in core jQuery. In the meantime you'll just be forcing people to create the function for themselves.
Is ".text( value )" a viable alternative? Not really, because it's hard to mix generating HTML strings with DOM. You really have to do either one or the other.