#11773 closed bug (plugin)

Opened May 15, 2012 07:10PM UTC

Closed July 12, 2012 01:01AM UTC

Last modified June 09, 2014 04:30AM UTC

jQuery needs HTML escaping functionality

Reported by:	walters@verbum.org	Owned by:
Priority:	undecided	Milestone:	None
Component:	manipulation	Version:	1.7.2
Keywords:		Cc:
Blocked by:		Blocking:

Description

Many of the jQuery example snippets are unsafe to use with untrusted data.

The one that caused me to file this issue is:

http://jquerymobile.com/demos/1.0.1/docs/pages/page-dynamic.html

These specific lines are unsafe:

markup = "<p>" + category.description + "</p><ul data-role='listview' data-inset='true'>",

markup += "<li>" + cItems[i].name + "</li>";

$header.find( "h1" ).html( category.name );

There's a lot of information on the Internet about this;

http://code.google.com/p/doctype-mirror/wiki/ArticleXSSInBodyText

is pretty good.

We need an implementation of that included in the core, and to fix the examples to use it.

Attachments (0)

Change History (22)

Changed May 15, 2012 07:12PM UTC by anonymous comment:1

Is ".text( value )" a viable alternative? Not really, because it's hard to mix generating HTML strings with DOM. You really have to do either one or the other.

Changed May 15, 2012 07:16PM UTC by rwaldron comment:2

resolution:	→ plugin
status:	new → closed

There is a syntax issue - you have a comma at the end of this line

markup = "<p>" + category.description + "</p><ul data-role='listview' data-inset='true'>",

But that's irrelevant to you ticket. This can and should be a plugin.

Changed May 15, 2012 07:17PM UTC by dmethvin comment:3

For the example you've reference in the Mobile docs, it seems perfectly safe since the input is trusted (it's in the code). I understand your point but there are several other solutions including DOM element creation and good templating systems that escape the HTML.

Note that the *whole point* of .html() is to insert HTML (including scripts that execute) into the document, so we can't change that behavior.

There is some related discussion on #9521.

Changed May 15, 2012 07:26PM UTC by walters comment:4

Replying to [comment:3 dmethvin]:

For the example you've reference in the Mobile docs, it seems perfectly safe since the input is trusted (it's in the code). I understand your point but there are several other solutions including DOM element creation and good templating systems that escape the HTML.

Yes, but in general, people follow the path of least resistance. If the jQuery examples don't escape because the input is trusted, things will appear to work for someone who takes the example and uses it on untrusted data. Except their code will be exploitable, or at least unreliable.

Note that the *whole point* of .html() is to insert HTML (including scripts that execute) into the document, so we can't change that behavior.

Right. I'm not suggesting removing or changing .html()/.append() or the variety of functions that accept HTML currently in jQuery. This bug is more about:

1) There should be an official in-jQuery API to escape HTML (both body text and attributes)

2) That function should be used in the examples, even if not strictly necessary

There is some related discussion on #9521.

Yeah, related, but this bug is about how jQuery *encourages* using .html() but doesn't supply an escaping function.

Changed May 15, 2012 07:40PM UTC by scottgonzalez comment:5

Are you asking for a function that does $( "<a>" ).text( str ).html()?

Changed May 15, 2012 07:53PM UTC by walters comment:6

Replying to [comment:5 scott.gonzalez]:

Are you asking for a function that does $( "<a>" ).text( str ).html()?

Yes. I'd attach the patch I wrote, but I can't do it while the bug is closed, I guess.

Here's a page which benchmarks multiple implementations:

http://jsperf.com/htmlencoderegex

It points to the regex replace as being the fastest.

Changed May 15, 2012 08:03PM UTC by dmethvin comment:7

_comment0:	A string replace is safer for arbitrary HTML since the `<div>` or other container element may not be able to accept any markup you give it, and it may mangle the white space as well. \ \ > to escape HTML (both body text and attributes) \ \ What do you mean by that? It's just a string and we're just replacing `< > % " '` without parsing anything. Or are you referring to its use in examples? \ → 1337112270131669

_comment0:

A string replace is safer for arbitrary HTML since the `<div>` or other container element may not be able to accept any markup you give it, and it may mangle the white space as well. \ \ > to escape HTML (both body text and attributes) \ \ What do you mean by that? It's just a string and we're just replacing `< > % " '` without parsing anything. Or are you referring to its use in examples? \ → 1337112270131669

A string replace is safer for arbitrary HTML since the <div> or other container element may not be able to accept any markup you give it, and it may mangle the white space as well.

to escape HTML (both body text and attributes)

What do you mean by that? It's just a string and we're just replacing < > & " ' without parsing anything. Or are you referring to its use in examples?

Changed May 15, 2012 08:13PM UTC by walters comment:8

Replying to [comment:7 dmethvin]:

What do you mean by that? It's just a string and we're just replacing < > & " ' without parsing anything.

Right. What I was getting at is that ' and " don't need to be replaced in HTML body context, but do for HTML attributes.

This page explains it:

http://code.google.com/p/doctype-mirror/wiki/ArticleXSSInBodyText

Specifically: "It's not strictly necessary to escape the quotes in this context; however this will be necessary in other contexts, and it's easiest to use the same escaping function everywhere. "

Basically what I'm suggesting we DON'T do is use the $('<div/>').text(html).html(); trick because it's not safe to use for HTML attributes.

Say I'm generating <h1> elements and I want id attributes on them:

$("#mycontent").append("<h1 id=\\"entry-" + $.quoteHTML(attacker_controlled_string) + "\\">");

With the regex-replace implementation of quoteHTML, this is safe. With the .text().html() approach, it's not:

$('<div/>').text("foo'bar").html();

evaluates to:

"foo'bar"

Changed May 15, 2012 08:16PM UTC by walters comment:9

Since I can't attach here because the bug is closed, here's a pastebin of my proposed patch:

http://pastebin.mozilla.org/1640746

Changed May 15, 2012 08:22PM UTC by dmethvin comment:10

resolution:	plugin
status:	closed → reopened

The patch is straightforward, it's more a question of whether we want to add something like this. It's worth discussion though. The logic for including it would be similar to #9521 and #11617.

Changed May 15, 2012 08:58PM UTC by rwaldron comment:11

From a purely API surface/support perspective, I'm strongly opposed to adding this to core. Perhaps we could take it on as an officially supported plugin?

Changed May 15, 2012 09:03PM UTC by walters comment:12

Replying to [comment:11 rwaldron]:

From a purely API surface/support perspective, I'm strongly opposed to adding this to core.

Can you elaborate on your concerns regarding API surface and support?

Also, Mozilla has a decent page on the subject:

https://developer.mozilla.org/en/XUL_School/DOM_Building_and_HTML_Insertion

It doesn't mention the problem with attributes and the need to quote ' and " though.

Changed May 15, 2012 09:31PM UTC by rwaldron comment:13

As in... adding a new function that we didn't vote on creates new API surface, which in turn requires support: in the form of Q & A for users, writing docs, triaging and patching inevitable bugs, etc.

Changed May 15, 2012 09:34PM UTC by scottgonzalez comment:14

Replying to [comment:7 dmethvin]:

A string replace is safer for arbitrary HTML since the <div> or other container element may not be able to accept any markup you give it, and it may mangle the white space as well.

I suppose whitespace may get mangled, but there are no context problems here since we're using text, not HTML. There's only one node to parse, and it's a text node, so we know that it's safe.

Changed May 15, 2012 09:37PM UTC by scottgonzalez comment:15

Replying to [comment:10 dmethvin]:

The logic for including it would be similar to #9521 and #11617.

Actually, if he's asking for attribute escaping, then it's not the same. Do we know how common it is for developers to 1) build up strings of HTML, 2) not use a templating system that already solves this, and 3) be aware enough that they need to escape their strings?

Changed May 16, 2012 02:47AM UTC by dmethvin comment:16

keywords:	→ needsdocs

I come across it all the time. It's not a best practice by any means, but as the OP mentions it's kind of encouraged by even our examples that whack out some HTML strings for simplicity.

For example the NYTimes Skimmer app (http://nytimes.com/skimmer/) uses John's simple templating plugin which doesn't have any HTML encoding. The MSNBC site builds all sorts of content just by throwing together HTML strings, see this file for example.

It's possible that both of those sites have cleaned the data upstream or HTML-encoded it before sending it, but conceptually it seems like the encoding should be there and not upstream. If not, for example, any headline that contains a double-quote will spit out bad HTML.

The pages for APIs like .html() don't mention this problem, in fact we don't even mention that scripts injected via .html() are executed so I'm adding a needsdocs here. (Of course, just about any API that accepts HTML strings does this but it might surprise some devs.)

Changed June 12, 2012 12:25PM UTC by dmethvin comment:17

status:	reopened → open

Changed July 12, 2012 01:01AM UTC by dmethvin comment:18

component:	unfiled → manipulation
resolution:	→ plugin
status:	open → closed

Before we consider including this, I'd like to see a plugin get some traction.

Changed October 15, 2012 07:30PM UTC by mikesherov comment:19

keywords:	needsdocs

Changed January 10, 2014 05:14PM UTC by trengr comment:20

This is all you need to do to escape html. It would be nice to have this in jQuery and I would argue that the concept is too simple for most people to bother installing a plugin for.

escapeHTML = function(s) {
        return String(s).replace(/&(?!\\w+;)/g, '&').replace(/</g, '<').replace(/>/g, '>').replace(/"/g, '"');
      }

Changed March 23, 2014 01:39PM UTC by anonymous comment:21

i was looking for such a function, too.

Changed June 09, 2014 04:30AM UTC by dmittner comment:22

I agree with trengr. I don't expect anyone would go so far as to use a plugin for such a simple function, but its usefulness warrants inclusion in core jQuery. In the meantime you'll just be forcing people to create the function for themselves.

Side navigation