Bug Tracker

Opened 11 years ago

Closed 11 years ago

Last modified 9 years ago

#11773 closed bug (plugin)

jQuery needs HTML escaping functionality

Reported by: walters@… Owned by:
Priority: undecided Milestone: None
Component: manipulation Version: 1.7.2
Keywords: Cc:
Blocked by: Blocking:

Description

Many of the jQuery example snippets are unsafe to use with untrusted data.

The one that caused me to file this issue is:

http://jquerymobile.com/demos/1.0.1/docs/pages/page-dynamic.html

These specific lines are unsafe:

markup = "<p>" + category.description + "</p><ul data-role='listview' data-inset='true'>",

markup += "<li>" + cItems[i].name + "</li>";

$header.find( "h1" ).html( category.name );

There's a lot of information on the Internet about this; http://code.google.com/p/doctype-mirror/wiki/ArticleXSSInBodyText is pretty good.

We need an implementation of that included in the core, and to fix the examples to use it.

Change History (22)

comment:1 Changed 11 years ago by anonymous

Is ".text( value )" a viable alternative? Not really, because it's hard to mix generating HTML strings with DOM. You really have to do either one or the other.

comment:2 Changed 11 years ago by Rick Waldron

Resolution: plugin
Status: newclosed

There is a syntax issue - you have a comma at the end of this line

markup = "<p>" + category.description + "</p><ul data-role='listview' data-inset='true'>",

But that's irrelevant to you ticket. This can and should be a plugin.

comment:3 Changed 11 years ago by dmethvin

For the example you've reference in the Mobile docs, it seems perfectly safe since the input is trusted (it's in the code). I understand your point but there are several other solutions including DOM element creation and good templating systems that escape the HTML.

Note that the *whole point* of .html() is to insert HTML (including scripts that execute) into the document, so we can't change that behavior.

There is some related discussion on #9521.

comment:4 in reply to:  3 Changed 11 years ago by walters

Replying to dmethvin:

For the example you've reference in the Mobile docs, it seems perfectly safe since the input is trusted (it's in the code). I understand your point but there are several other solutions including DOM element creation and good templating systems that escape the HTML.

Yes, but in general, people follow the path of least resistance. If the jQuery examples don't escape because the input is trusted, things will appear to work for someone who takes the example and uses it on untrusted data. Except their code will be exploitable, or at least unreliable.

Note that the *whole point* of .html() is to insert HTML (including scripts that execute) into the document, so we can't change that behavior.

Right. I'm not suggesting removing or changing .html()/.append() or the variety of functions that accept HTML currently in jQuery. This bug is more about:

1) There should be an official in-jQuery API to escape HTML (both body text and attributes) 2) That function should be used in the examples, even if not strictly necessary

There is some related discussion on #9521.

Yeah, related, but this bug is about how jQuery *encourages* using .html() but doesn't supply an escaping function.

comment:5 Changed 11 years ago by scottgonzalez

Are you asking for a function that does $( "<a>" ).text( str ).html()?

comment:6 in reply to:  5 Changed 11 years ago by walters

Replying to scott.gonzalez:

Are you asking for a function that does $( "<a>" ).text( str ).html()?

Yes. I'd attach the patch I wrote, but I can't do it while the bug is closed, I guess.

Here's a page which benchmarks multiple implementations:

http://jsperf.com/htmlencoderegex

It points to the regex replace as being the fastest.

comment:7 Changed 11 years ago by dmethvin

A string replace is safer for arbitrary HTML since the <div> or other container element may not be able to accept any markup you give it, and it may mangle the white space as well.

to escape HTML (both body text and attributes)

What do you mean by that? It's just a string and we're just replacing < > & " ' without parsing anything. Or are you referring to its use in examples?

Last edited 11 years ago by dmethvin (previous) (diff)

comment:8 in reply to:  7 Changed 11 years ago by walters

Replying to dmethvin:

What do you mean by that? It's just a string and we're just replacing < > & " ' without parsing anything.

Right. What I was getting at is that ' and " don't need to be replaced in HTML body context, but do for HTML attributes.

This page explains it: http://code.google.com/p/doctype-mirror/wiki/ArticleXSSInBodyText

Specifically: "It's not strictly necessary to escape the quotes in this context; however this will be necessary in other contexts, and it's easiest to use the same escaping function everywhere. "

Basically what I'm suggesting we DON'T do is use the $('<div/>').text(html).html(); trick because it's not safe to use for HTML attributes.

Say I'm generating <h1> elements and I want id attributes on them:

$("#mycontent").append("<h1 id=\"entry-" + $.quoteHTML(attacker_controlled_string) + "\">");

With the regex-replace implementation of quoteHTML, this is safe. With the .text().html() approach, it's not:

$('<div/>').text("foo'bar").html(); evaluates to: "foo'bar"

comment:9 Changed 11 years ago by walters

Since I can't attach here because the bug is closed, here's a pastebin of my proposed patch:

http://pastebin.mozilla.org/1640746

comment:10 Changed 11 years ago by dmethvin

Resolution: plugin
Status: closedreopened

The patch is straightforward, it's more a question of whether we want to add something like this. It's worth discussion though. The logic for including it would be similar to #9521 and #11617.

comment:11 Changed 11 years ago by Rick Waldron

From a purely API surface/support perspective, I'm strongly opposed to adding this to core. Perhaps we could take it on as an officially supported plugin?

comment:12 in reply to:  11 Changed 11 years ago by walters

Replying to rwaldron:

From a purely API surface/support perspective, I'm strongly opposed to adding this to core.

Can you elaborate on your concerns regarding API surface and support?

Also, Mozilla has a decent page on the subject: https://developer.mozilla.org/en/XUL_School/DOM_Building_and_HTML_Insertion

It doesn't mention the problem with attributes and the need to quote ' and " though.

comment:13 Changed 11 years ago by Rick Waldron

As in... adding a new function that we didn't vote on creates new API surface, which in turn requires support: in the form of Q & A for users, writing docs, triaging and patching inevitable bugs, etc.

comment:14 in reply to:  7 Changed 11 years ago by scottgonzalez

Replying to dmethvin:

A string replace is safer for arbitrary HTML since the <div> or other container element may not be able to accept any markup you give it, and it may mangle the white space as well.

I suppose whitespace may get mangled, but there are no context problems here since we're using text, not HTML. There's only one node to parse, and it's a text node, so we know that it's safe.

comment:15 in reply to:  10 Changed 11 years ago by scottgonzalez

Replying to dmethvin:

The logic for including it would be similar to #9521 and #11617.

Actually, if he's asking for attribute escaping, then it's not the same. Do we know how common it is for developers to 1) build up strings of HTML, 2) not use a templating system that already solves this, and 3) be aware enough that they need to escape their strings?

comment:16 Changed 11 years ago by dmethvin

Keywords: needsdocs added

I come across it all the time. It's not a best practice by any means, but as the OP mentions it's kind of encouraged by even our examples that whack out some HTML strings for simplicity.

For example the NYTimes Skimmer app (http://nytimes.com/skimmer/) uses John's simple templating plugin which doesn't have any HTML encoding. The MSNBC site builds all sorts of content just by throwing together HTML strings, see this file for example.

It's possible that both of those sites have cleaned the data upstream or HTML-encoded it before sending it, but conceptually it seems like the encoding should be there and not upstream. If not, for example, any headline that contains a double-quote will spit out bad HTML.

The pages for APIs like .html() don't mention this problem, in fact we don't even mention that scripts injected via .html() are executed so I'm adding a needsdocs here. (Of course, just about any API that accepts HTML strings does this but it might surprise some devs.)

comment:17 Changed 11 years ago by dmethvin

Status: reopenedopen

comment:18 Changed 11 years ago by dmethvin

Component: unfiledmanipulation
Resolution: plugin
Status: openclosed

Before we consider including this, I'd like to see a plugin get some traction.

comment:19 Changed 11 years ago by mikesherov

Keywords: needsdocs removed

comment:20 Changed 9 years ago by trengr

This is all you need to do to escape html. It would be nice to have this in jQuery and I would argue that the concept is too simple for most people to bother installing a plugin for.

escapeHTML = function(s) {
        return String(s).replace(/&(?!\w+;)/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
      }

comment:21 Changed 9 years ago by anonymous

i was looking for such a function, too.

comment:22 Changed 9 years ago by dmittner

I agree with trengr. I don't expect anyone would go so far as to use a plugin for such a simple function, but its usefulness warrants inclusion in core jQuery. In the meantime you'll just be forcing people to create the function for themselves.

Note: See TracTickets for help on using tickets.