Bug Tracker

Modify

Ticket #11773 (closed bug: plugin)

Opened 13 months ago

Last modified 8 months ago

jQuery needs HTML escaping functionality

Reported by: walters@… Owned by:
Priority: undecided Milestone: None
Component: manipulation Version: 1.7.2
Keywords: Cc:
Blocking: Blocked by:

Description

Many of the jQuery example snippets are unsafe to use with untrusted data.

The one that caused me to file this issue is:

 http://jquerymobile.com/demos/1.0.1/docs/pages/page-dynamic.html

These specific lines are unsafe:

markup = "<p>" + category.description + "</p><ul data-role='listview' data-inset='true'>",

markup += "<li>" + cItems[i].name + "</li>";

$header.find( "h1" ).html( category.name );

There's a lot of information on the Internet about this;  http://code.google.com/p/doctype-mirror/wiki/ArticleXSSInBodyText is pretty good.

We need an implementation of that included in the core, and to fix the examples to use it.

Change History

comment:1 Changed 13 months ago by anonymous

Is ".text( value )" a viable alternative? Not really, because it's hard to mix generating HTML strings with DOM. You really have to do either one or the other.

comment:2 Changed 13 months ago by rwaldron

  • Status changed from new to closed
  • Resolution set to plugin

There is a syntax issue - you have a comma at the end of this line

markup = "<p>" + category.description + "</p><ul data-role='listview' data-inset='true'>",

But that's irrelevant to you ticket. This can and should be a plugin.

comment:3 follow-up: ↓ 4 Changed 13 months ago by dmethvin

For the example you've reference in the Mobile docs, it seems perfectly safe since the input is trusted (it's in the code). I understand your point but there are several other solutions including DOM element creation and good templating systems that escape the HTML.

Note that the *whole point* of .html() is to insert HTML (including scripts that execute) into the document, so we can't change that behavior.

There is some related discussion on #9521.

comment:4 in reply to: ↑ 3 Changed 13 months ago by walters

Replying to dmethvin:

For the example you've reference in the Mobile docs, it seems perfectly safe since the input is trusted (it's in the code). I understand your point but there are several other solutions including DOM element creation and good templating systems that escape the HTML.

Yes, but in general, people follow the path of least resistance. If the jQuery examples don't escape because the input is trusted, things will appear to work for someone who takes the example and uses it on untrusted data. Except their code will be exploitable, or at least unreliable.

Note that the *whole point* of .html() is to insert HTML (including scripts that execute) into the document, so we can't change that behavior.

Right. I'm not suggesting removing or changing .html()/.append() or the variety of functions that accept HTML currently in jQuery. This bug is more about:

1) There should be an official in-jQuery API to escape HTML (both body text and attributes) 2) That function should be used in the examples, even if not strictly necessary

There is some related discussion on #9521.

Yeah, related, but this bug is about how jQuery *encourages* using .html() but doesn't supply an escaping function.

comment:5 follow-up: ↓ 6 Changed 13 months ago by scott.gonzalez

Are you asking for a function that does $( "<a>" ).text( str ).html()?

comment:6 in reply to: ↑ 5 Changed 13 months ago by walters

Replying to scott.gonzalez:

Are you asking for a function that does $( "<a>" ).text( str ).html()?

Yes. I'd attach the patch I wrote, but I can't do it while the bug is closed, I guess.

Here's a page which benchmarks multiple implementations:

 http://jsperf.com/htmlencoderegex

It points to the regex replace as being the fastest.

comment:7 follow-ups: ↓ 8 ↓ 14 Changed 13 months ago by dmethvin

A string replace is safer for arbitrary HTML since the <div> or other container element may not be able to accept any markup you give it, and it may mangle the white space as well.

to escape HTML (both body text and attributes)

What do you mean by that? It's just a string and we're just replacing < > & " ' without parsing anything. Or are you referring to its use in examples?

Last edited 13 months ago by dmethvin (previous) (diff)

comment:8 in reply to: ↑ 7 Changed 13 months ago by walters

Replying to dmethvin:

What do you mean by that? It's just a string and we're just replacing < > & " ' without parsing anything.

Right. What I was getting at is that ' and " don't need to be replaced in HTML body context, but do for HTML attributes.

This page explains it:  http://code.google.com/p/doctype-mirror/wiki/ArticleXSSInBodyText

Specifically: "It's not strictly necessary to escape the quotes in this context; however this will be necessary in other contexts, and it's easiest to use the same escaping function everywhere. "

Basically what I'm suggesting we DON'T do is use the $('<div/>').text(html).html(); trick because it's not safe to use for HTML attributes.

Say I'm generating <h1> elements and I want id attributes on them:

$("#mycontent").append("<h1 id=\"entry-" + $.quoteHTML(attacker_controlled_string) + "\">");

With the regex-replace implementation of quoteHTML, this is safe. With the .text().html() approach, it's not:

$('<div/>').text("foo'bar").html(); evaluates to: "foo'bar"

comment:9 Changed 13 months ago by walters

Since I can't attach here because the bug is closed, here's a pastebin of my proposed patch:

 http://pastebin.mozilla.org/1640746

comment:10 follow-up: ↓ 15 Changed 13 months ago by dmethvin

  • Status changed from closed to reopened
  • Resolution plugin deleted

The patch is straightforward, it's more a question of whether we want to add something like this. It's worth discussion though. The logic for including it would be similar to #9521 and #11617.

comment:11 follow-up: ↓ 12 Changed 13 months ago by rwaldron

From a purely API surface/support perspective, I'm strongly opposed to adding this to core. Perhaps we could take it on as an officially supported plugin?

comment:12 in reply to: ↑ 11 Changed 13 months ago by walters

Replying to rwaldron:

From a purely API surface/support perspective, I'm strongly opposed to adding this to core.

Can you elaborate on your concerns regarding API surface and support?

Also, Mozilla has a decent page on the subject:  https://developer.mozilla.org/en/XUL_School/DOM_Building_and_HTML_Insertion

It doesn't mention the problem with attributes and the need to quote ' and " though.

comment:13 Changed 13 months ago by rwaldron

As in... adding a new function that we didn't vote on creates new API surface, which in turn requires support: in the form of Q & A for users, writing docs, triaging and patching inevitable bugs, etc.

comment:14 in reply to: ↑ 7 Changed 13 months ago by scott.gonzalez

Replying to dmethvin:

A string replace is safer for arbitrary HTML since the <div> or other container element may not be able to accept any markup you give it, and it may mangle the white space as well.

I suppose whitespace may get mangled, but there are no context problems here since we're using text, not HTML. There's only one node to parse, and it's a text node, so we know that it's safe.

comment:15 in reply to: ↑ 10 Changed 13 months ago by scott.gonzalez

Replying to dmethvin:

The logic for including it would be similar to #9521 and #11617.

Actually, if he's asking for attribute escaping, then it's not the same. Do we know how common it is for developers to 1) build up strings of HTML, 2) not use a templating system that already solves this, and 3) be aware enough that they need to escape their strings?

comment:16 Changed 13 months ago by dmethvin

  • Keywords needsdocs added

I come across it all the time. It's not a best practice by any means, but as the OP mentions it's kind of encouraged by even our examples that whack out some HTML strings for simplicity.

For example the NYTimes Skimmer app ( http://nytimes.com/skimmer/) uses John's simple templating plugin which doesn't have any HTML encoding. The MSNBC site builds all sorts of content just by throwing together HTML strings, see  this file for example.

It's possible that both of those sites have cleaned the data upstream or HTML-encoded it before sending it, but conceptually it seems like the encoding should be there and not upstream. If not, for example, any headline that contains a double-quote will spit out bad HTML.

The pages for APIs like .html() don't mention this problem, in fact we don't even mention that scripts injected via .html() are executed so I'm adding a needsdocs here. (Of course, just about any API that accepts HTML strings does this but it might surprise some devs.)

comment:17 Changed 12 months ago by dmethvin

  • Status changed from reopened to open

comment:18 Changed 11 months ago by dmethvin

  • Status changed from open to closed
  • Resolution set to plugin
  • Component changed from unfiled to manipulation

Before we consider including this, I'd like to see a plugin get some traction.

comment:19 Changed 8 months ago by mikesherov

  • Keywords needsdocs removed

Please follow the  bug reporting guidlines and use  jsFiddle when providing test cases and demonstrations instead of pasting the code in the ticket.

View

Add a comment

Modify Ticket

Action
as closed
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.