Skip to main content

Bug Tracker

Side navigation

#7231 closed bug (duplicate)

Opened October 18, 2010 05:27AM UTC

Closed December 09, 2010 02:51AM UTC

Last modified March 15, 2012 12:14PM UTC

$.fn.data and html5 data- attributes - broken implementation

Reported by: nate@frickenate.com Owned by:
Priority: undecided Milestone:
Component: data Version: 1.4.3
Keywords: Cc:
Blocked by: Blocking:
Description

The type detection built into retrieval of values from html5 data- attributes via $.fn.data in its current implementation is horribly broken.

Support was provided for booleans true and false, null, and complex objects (arrays and object hashes). These are all handled by setting the attribute's value to the JSON-encoded version of the data. However, the same treatment was not provided for parsing strings.

data-name='"joe"' currently parses as a string with the value '"joe"', when it should in fact be parsed as string 'joe' - no quotes. Current implementation does not allow for passing a string with the value "true", "false", "null", nor a stringified version of a complex JSON object.

Invalid JSON values (ie: data-name="joe") can be made to fallback on string value rather than generating a JSON error. But strings in their proper JSON encoded form must be properly supported.

If support for booleans, null and complex objects is going to stay, all data- attributes should be parsed with parseJSON, not a hacked-together soup of sometimes-parsing as JSON, sometimes not. If there is truly a performance boost from prematurely detecting booleans/nulls/numerics rather than parsing proper JSON, then these shortcut detections should be built directly into the parseJSON function, not in a single place within $.fn.data.

Attachments (0)
Change History (27)

Changed October 18, 2010 09:06AM UTC by jitter comment:1

The following reddit post is related to this bug report (and probably also the base for it).

It also provides code to detail what is working wrong

http://www.reddit.com/r/programming/comments/ds42b/jquery_143_released/c12l6z9

Changed October 18, 2010 02:59PM UTC by addyosmani comment:2

#7241 is a duplicate of this ticket.

Changed October 18, 2010 10:10PM UTC by snover comment:3

component: unfileddata
keywords: → needsreview

Personally, I don’t think this should have ever landed. Marking for review.

Changed October 21, 2010 02:53AM UTC by paul.irish comment:4

I can't verify the issue with data-name=joe. It parses as a string without quotes fine in my tests:

http://jsfiddle.net/A3UFA/2/

Changed October 21, 2010 01:11PM UTC by leo@snorland.com comment:5

@paul.irish: You have misunderstood the problem, read the bug report very carefully and pay attention to the number of quotes surrounding the various strings. Also see the link to the Reddit comment above that shows the problem with examples in more details.

Changed October 21, 2010 03:27PM UTC by rwaldron comment:6

_comment0: So, I've recreated paul's test but with the correct quoting weirdness from the ticket. With regard to: \ \ data-name='"joe"' (that's: single, double, joe, double, single) \ \ ... If anyone posted on the forums asking for help with something like this, they'd likely be told to correct their markup syntax. \ \ \ However, I've also recreated the JSON string issue... \ \ http://jsfiddle.net/A3UFA/6/ \ \ 1287674845895421

So, I've recreated paul's test but with the correct quoting weirdness from the ticket. With regard to:

data-name='"joe"' (that's: single, double, joe, double, single)

... If anyone posted on the forums asking for help with something like this, they'd likely be told to correct their markup syntax.

However, I've also recreated the JSON string issue...

http://jsfiddle.net/A3UFA/7/

Changed October 21, 2010 08:04PM UTC by leo@snorland.com comment:7

@rwaldron: I think examples are just being given this way for the sake of brevity (although it's technically valid HTML with single quotes), of course if you use double quoted attribute values, then the HTML would be:

data-name=""joe""

But that's not the issue as you know, it's that this needs to be parsed as JSON, giving:

joe

But instead it is not parsed at all and gives:

"joe"

Regarding your other examples, if you're going to use double quotes in your HTML, then you obviously have to escape your entities accordingly, eg. in PHP:

{
htmlentities( json_encode( $data ), ENT_COMPAT, "utf-8" )
}

will give you output that can be used inside double quoted HTML. There's no way around this - a "bummer" as you say but anything outputted from a server to be used in HTML needs processing anyway, this is simply a step that should *already*, *always* be taken, so it's not like there's really any extra hassle when doing these things programatically. It will just look ugly if you're writing the HTML by hand, reading it, etc. but this is simply how it is.

Changed November 04, 2010 03:17PM UTC by braddunbar comment:8

This is a very small issue with a very convenient abstraction. Stripping quotes can be easily done in post-processing on the client (not to mention, why put them in on the server in the first place?). The data is in no way mangled, it just needs a little more massaging.

Changed November 04, 2010 03:25PM UTC by nate@frickenate.com comment:9

@braddunbar - no, it's not a very small issue. Please don't downplay a serious problem like this.

Changed November 04, 2010 03:28PM UTC by nate@frickenate.com comment:10

There is NO REASON why jquery should not parse data attributes as

try {

value = parseJSON(data-attr-value);

} catch {

value = value; // allow unquoted strings as fallback rather than error

}

Changed November 05, 2010 07:42PM UTC by john comment:11

@nate: Handling the parsing in that way would be very very slow. It would make it such that we have to do a full code eval (and try/catch) for every attribute - which we most certainly don't want to do. We only want to do it for attributes that look like they could contain JSON data.

If you care about accessing a guaranteed string value from a data attribute then you can just access it using the same way you've always been able to access it: .attr("data-some-attr"). The .data() API is well beyond simple string values and being able to get the results as actual JavaScript objects is quite important.

If we want to turn this bug into a discussion regarding performance of parseJSON and moving logic relating to JSON parsing into there - we can certainly do that. But right now this bug is lumping together a few issues (string parsing - which there is a very obvious solution to - and better than the double-quoting weirdness that was mentioned - and the performance of parseJSON, which is another point of discussion). If this is purely a code-organization/performance discussion (as it appears it is) then it is not a top priority for 1.4.4 which is primarily tackling regressions and bug fixes.

Changed November 08, 2010 12:22AM UTC by Leo Dillon <leo@snorland.com> comment:12

@braddunbar:

This is a very small issue with a very convenient abstraction

No, not only is this not a small issue, it completely breaks the usefulness that this function might otherwise have had. It is broken and unusable for many cases in it's current state.

why put them in on the server in the first place?

Because that is how a string is encoded in JSON. This function parses all JSON types except for strings - so any time a string is passed it will corrupt the data you have send - it will come out differently than it went in.

You can't encode everything *except* strings as JSON on the server because a string could end up being a valid boolean, integer, etc. Then you end up with a string being turned into something which is not a string. So yes, it does mangle data.

Changed November 08, 2010 12:32AM UTC by Leo Dillon <leo@snorland.com> comment:13

@john

We only want to do it for attributes that look like they could contain JSON data.

Currently it has some way of detecting arrays and objects, I'm not entirely sure how it's doing this as I have not yet checked the code, but if it's doing something like checking the first character for { or [ as a quick way to test if it's likely to be JSON, why can't it simply do the same for "?

If this is purely a code-organization/performance discussion (as it appears it is) then it is not a top priority for 1.4.4 which is primarily tackling regressions and bug fixes.

This is NOT such a discussion - I honestly don't see how this ever made it into 1.4.3 in it's current state. I don't see why there are fixes for some data functionality in 1.4.4 when it's not even usable by so many people due to this bug. I'm sure @nate won't be using it any time soon while it remains in this state, and I know I sure won't, and we're not the only ones...

and better than the double-quoting weirdness that was mentioned

This comment makes me think that perhaps you haven't quite understood the problem here too, but I may be wrong. There's no "weirdness" happening, there is simply a broken function that isn't doing the one sensible thing that makes sense.

I've confirmed this broken-ness with many other people after seeing the number of responses by people who don't seem to fathom the extent of the problem ... it seems like once someone has read through the various cases thoroughly and thought about how they would use this function in a real-world scenario, they realise that it is, in fact, broken... I suggest anyone else commenting on this bug do some thorough thinking on the subject before adding your 2 cents.

Changed November 17, 2010 12:30AM UTC by dmethvin comment:14

strings in their proper JSON encoded form must be properly supported.

They *are* properly supported when the attribute is JSON. JSON syntax as described at http://json.org says a JSON string should start with either '{' or '[' so the only thing I suppose we can debate is whether leading spaces should be recognized. The 5 characters

"joe"
are not valid JSON.

Single-quoted attribute values are valid HTML. That makes it possible to specify valid JSON in markup.

http://www.w3.org/TR/html-markup/syntax.html#syntax-attributes

I want a call to

.data("name")
with an attribute of
data-name="joe"
to return the three-character string
joe
. No further magic should be applied for that case. As for the other magic values like
null
,
true
, and
false
, I would be perfectly happy if they just returned the string to avoid more special cases and we could save the magic exclusively for JSON. In most cases the code will know what it is expecting, and
typeof
can be used for the other cases.

Changed November 17, 2010 01:15PM UTC by Leo Dillon <leo@snorland.com> comment:15

@dmethvin: ?????????????????????????

They *are* properly supported when the attribute is JSON. JSON syntax as described at http://json.org says a JSON string should start with either '{' or '[' so the only thing I suppose we can debate is whether leading spaces should be recognized. The 5 characters "joe" are not valid JSON.

Have you ever looked at the JSON spec for more than 3 seconds? It clearly states that: A JSON string is denoted using ", and any "..." is a valid JSON string. In fact { and [ notate objects and arrays, NOT strings!

The 5 characters "joe" IS VALID JSON - FOR THE 500TH TIME!!! This is THE ONLY way to denote a string using JSON!

What on earth is going on here, how is it that the entire jQuery team don't even know how JSON works and why is 1.4.4 out without a fix for this? Your major feature doesn't even work!!!!!!!!!!!!!!!!

Sorry if I'm being a bit crazy here but everyone other than nate here doesn't even appear to understand this bug report, so can someone please just spare 30 minutes of their time to read it carefully and don't reply unless you actually know what you're talking about - THANK YOU!

Changed November 17, 2010 01:36PM UTC by Leo Dillon <leo@snorland.com> comment:16

@dmethvin: Sorry, I must apologise and I replied a bit hastily there as this bug has been driving me crazy - I just realised that you were referring to the standard JSON structures. So maybe not technically wrong but individual variables: booleans, strings, numbers and null are all still considered valid JSON strings... Every JSON encoding and decoding function I have come across treats them in this way, if you JSON encode a string 'joe' then it will become '"joe"' - even jQuery's .parseJSON will turn '"joe"' into 'joe' ... the reason this .data() function is problematic is that you can't encode your variable (of any given type) on a server, for example, and retain the variable along with the type when it's read through .data(), so any time you want to pass a string you simply can't and have to put it inside an array or object, which really shouldn't be the case.

Changed November 17, 2010 05:56PM UTC by dmethvin comment:17

milestone: 1.next
resolution: → invalid
status: newclosed

Leo, yes I was referring to the standard JSON object or array. By the term "JSON string" I meant a string that contains the character representation of a JSON object or array. That is the only case where the implementation tries to parse the characters returned from a

data-*
attribute as JSON. The documentation does not say that all
data-*
attributes are parsed as JSON, because they are not.

The implementation does not treat any other attribute string as JSON because (IMO) it would be less useful to do it that way. To repeat, I want a call to

.data("name")
with an attribute of
data-name="joe"
to return the three-character string
joe
. I do not want everyone who writes markup for jQuery to be required to wrap an extra set of quotes around a simple string attribute.

I'm going to close this ticket because the implementation and documentation are in perfect agreement right now as far as I can tell, and they are working as intended. I'll leave the

needsreview
tag so the team can take a final look at it for 1.5 fine-tuning, but I will personally lobby hard against any change that requires doubly-quoting string attributes.

Leo, I'd also suggest that you review your tone in the original ticket and responses. It's hard to be objective about these suggestions given the way they've been presented.

Changed November 17, 2010 11:13PM UTC by Leo Dillon <leo@snorland.com> comment:18

@dmethvin: Sorry but I must reply again because you have just stated some things about the functionality which aren't quite true.

The implementation does not treat any other attribute string as JSON because (IMO) it would be less useful to do it that way.

It treats a string containing 'true' or 'false' as a boolean.

It treats a string containing 'null' as null.

It treats a string containing a number as an integer/float.

So maybe when your string is 'joe', yes, it will come through as 'joe'. But when your string is 'true' it will not come through as a string at all, it will be converted into a boolean value.

Due to *this* passing a string is absolutely useless because there is no guarantee that it will remain a string.

If you don't want people to have to put quotes around their strings then you need to also stop parsing the null, boolean and number types so that the strings are actually preserved.

Again, I apologise regarding my tone but I have tried to explain this so many, many people and it seems like I've got through to about 3 out of 100, and it's seriously making me doubt my own sanity. But still no one, yourself included, can explain to me how I am supposed to put a string into a data attribute in my HTML, and get the same string out when I call .data() ... because this function doesn't do this.

One final example (quoting you):

To repeat, I want a call to .data("name") with an attribute of data-name="joe" to return the three-character string joe.

That's fine.. it does this.

How about:

To repeat, I want a call to .data("name") with an attribute of data-name="true" to return the four-character string true.

It does not do this!

Falling back to parsing as a string when nothing else can be matched is fine, but there needs to be a way to explicitly specify that your variable is a string or all kinds of weirdness, inconsistencies and bugs are going to be introduced into so many pieces of code.

Changed November 19, 2010 11:43AM UTC by jakub.suder@gmail.com comment:19

I agree that this issue can make this feature unusable in many cases. I would like to use it to pass a user's login from the server to Javascript, as data-username="..." and then access it as data('username'). However, I can't be sure that the username won't be "true" or "false" or "123", in which case the value will be returned not as string but as boolean or a number. I don't want to have to code special cases for such usernames.

Please reopen this ticket, this needs to be addressed.

Changed November 22, 2010 03:58AM UTC by snover comment:20

I agree with everyone speaking against this change, and have raised basically every issue you have directly with the rest of the team, but there are key players that simply do not see eye-to-eye on this. Their responses are ‘you can always just use .attr if you don’t like it’, ‘it’s a cool feature that is more useful than dangerous, despite what you are saying’, and ‘well, we can’t remove it, because it’s there now’.

The number issue can come up in other fun ways. “5E2201” is a number. “0xED” is a number. “0702” is a number, obviously, but it’s interpreted as octal, so actually means 450. None of these rules should be surprising if you’ve worked with numbers in JS a lot, but many people haven’t.

Anyway, I am really not sure how much more there is worth talking about here. I will continue to push internally as I can to get this changed, and you are welcome to continue to vote on this ticket (even closed tickets that are voted highly will be considered), but any further arguing in comments is likely to fall on deaf ears.

Changed November 22, 2010 01:11PM UTC by anonymous comment:21

Ok, thanks for info.

Changed November 22, 2010 04:14PM UTC by Leo Dillon <leo@snorland.com> comment:22

@snover: Thanks for the extra info. I understand not wanting to remove or change this in any way that would break anything, but is there a real argument against the addition of detecting JSON strings? Everything else including the fallback to string anyway would still apply so nothing would be broken or lost. I would be against anything more drastic than this anyway, but this seems like the most sensible thing to do at this point to at least make the function useful, but also less hazardous as people will at least have the choice to use the safe notation after reading about it.

The most worrying thing about this is that lots of people have already started writing code using this functionality not realising the problems with it. There are already plenty of bits of code floating around out there just waiting to break when the right string comes along, so maybe the very least that could be done for now is adding a warning to the function page letting people know that they can't rely on their plain strings being preserved.

And while it doesn't implicitly state that strings are preserved, enough people have already assumed that this is the case and started using the function without stopping to think about these implications.

Changed December 09, 2010 02:49AM UTC by dmethvin comment:23

resolution: invalid
status: closedreopened

Changed December 09, 2010 02:51AM UTC by dmethvin comment:24

resolution: → duplicate
status: reopenedclosed

I'm re-marking this as a dup of #7579, the resolution will be slightly different than discussed here.

Changed December 09, 2010 02:51AM UTC by dmethvin comment:25

Duplicate of #7579.

Changed December 09, 2010 03:44AM UTC by dmethvin comment:26

keywords: needsreview

Changed May 10, 2011 07:51PM UTC by anonymous comment:27

I am eager to know if there is a result for this bug. my problem is that I need "01002" to be passed exactly as "01002". But I get 1002 instead.