800x faster HTML load on large HTML chunks

Reported by:	dimi	Owned by:
Priority:	major	Milestone:	1.3.2
Component:	core	Version:	1.3.1
Keywords:	performance	Cc:	dimi@lattica.com
Blocked by:		Blocking:

Description

I was using .load() to load a large chunk of HTML (660K) into my document when I noticed that we spend almost 3000ms in the .clean() function. Most of that time goes to .trim(), but beyond that we do a _lot_ of work there that isn't needed.

I have prepared two patches that make a huge difference in performance. Both try to remove the bottleneck in the part of the code marked "Convert html string into DOM nodes".

Patch number one just gets rid of the unneeded .trim() and takes the time spent in that part of the code from about 2700ms to about 440ms. That's more than a 5x speed improvement.

Patch number two tries to get rid of the large memory allocations and tons of unnecessary work that's left, and it's just a little bit more complicated. This one brings the same area of code to only 4ms! That's an almost 800x speed improvement, and with larger chunks it's gonna be bigger.
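To illustrate the general idea behind both patches (this is not the code from clean-1.diff or clean-2.diff, whose contents are not reproduced in this ticket), here is a minimal sketch. It assumes the caller only needs to know which tag the string starts with, so trimming and lowercasing the whole 660K string is wasted work. The helper names and the PREFIX_LENGTH value are hypothetical.

```javascript
// Hypothetical helpers for illustration only; not taken from the patches.
// PREFIX_LENGTH is an assumed value: it must be at least one character
// longer than the longest tag name the caller wants to recognize.
var PREFIX_LENGTH = 16;

// Returns the tag name if the text starts with "<name", otherwise "".
function firstTagName(text) {
  var match = /^<([a-z][a-z0-9]*)/.exec(text);
  return match ? match[1] : "";
}

// Slow approach: trims both ends and lowercases the entire string,
// allocating two full-size copies, just to inspect the first few characters.
function leadingTagSlow(html) {
  var tags = html.replace(/^\s+|\s+$/g, "").toLowerCase();
  return firstTagName(tags);
}

// Cheaper approach: find the first non-whitespace character, then
// lowercase only a short prefix starting there.
function leadingTagFast(html) {
  var start = html.search(/\S/); // -1 when the string is empty or all whitespace
  if (start < 0) {
    return "";
  }
  var prefix = html.substring(start, start + PREFIX_LENGTH).toLowerCase();
  return firstTagName(prefix);
}
```

The fast version does an amount of work that depends on the leading whitespace and the prefix length rather than on the total size of the HTML, which is why the gap should widen as the chunk grows.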

Size impact is tiny as well. Patch one adds just 6 bytes to the minimized version, or only 4 bytes to the gzipped version. Patch two adds 17 bytes on top of patch one to the minimized version, or only 12 bytes to the gzipped version.

Attachments (2)

clean-1.diff (0.4 KB) - added by dimi February 01, 2009 09:13AM UTC.
Patch one

clean-2.diff (0.9 KB) - added by dimi February 01, 2009 09:13AM UTC.
Patch two

Change History (4)

Changed February 01, 2009 04:42PM UTC by dimi comment:1

My blog entry about it:

http://zipalong.com/blog/?p=300

Changed February 09, 2009 03:41PM UTC by john comment:2

resolution:	→ fixed
status:	new → closed

Fixed in SVN rev [6190].

Changed February 10, 2009 12:53AM UTC by pbcomm comment:3

It would be better to change trim to:

    var start = -1, end = str.length;
    while (str.charCodeAt(--end) < 33);
    while (++start < end && str.charCodeAt(start) < 33);
    return str.slice(start, end + 1);

I don't remember exactly where I got this; I think there is a ticket in here about it. Yes, it's a little more code, but it is so much faster.

I love regex, but in js it is just too slow.
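For anyone who wants to try the loop-based trim from this comment, a minimal sketch follows. It wraps the snippet in a function and sets it beside a regex-based trim for comparison. The function names are hypothetical, and the regex version is an assumed baseline, not necessarily what the library ships.

```javascript
// Loop-based trim: the snippet from comment:3 wrapped in a function.
// Every character code below 33 (space and ASCII control characters)
// is treated as whitespace.
function trimByCharCode(str) {
  var start = -1, end = str.length;
  // Walk backwards past trailing whitespace. On an empty or all-whitespace
  // string, charCodeAt(-1) is NaN, so the comparison is false and the loop stops.
  while (str.charCodeAt(--end) < 33);
  // Walk forwards past leading whitespace, never crossing `end`.
  while (++start < end && str.charCodeAt(start) < 33);
  return str.slice(start, end + 1);
}

// Regex-based trim used here as an assumed baseline for comparison.
function trimByRegex(str) {
  return str.replace(/^\s+|\s+$/g, "");
}

// Rough timing helper (hypothetical); results vary by engine and input.
function timeIt(fn, input, runs) {
  var begin = new Date().getTime();
  for (var i = 0; i < runs; i++) {
    fn(input);
  }
  return new Date().getTime() - begin;
}
```

The two functions agree on ordinary ASCII whitespace but are not strict equivalents: the loop also strips control characters such as "\x00", and it does not strip whitespace with a code of 33 or above, such as the non-breaking space "\u00A0", which \s matches in current engines.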

Changed February 10, 2009 01:01AM UTC by dimi comment:4

I agree that we can have a faster .trim(), but I disagree that it would be better to use trim in this context.

Even the fastest trim() I've seen will be much slower than the above code. Please read my blog entry on this subject; I really don't think you can speed it up all that much, given that it takes just a few ms (which, BTW, is faster than the measurements published on the site where you got that code snippet :)).
