Ticket #4037 (closed bug: fixed)
800x faster HTML load on large HTML chunks
Reported by: dimi
I was using .load() to load a large chunk of HTML (660K) into my document when I noticed we were spending almost 3000ms in the .clean() function. Most of the time was spent in .trim(), but beyond that we are doing a _lot_ of work there that's not needed.
I have prepared 2 patches that make a huge difference in performance. Both try to remove the bottleneck in the part of the code that says "Convert html string into DOM nodes".
Patch number one just gets rid of the unneeded .trim(), and cuts the time spent in that part of the code from about 2700ms to about 440ms; that's more than a 5x speed improvement.
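To illustrate the idea (this is a hypothetical sketch, not the actual patch): trimming a 660K string allocates a full copy of it just to inspect its first character, while an anchored regex test that tolerates leading whitespace only touches the start of the string and allocates nothing.

```javascript
// Before (sketch): trim the entire string just to check if it starts with "<".
// On a 660K input this allocates a whole new 660K string.
function looksLikeHtmlSlow(s) {
  return s.replace(/^\s+|\s+$/g, "").charAt(0) === "<";
}

// After (sketch): an anchored regex skips leading whitespace in place,
// so no copy of the large string is ever made.
function looksLikeHtmlFast(s) {
  return /^\s*</.test(s);
}
```

Both functions give the same answer; the difference is purely in how much of the string gets copied and scanned along the way.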
Patch number two gets rid of the large memory allocations and the tons of unnecessary work that remain, and is only a little more complicated. It brings the same area of code down to only 4ms! That's an almost 800x speed improvement, and with larger chunks it gets even bigger.
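As a rough sketch of the second idea (again hypothetical, not the patch itself): instead of running several full-string regex passes over the whole chunk to decide how the fragment must be wrapped, extract the leading tag name once with an anchored match and branch on that.

```javascript
// Hypothetical sketch: one anchored exec() reads only the start of the
// string, instead of repeated regex passes over all 660K of HTML.
function wrapMapKey(html) {
  var m = /^\s*<([\w:]+)/.exec(html); // touches only the string's start
  return m ? m[1].toLowerCase() : "";
}
```

The returned key (e.g. "td", "tr", "option") can then be used to pick the right wrapper element once, so the expensive wrapping logic runs only when the leading tag actually requires it.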
The size impact is tiny as well. Patch one adds just 6 bytes to the minified version, or only 4 bytes to the gzipped version. Patch two adds 17 bytes on top of patch one to the minified version, or only 12 bytes to the gzipped version.