Ruby XML Parsing Benchmarks
Benchmark. Benchmark. Benchmark.
I had heard it a million times.
So, rather than just blindly trust that Hpricot was faster than REXML for my next project, I whipped up a quick benchmark.
$ ruby test/benchmarks/xml_parsing.rb
Rehearsal -------------------------------------------
REXML 3.300000 1.190000 4.490000 ( 4.637418)
Hpricot 9.010000 3.370000 12.380000 ( 12.645412)
--------------------------------- total: 16.870000sec
user system total real
REXML 3.320000 1.100000 4.420000 ( 4.456072)
Hpricot 8.870000 3.340000 12.210000 ( 12.438566)
Wow. At least with the type of XML data I’ll be working with, REXML is nearly three times faster (about 4.5 seconds versus 12.4 seconds of real time). See the full benchmark code. See update below…
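For anyone who wants to try this without the linked file, here is a minimal sketch of how a comparison like this can be set up with Ruby's standard `Benchmark.bmbm`, which is what prints the "Rehearsal" block before the real timings. It is not the benchmark code from this post: the sample XML, the method names, and the iteration count are all made up for illustration, and the Hpricot half is an untested assumption about its API that only runs if the gem happens to be installed.

```ruby
require 'benchmark'
require 'rexml/document'

# Hpricot is optional in this sketch; skip it quietly if it is not installed.
begin
  require 'hpricot'
rescue LoadError
  # No Hpricot available, so only REXML will be timed.
end

# Hypothetical sample data (the XML used in the original benchmark is not shown).
SAMPLE_XML = '<items>' +
             (1..200).map { |i| %(<item id="#{i}"><name>Item #{i}</name></item>) }.join +
             '</items>'

# Hypothetical iteration count, kept small so the sketch finishes quickly.
ITERATIONS = 20

# Parse the document with REXML and return the text of every <name> element.
def names_with_rexml(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item/name').map(&:text)
end

# Same extraction with Hpricot (assumed API: Hpricot.XML, search, inner_text).
def names_with_hpricot(xml)
  doc = Hpricot.XML(xml)
  doc.search('//item/name').map(&:inner_text)
end

# bmbm runs every report twice: once as a rehearsal, then once for real.
Benchmark.bmbm do |x|
  x.report('REXML') { ITERATIONS.times { names_with_rexml(SAMPLE_XML) } }
  if defined?(Hpricot)
    x.report('Hpricot') { ITERATIONS.times { names_with_hpricot(SAMPLE_XML) } }
  end
end
```

Keeping each parser behind its own method makes it easy to check that both return the same data before trusting the timings, and to swap in a different way of using a library later.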
P.S. libxml claims to be much faster than REXML, but I found it poorly documented and very unstable (at least on Leopard, where it would continually crash IRB with bus errors. Yikes). If anyone else has had better luck with libxml, drop a note in the comments…
Update 4-24-08
Lee commented with a nice diff showing a much cleaner way to use Hpricot, which makes it almost three times faster than REXML!
$ ruby test/benchmarks/xml_parsing.rb
Result are identical
Rehearsal -------------------------------------------
REXML 3.370000 1.160000 4.530000 ( 4.712390)
Hpricot 1.280000 0.390000 1.670000 ( 1.703311)
---------------------------------- total: 6.200000sec
user system total real
REXML 3.360000 1.090000 4.450000 ( 4.716999)
Hpricot 1.270000 0.380000 1.650000 ( 1.684723)
Here is the new benchmark code.
Thanks Lee!
Now, anyone want to add a libxml section to the benchmark?