Ruby XML Parsing Benchmarks

Benchmark. Benchmark. Benchmark.

I had heard it a million times.

So, rather than blindly trust that Hpricot was faster than REXML for my next project, I whipped up a quick benchmark.

$ ruby test/benchmarks/xml_parsing.rb 
Rehearsal -------------------------------------------
REXML     3.300000   1.190000   4.490000 (  4.637418)
Hpricot   9.010000   3.370000  12.380000 ( 12.645412)
--------------------------------- total: 16.870000sec

              user     system      total        real
REXML     3.320000   1.100000   4.420000 (  4.456072)
Hpricot   8.870000   3.340000  12.210000 ( 12.438566)

Wow - at least with the type of XML data I’ll be working with, REXML is nearly three times faster (about 4.5 seconds versus 12.4 seconds of real time). See the full benchmark code. See update below…
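The "Rehearsal" section in the output above is what Ruby's Benchmark.bmbm prints: it runs every report once as a warm-up and then again for the real numbers. For anyone who wants to try something similar, here is a minimal sketch of that kind of harness. It is not the original benchmark script. The sample XML, the rexml_names helper, and the iteration count are all made up for illustration, and only the REXML side is shown; another parser would get its own x.report line in the same way.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical sample data: the XML used in the post is not shown,
# so this small generated document stands in for it.
SAMPLE_XML = '<items>' +
             (1..100).map { |i| "<item><name>item #{i}</name></item>" }.join +
             '</items>'

# Hypothetical helper: parse the document with REXML and collect the
# text of every <name> element nested under an <item>.
def rexml_names(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item/name').map { |el| el.text }
end

# Time the helper with Benchmark.bmbm, which prints a rehearsal pass
# followed by the real pass. The iteration count is an arbitrary choice.
def run_benchmark(iterations = 20)
  Benchmark.bmbm(10) do |x|
    x.report('REXML') { iterations.times { rexml_names(SAMPLE_XML) } }
    # Another parser would be added here as a second x.report line.
  end
end

run_benchmark if __FILE__ == $PROGRAM_NAME
```

Parsing inside the timed block, as done here, measures parse plus query time together. Whether the original script did the same is not something the output alone can tell you.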

P.S. libxml claims to be much faster than REXML, but I found it poorly documented and very unstable (at least on Leopard, it would continually crash IRB with bus errors - yikes). If anyone else has had better luck with libxml, drop a note in the comments…

Update 4-24-08

Lee commented with a nice diff showing a much cleaner way to use Hpricot, which makes it nearly three times faster than REXML (about 1.7 seconds versus 4.7 seconds of real time)!

$ ruby test/benchmarks/xml_parsing.rb 
Result are identical
Rehearsal -------------------------------------------
REXML     3.370000   1.160000   4.530000 (  4.712390)
Hpricot   1.280000   0.390000   1.670000 (  1.703311)
---------------------------------- total: 6.200000sec

              user     system      total        real
REXML     3.360000   1.090000   4.450000 (  4.716999)
Hpricot   1.270000   0.380000   1.650000 (  1.684723)
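The first line of the output above suggests the script checks that both parsers return the same data before timing them, which is worth doing in any benchmark like this: a faster run means little if it extracts something different. Below is a rough sketch of such a check. It assumes two hypothetical extraction routines (both using REXML here, so the example stays self-contained) and made-up sample XML; it is not the code from the actual benchmark.

```ruby
require 'rexml/document'

# Hypothetical sample data, standing in for the post's real XML.
xml = '<items><item><name>a</name></item><item><name>b</name></item></items>'

# Two hypothetical implementations that should extract the same names.
# The first queries with XPath.
via_xpath = lambda do |source|
  REXML::XPath.match(REXML::Document.new(source), '//item/name').map(&:text)
end

# The second walks the <item> children of the root element by hand.
via_traversal = lambda do |source|
  names = []
  REXML::Document.new(source).root.each_element('item') do |item|
    names << item.elements['name'].text
  end
  names
end

# Compare the two results before trusting any timing numbers.
results_match = via_xpath.call(xml) == via_traversal.call(xml)
puts(results_match ? 'Results are identical' : 'Results differ')
```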

Here is the new benchmark code.

Thanks Lee!

Now, anyone want to add a libxml section to the benchmark?
