Ruby XML Parsing Benchmarks
Benchmark. Benchmark. Benchmark.
I had heard it a million times.
So, rather than blindly trusting that Hpricot was faster than REXML for my next project, I whipped up a quick benchmark.
$ ruby test/benchmarks/xml_parsing.rb
Rehearsal -------------------------------------------
REXML     3.300000   1.190000   4.490000 (  4.637418)
Hpricot   9.010000   3.370000  12.380000 ( 12.645412)
--------------------------------- total: 16.870000sec

              user     system      total       real
REXML     3.320000   1.100000   4.420000 (  4.456072)
Hpricot   8.870000   3.340000  12.210000 ( 12.438566)
Wow - at least with the type of XML data I'll be working with, REXML is nearly three times faster (about 4.5 seconds versus 12.4 seconds of real time).
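For anyone wondering where the "Rehearsal" block comes from: Benchmark.bmbm runs every report twice, once as a warm-up and once for the numbers that count. Here is a minimal sketch of that kind of harness. It only exercises REXML so that it runs without extra gems, and the sample XML (the Root and Messages wrappers and the ListID values) is made up for illustration; only the ItemQueryRs and ListID element names and the '/*/*/*' XPath come from the benchmark discussed here.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical sample document; the real XML used in the post is not shown.
SAMPLE_XML = <<~XML
  <Root>
    <Messages>
      <ItemQueryRs requestID="1">
        <ItemServiceRet><ListID>A-1</ListID></ItemServiceRet>
        <ItemInventoryRet><ListID>B-2</ListID></ItemInventoryRet>
      </ItemQueryRs>
    </Messages>
  </Root>
XML

# Collect the ListID text of every item under each third-level ItemQueryRs element.
def rexml_list_ids(xml)
  doc = REXML::Document.new(xml)
  ids = []
  REXML::XPath.each(doc, '/*/*/*') do |node|
    next unless node.name == 'ItemQueryRs'
    node.elements.each do |element|
      ids << element.elements['ListID'].text
    end
  end
  ids
end

TIMES = 10

# bmbm prints a "Rehearsal" pass first, then the real measurements.
Benchmark.bmbm do |x|
  x.report('REXML') { TIMES.times { rexml_list_ids(SAMPLE_XML) } }
  # A second x.report line for another parser would go here.
end
```

Each row of the output lists user CPU, system CPU, their total, and wall-clock ("real") time in seconds.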
See the full benchmark code. See update below...
P.S. libxml claims to be much faster than REXML, but I found it poorly documented and very unstable (at least on Leopard, it would continually crash IRB with bus errors - yikes). If anyone else has had better luck with libxml, drop a note in the comments...
Update 4-24-08
Lee commented with a nice diff showing a much cleaner way to use Hpricot, which makes it much faster than REXML!
$ ruby test/benchmarks/xml_parsing.rb
Result are identical
Rehearsal -------------------------------------------
REXML     3.370000   1.160000   4.530000 (  4.712390)
Hpricot   1.280000   0.390000   1.670000 (  1.703311)
---------------------------------- total: 6.200000sec

              user     system      total       real
REXML     3.360000   1.090000   4.450000 (  4.716999)
Hpricot   1.270000   0.380000   1.650000 (  1.684723)
Here is the new benchmark code.
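The "Result are identical" line in the output above comes from a sanity check that runs before any timing: both implementations are called once and their return values compared, so a fast but wrong parser can't win. A rough sketch of that pattern follows. Since Hpricot may not be installed, it compares two REXML-based strategies as stand-ins; the method names, the sample XML, and the compare_results helper are all hypothetical.

```ruby
require 'rexml/document'

# Hypothetical sample document, flattened onto one line.
CHECK_XML = '<Root><Messages><ItemQueryRs>' \
            '<ItemServiceRet><ListID>A-1</ListID></ItemServiceRet>' \
            '<ItemInventoryRet><ListID>B-2</ListID></ItemInventoryRet>' \
            '</ItemQueryRs></Messages></Root>'

# Strategy 1: walk the third-level elements and read each child's ListID.
def ids_by_walking(xml)
  doc = REXML::Document.new(xml)
  ids = []
  REXML::XPath.each(doc, '/*/*/*') do |node|
    node.elements.each { |element| ids << element.elements['ListID'].text }
  end
  ids
end

# Strategy 2: ask XPath for every ListID element directly.
def ids_by_xpath(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//ListID').map(&:text)
end

# Build the message to print before benchmarking.
def compare_results(first, second)
  if first == second
    'Results are identical'
  else
    "Results are not the same!\n" \
      "First values: #{first.inspect}\n" \
      "Second values: #{second.inspect}"
  end
end

puts compare_results(ids_by_walking(CHECK_XML), ids_by_xpath(CHECK_XML))
```

Returning the collected values from each method (rather than discarding them) is what makes the comparison possible, and it also keeps the two code paths doing the same amount of work.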
Thanks Lee!
Now, anyone want to add a libxml section to the benchmark?


5 Comments
Rehearsal -------------------------------------------
REXML     0.960000   0.000000   0.960000 (  0.961413)
Hpricot   0.370000   0.000000   0.370000 (  0.377359)
---------------------------------- total: 1.330000sec

              user     system      total       real
REXML     0.960000   0.000000   0.960000 (  0.954501)
Hpricot   0.330000   0.000000   0.330000 (  0.331897)

Here is the diff from the pastie to my version:

--- 185724.txt 2008-04-24 09:00:58.000000000 -0600
+++ rexml_h_bm.rb 2008-04-24 09:00:42.000000000 -0600
@@ -8,25 +8,31 @@
 class Parse
   def self.rexml
     doc = REXML::Document.new(XML)
+    ary = []
     REXML::XPath.each(doc, '/*/*/*') do |node|
       case node.name
       when 'ItemQueryRs'
         node.elements.each do |element|
-          rexml_fetch(element, 'ListID')
+          ary << rexml_fetch(element, 'ListID')
         end
       end
     end
+    ary
   end
 
   def self.hpricot
-    doc = Hpricot(XML)
-    response_element = doc.search('*[@requestid]').first
-    case response_element.name
-    when 'itemqueryrs'
-      doc.search('itemserviceret|itemnoninventoryret|itemotherchargeret|iteminventoryret').each do |e|
-        hpricot_fetch(doc/:listid)
+    doc = Hpricot.XML(XML)
+    ary = []
+    response_element = doc.search('/*/*/*').each do |node|
+      next unless node.elem?
+      case node.name
+      when 'ItemQueryRs'
+        node.containers.each do |element|
+          ary << hpricot_fetch(element/'ListID')
+        end
       end
     end
+    ary
   end
 
   # rexml helper
@@ -43,6 +49,16 @@
   end
 end
 
+rexml_results = Parse.rexml
+hpricot_results = Parse.hpricot
+if rexml_results == hpricot_results
+  puts "Result are identical"
+else
+  puts "Results are not the same!"
+  puts "REXML values: #{rexml_results.inspect}"
+  puts "Hpricot values: #{hpricot_results.inspect}"
+end
+
 TIMES = 10
 Benchmark.bmbm do |x|
   x.report('REXML') { TIMES.times { Parse.rexml } }