The short of it, as you can find with a quick Google search [1], is that << is faster than + (or +=) for appending in Ruby. This is because << appends to the existing object, while += creates a new object and copies the old contents into it. Creating a new object in Ruby is not nearly as fast as appending to an existing one. This should not be surprising to anyone who has written the two operations in a lower-level language like C or C++. Now, on with the story.
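To make the "existing object versus new object" distinction concrete, here is a minimal sketch. The variable names are made up for illustration; the only thing it relies on is that `equal?` compares object identity.

```ruby
# Hypothetical variable names, for illustration only.
a = [1, 2]
alias_of_a = a                              # second reference to the same Array

a << 3                                      # mutates the existing Array in place
same_after_shovel = a.equal?(alias_of_a)    # still the same object

a += [4]                                    # builds a new Array and rebinds a
same_after_plus = a.equal?(alias_of_a)      # a now points at a different object

puts "same object after <<: #{same_after_shovel}"
puts "same object after +=: #{same_after_plus}"
p alias_of_a
p a
```

After the +=, the old array is left behind untouched and `a` refers to a fresh copy. That copy is where the extra cost comes from.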
At work, I’m writing a parser for a less-than-well-known file format. In the lexical analysis stage, I was adding arrays together. Something along the lines of:
@tokens += token
Where @tokens is an array, and token is an array of the form [:symbol, character] [2]. At any rate, when I tried to parse a very large file, it took approximately 180 seconds. My smaller tests had all been very fast. Looking at the code, the += was the first thing that came to mind. I checked it with the benchmarker and found that one loop full of += operations was the main offender. A quick refactor from += to << later, parsing the same file took just under 9 seconds, down from 180, or roughly a 20x speedup. Nice.
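A rough way to reproduce the effect is sketched below using the standard library's Benchmark module. This is not the parser's actual code: the method names, the token shape [:char, i], and the value of n are all assumptions made for the example. One thing to watch: the two operators are not drop-in replacements. `tokens += [:char, i]` would splice the two elements into the list, while `tokens << [:char, i]` appends the pair as a single element, so the += version here wraps the token in an outer array to produce the same result.

```ruby
require 'benchmark'

# Hypothetical helpers; the names and the token shape are assumptions.
def build_with_plus(n)
  tokens = []
  # Each += allocates a new Array and copies every existing element into it.
  n.times { |i| tokens += [[:char, i]] }
  tokens
end

def build_with_shovel(n)
  tokens = []
  # << appends to the same Array in place.
  n.times { |i| tokens << [:char, i] }
  tokens
end

n = 10_000 # arbitrary size; raise it to make the gap more obvious

Benchmark.bm(4) do |x|
  x.report('+=') { build_with_plus(n) }
  x.report('<<') { build_with_shovel(n) }
end
```

Because every += copies the whole array built so far, the total work grows roughly with the square of n, while << stays roughly linear. That would be consistent with small tests looking fine and a large file falling off a cliff.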
So what have I learned here? Well, I’m not sorry that I used +=. It worked just fine, and anyone who knows me knows I recite “premature optimization is the devil” at least 10 times a day. Now that I’m in the final test phases of the parser, I can take the time to optimize things like this. However, this also reinforced the adage that “everything is fast when n is small.” A good reminder to test with larger datasets before claiming any algorithm is complete.