On one of my Ruby on Rails projects, I worked on a story that involved performing calculations on some reference data stored in a table in a database. The table was like a dimension table (in data warehousing parlance) and was non-trivial in size, say about 100 columns and 1000 rows. We had to calculate, something similar to a median of a column, for all the rows, for all 100 columns. And the challenge was to perform this calculation at request time, without being too much of a performance overhead.
My first response to any such calculation intensive activity is to push it offline, to avoid the runtime calculation penalty. But in this case, the calculation involved inputs from the user and hence did not lend itself very well to pre-processing. We could have done something that a Oracle cube does, where it builds aggregates of fact data before hand, to expedite the processing time required. But in our case, pre-processing the results for all the possible user input options would have been extremely wasteful in terms of storage. The other concern with this approach would be that, there would be some time delay between the data being available and being usable, due to the pre-processing.
Then came the suggestion of manipulating the data in stored procedure. It made me shudder! Having recently read Pramod Sadalage’s blog1 on pain points of stored procedures, I certainly was not in a mood to accept this solution. The pain points outlined in the blog that particularly appealed to me were, no modern IDE’s to support refactoring, requiring a database to compile a stored procedure, immaturity of the unit testing frameworks, and vertical scaling being the only option to scale a database engine.
As we all know, Ruby is perceived to be slow and incompetent to perform any computationally intensive tasks, and consequentially was not an option on the table. I thought to myself, Ruby can’t be that slow. The calculation we were doing was non-trivial but not super involved either. I decided to give it a shot.
We started by doing all these calculations using ActiveRecord objects and found that the performance was not good at all. ActiveRecord was the culprit because it was creating all these objects in memory and considerably slowing down the calculation process. We ditched it and opted for straight SQL instead and storing the results in arrays and performing the calculations on those arrays. Better! But not good enough. We found that we were performing operations like finding max, min, order by for a given column values in Ruby and which wasn’t particularly performant. We delegated those to the database engine, since they are typically good at such things and saw quite a hefty performance gain. By doing these simple tricks, we could pretty much get the performance that we were looking for.
Even though we had solved the performance problem, we had an unintended side-effect of our design. Given that we were processing data in arrays and outside of the objects where they were fetched from, the code looked very procedural. To solve this problem, we created some meaningful abstractions to hold the data and operate on it. These weren’t at the same granularity as the ActiveRecord would have created in the first place, but at a much higher level. This way it was a good compromise between, having too many objects and procedural code on the other hand, yet getting the performance we desired.
The biggest win for me in doing all these calculations in Ruby, was keeping the business logic in one place, in the app, and unit testing exhaustively the calculation logic.
I guess none of this is revolutionary in any way, but I guess next time I face a similar situation, I would have the conviction that, Ruby is not all that slow.