refactoring | Computellect

Yesterday I had a 12 hour non-stop^[1] code fest to refactor a thin slice of 2-tiered web application into a 3-tiered one. It was very productive and I must say this is the kind of stuff that soothes my developer soul and hence the name. 🙂

The primary driver for the refactoring was that the core logic of the application was tightly coupled on both ends to the frameworks being used. On one side, it was tied to the web application framework, Play and on the other end the ORM, Ebean. We managed to move the business logic in to a separate tier, which is good on it own, but also let us unit test the business logic without bothering with the frameworks, which can be frankly quite nasty. As a follow on effect, we also managed to split the models into 2 variants, one to support the database via Ebean and the other to generate JSON views using Jackson. This let us separate the two concerns and test them nicely in isolation. Similarly, we could test the controllers in isolation. We got rid of bulk of our functional tests that were testing unhappy paths for which we had unit tests at the appropriate place viz., the controller, view model, service and database models.

I was quite amazed at how much we could get done in a day. Here are some of the important take aways from this experiment:

We had a discussion the previous day about how we wanted to restructure the code, primarily focusing on separation of responsbilities and improving unit testability of the code. We agreed upon certain set of principles, which served as a nice guideline going into the refactoring. On the day of refactoring, we realized that not all the things we discussed were feasible, but we made sensible adjustments along the way.
Keep your eye on the end goal. Keep reminding yourself of the big picture. Do not let minor refactorings distract you, write it down on a piece of paper so you dont forget.
Pairing really helps. If you are used to pairing, when you are doing refactoring at this scale, it doubly helps. Helps you keep focused on the end goal, solves problems quickly due to collective knowledge and also decision making cycle time is considerably reduced when making adjustments to the initial design. Also I would say pick a pair who is aligned with you on the ground rules of how you are going to approach development. You don’t want to get into a discussion of how or why you should be writing tests and what is a good commit size.
Having tools handy that get you going quickly. Between me and my pair, we pretty much knew what tool to use for all the problems at hand. At one point, we got stuck with testing static methods and constructors. My pair knew about PowerMock, we gave it a spin and it worked. And there it was, included in the project. Dont spend too much time debating, pick something that works and move on. If it does not work for certain scenarios, put it on your refactoring list.
Thankfully for us, we had a whole bunch of functional tests already at our disposal to validate the expected behavior, which was tremendously useful to make sure we weren’t breaking stuff. If you dont have this luxury, then pick a thin slice of functionality to refactor which you can manually test quickly.
Small, frequent commits. Again the virtuosity of this is amplified in this kind of scenario.
Say no to meetings. Yes, you can do without them for a day, even if you are the president of the company. 🙂

Have you done any soul coding lately? 🙂

[1] Ok, not quite 12 hours, but it was on my mind all the time. 😉

Refactoring is a critical factor in the successful evolution of a code base. Slow and small steps of refactoring help keep the code base healthy and clean.

I would classify refactoring into 3 levels ordered by increasing scope, and consequentially, effort.

1) class level refactoring

This is the type of refactoring typically done at the end of red-green-refactor cycle. As TDD practitioners are aware, you write a failing test for the business expectation, you fix the test and then refactor the code to make it ‘better’. Here are some examples:
a) rename methods/variables to clearly express their intent
b) extract duplicate lines of code into a method
c) memoize to avoid duplicate calls to expensive pieces of code
d) write idiomatic code. I was baffled when I first heard the word ‘idiomatic’ so let me take a minute to explain it. It means using the language constructs rather than the plain old constructs, that you find in almost every language. An example in Ruby would be: lets say you have a method that initializes an array, adds some elements to the array and returns the array. As you can imagine, these three steps would each correspond to a line in your method if written in a non-idiomatic fashion. If you were to write idiomatic code, you would use tap instead. Here is how it looks.


[].tap do |fruits|
  fruits << 'apple'
  fruits << 'banana'
end

2) story level refactoring

This is the type of refactoring, you do when the story is “functionally” complete. Here are some examples:

a) convert procedural code into object oriented code. A symptom of procedural code is, manipulating the state of an object outside its own class. Here is an example:


if(input.numeric?)
  NumberFormatter.format(input)
else if(input.date?)
  DateFormatter.format(input)
end

In this case, some code outside the input class is making a decision of which formatter to apply to a input. This clearly breaks encapsulation by exposing the input data to something outside the input class. Doing this the right way would mean, when initializing the object you create the right type of class based on its input, like a NumericInput vs DateInput, write a format method on each of those classes and simply have the client call the format method on the instance of the class. The client should not care what instance it is calling the format method on, as long as the input type knows how to format itself.

Here is how it looks:


class NumericInput
  def format
  end
end

class DateInput
  def format
  end
end

client code would be:
input.format
Believe it or not, a lot of your decision making statements will go away, if you create the right type of objects. To create the right type of objects, it is very important to identify the right abstractions for “state holders” and then push the behavior on to those abstractions(classes in object-oriented world).

b) Remove duplicate code from multiple classes and create common abstractions.

c) Watch out for performance-related refactorings like avoiding n+1 query problem, removing/merging duplicate database/service calls, caching expensive service/database calls, avoiding fetching more data than necessary from external services (example being fetching all the search results and counting them as opposed to directly getting count of results or getting database results and sorting the results as opposed to fetching sorted results), graceful handling of service/database error conditions.

3) big bang refactoring

This is the type of refactoring where you change a large part of your code, purely for technical reasons. The drivers for this type of refactoring are the exorbitant cost of changing a relatively small piece of code, performance being unacceptable, the code being too buggy and rather be rewritten than applying band-aid solutions or the code breaking in production and it is hard to tell what is going on. The effort is too big to be handled as a part of a ‘feature’ story and hence has to be done independently as one or more stories.

Big bang refactoring stories are generally hard to sell because they don’t deliver any client-facing functionality, the impact and to some extent the scope being unclear(example being refactoring Javascript code for a page to follow MVC pattern) and the fear of breaking something already working. These stories have to be drafted with a clear end goal in mind to give the developers a good idea of the scope and assess the impact. The story should enunciate the pain points being addressed as a part of acceptance criteria and the rough amount of effort required. Such stories should be more actively monitored to avoid losing focus. The stories with high level of pain and low level of effort should be prioritized first. It is possible that such stories might never get prioritized. In such a scenario, the work should be broken down into smaller chunks and handled as type 2 story-level refactoring with a related feature card.

Lessons learnt:

* Make separate unit tests for testing different intentions of a method. For an example, lets say a method returns a hash of attributes about a person object. So you would have a separate test for each item in the hash or at least a logical grouping of items in the hash. Here is an example


it "should format the name" do
  person.to_hash[:name].should == 'Todkar, Praful'
end

it "should calculate the age based on the birth date" do
  person.to_hash[:age].should == '29'
end

This way it makes it easier to understand the behavior being tested. Also if some of the logic changes or goes away it is easy to change or delete the test. Also, you could test the complicated parts more rigorously without testing everything that the method returns. You could put the test setup for the method in a before block to avoid duplication.

* Avoid unit testing of private methods of a class. Testing private methods is a smell and indicates that there is too much going on in the class being tested. The fact that you want to test the private method without going through the public interface means that you could easily create an abstraction for that code and inject it as a dependency. You could test the dependency separately and rigorously. For example, there is a method that finds the median age for a group of people stored in a database. Now as you can see, there are two separate pieces of logic here. The part where you read from the database is fairly easy to test where as testing the median age is more complicated and hence can be tested easily and thoroughly if it were its own abstraction. Also for some reason if your logic changed to find the mean age instead of the median age, your database test should remain the same.

* Try to do as much unit-level testing as possible to isolate problems during test failures. The other advantage of unit testing is that it helps reduce the combinations of integration/functional tests. Say you have to write a functional test to test a form that accepts age as one of it fields. If you have properly unit tested all the edge cases for the field accepting age as an input, then the form submission test does not have to deal with various combinations of age input. Almost always integration tests are an order of magnitude slower than unit tests. Unit tests give you faster feedback at relatively low cost.

* It is very important to have clearly defined outcomes for refactoring stories. I was on a project that had the first phase of the project dedicated to refactoring. Quickly we realized, we weren’t making much progress as it was hard to tell when a refactoring story was done. The boundary lines of the refactoring stories were blurry, hence causing confusion. I recommended that we tie refactoring effort to new feature stories, originally planned to be developed in phase 2. The approach we took was refactoring code in and around the feature we were working on. This way we had a better sense of story boundary and knew when to stop refactoring. Once the patterns were in place, it was easy to replicate those across the code base. The business also liked it as they started seeing features delivered sooner than they had anticipated.

* If you find too much mocking/stubbing of dependencies when testing a method in a class, it might be a good time to consolidate the dependencies into fewer abstractions, and delegate some of the testing to the new abstractions. Too much mocking/stubbing can be a result of a chatty interaction between multiple dependencies. It happens when you have data being passed around from one object to the other. It helps to have a owner of the data and push the behavior on to the data owner to avoid chatty interaction.

* One of the tips that I have found useful when naming classes is having the name be a noun as opposed to a verb. For example, when picking a name for a class that compares hotels, it should be called HotelComparison as opposed to HotelComparer since it feels more natural for a HotelComparison class to hold state like date of comparison, result of the comparison, requesting user. This helps in writing more object-oriented code.

* Always test the public interface. One of the biggest advantages is that unit tests don’t break when internal implementation changes. Also it strengthens the idea of having a public interface for a class.

* Functional/UI tests can be very useful when doing story level or big bang refactoring to make sure that the functionality is working as expected.

* When refactoring a large piece of code, change one thing at a time. When refactoring a complicated piece of Javascript code, we decided not to change the structure of HTML, so that we do not break the CSS and keep it out of the equation.

* When refactoring a large piece of code, follow a top-down refactoring approach. Make sure that you create the right abstractions first and establish the interactions between them in such a way that data lives as close as possible to the class manipulating it. Then move on to refactoring individual classes. If unit tests exist, it is easier to ignore them until you have stabilized on the classes and their methods.

Computellect

:reflection => intelligence

Tag Archives: refactoring

Soul coding

Refactoring: Levels and Lessons Learnt