
Refactoring: Levels and Lessons Learnt

Refactoring is a critical factor in the successful evolution of a code base. Small, incremental refactoring steps help keep the code base healthy and clean.

I would classify refactoring into three levels, ordered by increasing scope and, consequently, effort.

1) Class-level refactoring

This is the type of refactoring typically done at the end of the red-green-refactor cycle. As TDD practitioners know, you write a failing test for the business expectation, make the test pass, and then refactor the code to make it ‘better’. Here are some examples:
a) rename methods/variables to clearly express their intent
b) extract duplicate lines of code into a method
c) memoize to avoid duplicate calls to expensive pieces of code
d) write idiomatic code. I was baffled when I first heard the word ‘idiomatic’, so let me take a minute to explain it. It means using the constructs the language itself provides rather than the plain old constructs you find in almost every language. An example in Ruby: let's say you have a method that initializes an array, adds some elements to it and returns the array. Written non-idiomatically, these three steps would each correspond to a line in your method. Written idiomatically, you would use tap instead. Here is how it looks:

# instead of: fruits = []; fruits << 'apple'; fruits << 'banana'; fruits
[].tap do |fruits|
  fruits << 'apple'
  fruits << 'banana'
end
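Stepping back to item c), memoization from the list above can be sketched in a few lines. PriceCatalog and fetch_rate_from_service are illustrative names invented for this example; the ||= operator caches the first result so repeated calls skip the expensive work.

```ruby
class PriceCatalog
  def exchange_rate
    # memoized: computed on the first call, reused afterwards
    @exchange_rate ||= fetch_rate_from_service
  end

  private

  def fetch_rate_from_service
    sleep(0.01) # stand-in for a slow service call
    1.08
  end
end

catalog = PriceCatalog.new
catalog.exchange_rate # slow: hits the "service"
catalog.exchange_rate # fast: returns the cached value
```

One caveat with ||= is that it re-runs the expensive call if the result is nil or false; for those cases a defined?(@ivar) check is the safer idiom.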


2) Story-level refactoring

This is the type of refactoring you do when the story is “functionally” complete. Here are some examples:

a) convert procedural code into object-oriented code. A symptom of procedural code is manipulating the state of an object outside its own class. Here is an example:

if input.numeric?
  NumberFormatter.format(input)
elsif input.date?
  DateFormatter.format(input)
end


In this case, code outside the input class is deciding which formatter to apply to an input. This clearly breaks encapsulation by exposing the input's data to something outside its class. Doing this the right way means creating the right type of object at initialization time, based on the input (say, a NumericInput versus a DateInput), writing a format method on each of those classes, and simply having the client call format on the instance. The client should not care which instance it is calling format on, as long as the input type knows how to format itself.

Here is how it looks:

class NumericInput
  def format
  end
end

class DateInput
  def format
  end
end


The client code would be:

input.format

Believe it or not, a lot of your decision-making statements will go away if you create the right type of objects. To create the right type of objects, it is very important to identify the right abstractions for “state holders” and then push the behavior onto those abstractions (classes, in the object-oriented world).
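Here is one way the whole idea could look end to end. The formatting rules and the Input.build factory below are invented for illustration; the point is that the only type check in the system lives in one place, at creation time.

```ruby
require 'date'

class NumericInput
  def initialize(raw)
    @raw = raw
  end

  def format
    '%.2f' % Float(@raw) # e.g. "42" becomes "42.00"
  end
end

class DateInput
  def initialize(raw)
    @raw = raw
  end

  def format
    Date.parse(@raw).strftime('%d %b %Y')
  end
end

class Input
  # Factory: the type decision is made once, when the object is created,
  # so client code never branches on type again.
  def self.build(raw)
    raw =~ /\A-?\d+(\.\d+)?\z/ ? NumericInput.new(raw) : DateInput.new(raw)
  end
end

Input.build('42').format         # => "42.00"
Input.build('2012-05-01').format # => "01 May 2012"
```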

b) Remove duplicate code from multiple classes and create common abstractions.

c) Watch out for performance-related refactorings: avoiding the n+1 query problem, removing or merging duplicate database/service calls, caching expensive service/database calls, avoiding fetching more data than necessary from external services (for example, fetching all the search results and counting them instead of directly getting the count, or fetching unsorted database results and sorting them in code instead of fetching sorted results), and handling service/database error conditions gracefully.
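The first of these, the n+1 query problem, can be illustrated with a toy in-memory “database”. FakeDB and the $query_count counter below are stand-ins for a real ORM and its query log; in a Rails app the fix would be eager loading (for example, ActiveRecord's includes).

```ruby
class FakeDB
  COMMENTS = { 1 => ['nice'], 2 => ['+1', 'agreed'], 3 => [] }

  def self.post_ids
    $query_count += 1
    COMMENTS.keys
  end

  def self.comments_for(post_id)      # one query per post
    $query_count += 1
    COMMENTS[post_id]
  end

  def self.comments_for_all(post_ids) # one query for all posts
    $query_count += 1
    COMMENTS.values_at(*post_ids).flatten
  end
end

# n+1: one query for the posts, then one more per post
$query_count = 0
FakeDB.post_ids.each { |id| FakeDB.comments_for(id) }
$n_plus_one = $query_count # => 4 (1 + 3)

# eager loading: two queries total, however many posts there are
$query_count = 0
FakeDB.comments_for_all(FakeDB.post_ids)
$eager = $query_count # => 2
```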

3) Big bang refactoring

This is the type of refactoring where you change a large part of your code purely for technical reasons. The drivers for this type of refactoring are the exorbitant cost of changing a relatively small piece of code, unacceptable performance, code that is so buggy it is better rewritten than patched with band-aid solutions, or code that breaks in production in ways that are hard to diagnose. The effort is too big to be handled as part of a ‘feature’ story and hence has to be done independently as one or more stories.

Big bang refactoring stories are generally hard to sell: they don't deliver any client-facing functionality, the impact and to some extent the scope are unclear (an example being refactoring the Javascript code for a page to follow the MVC pattern), and there is a fear of breaking something that already works. These stories have to be drafted with a clear end goal in mind, to give the developers a good idea of the scope and to assess the impact. The story should spell out the pain points being addressed as part of the acceptance criteria, along with the rough amount of effort required. Such stories should be monitored more actively to avoid losing focus. Stories with a high level of pain and a low level of effort should be prioritized first. It is possible that such stories might never get prioritized; in that scenario, the work should be broken down into smaller chunks and handled as type 2, story-level refactoring alongside a related feature card.

Lessons learnt:

* Write separate unit tests for the different intentions of a method. For example, let's say a method returns a hash of attributes about a person object. You would have a separate test for each item in the hash, or at least for a logical grouping of items. Here is an example:

it "should format the name" do
  person.to_hash[:name].should == 'Todkar, Praful'
end

it "should calculate the age based on the birth date" do
  person.to_hash[:age].should == '29'
end


This makes it easier to understand the behavior being tested. If some of the logic changes or goes away, it is easy to change or delete the corresponding test. You can also test the complicated parts more rigorously without testing everything the method returns. The test setup for the method can go in a before block to avoid duplication.

* Avoid unit testing the private methods of a class. Testing private methods is a smell that indicates there is too much going on in the class being tested. The fact that you want to test the private method without going through the public interface means that you could easily create an abstraction for that code and inject it as a dependency. You could then test the dependency separately and rigorously. For example, say there is a method that finds the median age for a group of people stored in a database. There are two separate pieces of logic here: reading from the database, which is fairly easy to test, and computing the median age, which is more complicated and can be tested easily and thoroughly if it were its own abstraction. Also, if for some reason your logic changed to find the mean age instead of the median, your database test should remain the same.
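Once extracted, the median computation might look like this (AgeStatistics is an illustrative name): pure computation with no database dependency, so it can be tested exhaustively on its own.

```ruby
class AgeStatistics
  # Median of a list of ages: middle element for an odd count,
  # average of the two middle elements for an even count.
  def self.median(ages)
    sorted = ages.sort
    mid = sorted.length / 2
    if sorted.length.odd?
      sorted[mid]
    else
      (sorted[mid - 1] + sorted[mid]) / 2.0
    end
  end
end

AgeStatistics.median([30, 22, 40])     # => 30
AgeStatistics.median([30, 22, 40, 28]) # => 29.0
```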

* Try to do as much unit-level testing as possible, to isolate problems when tests fail. The other advantage of unit testing is that it helps reduce the combinations of integration/functional tests. Say you have to write a functional test for a form that accepts age as one of its fields. If you have properly unit tested all the edge cases for the age field, then the form submission test does not have to deal with various combinations of age input. Almost always, integration tests are an order of magnitude slower than unit tests. Unit tests give you faster feedback at relatively low cost.

* It is very important to have clearly defined outcomes for refactoring stories. I was on a project whose first phase was dedicated to refactoring. We quickly realized we weren't making much progress, as it was hard to tell when a refactoring story was done. The boundary lines of the refactoring stories were blurry, which caused confusion. I recommended that we tie the refactoring effort to new feature stories originally planned for phase 2. The approach we took was to refactor the code in and around the feature we were working on. This way we had a better sense of story boundaries and knew when to stop refactoring. Once the patterns were in place, it was easy to replicate them across the code base. The business also liked it, as they started seeing features delivered sooner than they had anticipated.

* If you find too much mocking/stubbing of dependencies when testing a method in a class, it might be a good time to consolidate the dependencies into fewer abstractions and delegate some of the testing to them. Too much mocking/stubbing can be the result of chatty interaction between multiple dependencies, which happens when data is passed around from one object to another. It helps to have an owner of the data and to push the behavior onto that owner to avoid the chatty interaction.

* One tip I have found useful when naming classes is to prefer a noun that names the concept over a doer-noun derived from the verb. For example, a class that compares hotels should be called HotelComparison rather than HotelComparer, since it feels more natural for a HotelComparison to hold state like the date of the comparison, the result of the comparison, and the requesting user. This helps in writing more object-oriented code.
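A small sketch of the difference (the attributes and the sort key here are invented for illustration): the noun-named class naturally accumulates state about the comparison itself.

```ruby
class HotelComparison
  attr_reader :hotels, :compared_at, :result

  def initialize(hotels)
    @hotels = hotels
    @compared_at = Time.now # state about the comparison, not a hotel
  end

  def run
    @result = hotels.sort_by { |hotel| hotel[:price] }
    self
  end
end

comparison = HotelComparison.new([{ :name => 'B', :price => 90 },
                                  { :name => 'A', :price => 70 }]).run
comparison.result.first[:name] # => "A"
```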

* Always test through the public interface. One of the biggest advantages is that unit tests don't break when the internal implementation changes. It also reinforces the idea of a class having a well-defined public interface.

* Functional/UI tests can be very useful when doing story-level or big bang refactoring, to make sure the functionality still works as expected.

* When refactoring a large piece of code, change one thing at a time. For example, when refactoring a complicated piece of Javascript code, we decided not to change the structure of the HTML, so that we would not break the CSS and could keep it out of the equation.

* When refactoring a large piece of code, follow a top-down approach. Create the right abstractions first and establish the interactions between them so that data lives as close as possible to the class that manipulates it; then move on to refactoring individual classes. If unit tests exist, it is easier to ignore them until you have stabilized the classes and their methods.


Story estimation on Agile teams

Story estimation is an important part of the Agile process, where the developers on the team vote on the level of effort required to implement a story. Here are some guidelines, in the form of a Q&A, that I have used on my teams. They are meant for a practitioner of the technique, somebody who is conversant with the basic mechanics of estimation; if you are a beginner, there are some good books on estimation worth referring to first.

Q: Who should be involved in an estimation session?
A: Primarily the developers and the business/product owners. If the stories are UI-intensive, it is always a good idea to involve the UX owners too. People with purely managerial responsibilities should not be present, as their presence could pressure the developers into giving lower estimates.

Q: Should all developers on the team participate in the estimation session?
A: Yes, everyone should be present at estimation for a decent-sized team (3-4 pairs). An anti-pattern is to pick only the people who have worked on, or have the most context on, the story being estimated, which contributes to siloing of information. Having everyone present helps the team realize when knowledge is not being shared around enough. Also, if people who have no context end up working on the story, it could lead to a bad estimate. Estimation requires each developer to throw out a number, which keeps everyone fully engaged and leads to a better understanding of the story and the application as a whole. Estimation is also a time to spread awareness about technical debt surrounding the story, a quick way to get acquainted with unknown parts of the application and the system/technical landscape, and a point where certain implementation decisions are made or communicated.

Q: Should stories be completely fleshed out before estimation?
A: They should be fleshed out enough to give the developers the context to estimate the level of effort. Having product/business owners at the estimation session helps answer the questions or concerns that developers raise during estimation. Better story narratives lead to better estimates and better assumptions.

Q: Should stories be time-boxed?
A: Absolutely, time-boxing stories is very important. Developers tend to discuss technical issues in detail during estimation. This is fine if the story is straightforward, but if the discussion goes on for too long it wastes everyone's valuable time, the developers' and the business's alike. A rough guideline is to limit the story discussion to 8 minutes. The business/product owner starts by explaining the business case for the story and its scope. At the end of the time limit, if the developers feel they do not have enough information to estimate the story, the discussion can continue for another minute. If the story still cannot be estimated, it could be because the team does not understand the business domain well enough, or because there are technical concerns. In the former case, the business could hold separate workshops to give the developers an overview of the business domain. If there are technical issues, the team should choose between two options: a pair does tech analysis for the story, or a technical spike, and the story is then estimated at the next estimation session. Tech analysis is where a pair of developers attempts to answer all the outstanding questions, technical or business-related. This may involve coming up with a technical solution and analyzing external dependencies like data ingestion, database schema creation, services, etc. A technical spike goes one step further and involves writing code to prove the technical feasibility of the solution, for example when using new technologies or to understand its performance implications. It is important that the pair records the outcome of the analysis or spike on the card itself, so it can be referred to at the next estimation session. The 8-minute time limit generally works for most stories, but it can be adjusted based on team feedback.

Q: Should every story be tech analyzed before it is estimated?
A: If you notice a trend of every story turning into a long technical discussion or exceeding the time limit, then it makes sense to do tech analysis beforehand so as to save everyone's time.

Q: Where should estimation assumptions be captured?
A: Estimation assumptions should be explicitly noted on the story card. A change in story assumptions can potentially lead to re-estimation of the story. Story assumptions include notes about external dependencies and significant technical implementation details, such as whether the system should display real-time data as opposed to cached data.

Q: Should technical debt surrounding a story be factored into the estimate?
A: Yes, it should. In fact, it is the best possible way to tackle technical debt on a project. Two arguments can be made in its favor: first, the code in that area will be touched anyway, so you might as well make it better; second, it makes it easier to build on top of the code in the future. The extent of the technical debt should be investigated through tech analysis, whose outcome should be the scope of the technical debt to be dealt with as part of the story. This can be a slippery slope, and care should be taken not to add too much technical scope beyond the realm of the story in question.

Q: Should testing efforts be included in the estimate of a story?
A: For run-of-the-mill stories, the estimate should not account for testing efforts, be it manual/QA testing or automated testing done by developers. There should be a testing strategy in place, and all stories should more or less follow its patterns so that estimates even out over an iteration. If a new testing strategy is introduced that negatively affects velocity, appropriate expectations should be set with the team and management rather than polluting the estimates. But for stories that require an exceptional amount of testing effort, it should be factored in.

Q: Should stories ever be re-estimated?
A: Stories should be re-estimated only in exceptional cases, such as changes in team, architecture, scope or assumptions.

Q: Should different versions of a story be estimated? Different UI look and feel? Different implementations, say a service vs a local database?
A: This is a tricky one. On one hand, it gives the business/UX great feedback on the different options and their associated cost. But it can also be a source of confusion and a potential waste of valuable developer time. If this happens too often, the business needs to be made aware of the time spent on estimation and asked whether it is worth it. One way to tackle this is to estimate the different variants separately and then add them up to get a feel for the effort. In the end, though, the story should be estimated holistically.

Q: Should bugs be estimated?
A: Bugs should not be estimated, since they are misunderstood or unfulfilled expectations whose effort has already been accounted for. They should be given the smallest possible estimate, to account for the time spent analyzing the root cause and not necessarily the time to fix it.

Q: Should chores/tech debt be estimated?
A: Chore cards and tech debt cards should absolutely be estimated. Putting them through a rigorous estimation process helps the primary beneficiary clearly spell out the requirements, and also gives an idea of whether the benefit is worth the effort.

Q: Should stories be broken down during estimation?
A: Yes, if it makes sense. Smaller stories are much easier to develop and test. There is a tendency on the business side to lump stories together in the hope of delivering the entire chunk of work in the same iteration, but the counter-argument is that once stories have been broken into pieces, they can be played in parallel. If a story does get broken down, the resulting stories should not be estimated until they are completely fleshed out.

Q: How early or late should estimation be done before a story gets played?
A: Estimation should typically be done an iteration or two before the scheduled iteration. Things change fast on Agile teams, so if stories are estimated too far in advance, they might have to be re-estimated. Doing it just in time means people on the team have the context fresh in their heads when the story is played. Doing it before the iteration starts gives the business an opportunity to drop a story if they feel the benefit is not worth the effort.

Q: How often should the estimation sessions be held in an iteration? Should there be ad hoc estimation sessions?
A: Estimation sessions should be held twice every iteration, on specific days of the week, typically right after lunch. This sets a natural rhythm for the team, and the discipline of doing tech analysis the day before or the morning of the session is then rigorously followed. Ad hoc estimation sessions should be avoided and held only under exceptional circumstances, since any kind of meeting disrupts the rhythm of the team.

Q: How many stories should be estimated before an iteration begins?
A: More than what could possibly fit in an iteration. Firstly, it gives the business an opportunity to adjust the scheduling of a story based on its estimate, and secondly, if the team runs out of stories mid-iteration, there are buffer stories. If a story does not get played in its scheduled iteration, it can be played in a later one, provided not much has changed in terms of team, architecture, scope and assumptions.

Q: Can a story be changed after estimation?
A: Only in exceptional cases. If the changes do not affect the estimate (as confirmed by the team or a team representative), it is OK; otherwise the story should be re-estimated.

Q: Can estimates for a story be used for a different team?
A: Estimates are contextual and always unique to a team; they cannot be imposed on a different team. Estimates are made by a particular group of people with particular knowledge of the application and the code base.


Why isn’t Ruby 1.8.7 copy-on-write friendly?

You might have heard that Ruby 1.8.7 isn’t copy-on-write (COW) friendly. Let’s see why.

To set some context, say you have built an application in Rails using Ruby 1.8.7 and deployed it on a UNIX system. Scaling the application means spawning multiple processes to serve client requests. If no memory is shared between the processes, a whole lot of memory is wasted holding the same Rails code in each one. Modern UNIX systems provide the fork functionality, which lets a child process share memory with its parent. So essentially there is one parent process holding the memory shared by all, and it spawns child processes that share the parent’s memory. When a process, parent or child, writes to a shared page, the kernel gives the writer its own private copy of that page and applies the write there, while the other processes keep sharing the original. This is called the copy-on-write (COW) technique.

Now let’s see how the Ruby 1.8.7 garbage collector works. It is fairly simple in its implementation. It uses a mark-and-sweep algorithm to collect unused memory. What this essentially means is that it scans every object and marks a flag on each object currently in use. At the end of this cycle, it sweeps, or collects, all the objects that have not been marked. The problem is that the “mark” information is stored on the object itself, in essence making every live object “dirty”, which triggers a copy-on-write by the underlying UNIX system. All the live objects held in shared memory will be copied into each child’s memory space. This nullifies all the memory savings the copy-on-write technique would have made possible. That is why Ruby 1.8.7 isn’t copy-on-write friendly.
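To make this concrete, here is a toy mark-and-sweep sketch (ToyObject, mark and sweep are invented for illustration, not MRI internals). Note that the mark phase writes to every live object, and resetting the flags for the next cycle writes to them again, so every live object’s memory page gets dirtied each GC run:

```ruby
class ToyObject
  attr_accessor :marked, :references

  def initialize(references = [])
    @references = references
    @marked = false # the mark bit lives on the object itself, as in MRI 1.8
  end
end

def mark(obj)
  return if obj.marked
  obj.marked = true                      # write => page dirtied, COW copy made
  obj.references.each { |ref| mark(ref) }
end

def sweep(heap)
  live, dead = heap.partition(&:marked)
  live.each { |obj| obj.marked = false } # another write to every live page
  dead                                   # the "collected" objects
end

leaf    = ToyObject.new
root    = ToyObject.new([leaf])
garbage = ToyObject.new

mark(root)
sweep([root, leaf, garbage]) # only the unreachable object is collected
```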

Ruby Enterprise Edition circumvents this problem by patching the GC to store the “mark” information outside the object. For more information, check out this.

For a longer and better explanation, go here.
