Craftsmanship

Pragmatism hurts too…

I spent quite a bit of my early career learning pragmatism. Frankly, I did not know the word until I entered the industry. ;) I got this as feedback quite a few times, from my colleagues and even more so from clients. They would say "You need to be more pragmatic". What that really meant was: stop being so obsessed with your software engineering "craft" (my clients didn't even think it was a craft :)). It's OK to not have some part of the code tested before it hits production. It's OK to not have automated tests for some part of the code. It's OK to not pair program on critical code. It's OK to have 2000-line classes, with no tests, if it "already works". And I get it, all of this does make sense in a given context.

But pragmatism hurts too. In this context, pragmatism means implicitly consenting to bad behavior. And again, this might be OK when you are in a bit of a crunch, but when it starts becoming the norm, it begins to hurt, and hurt a lot. When you see it becoming the norm, you need to start becoming a zealot for code quality. Best of all is to have a good mix of zealots, or to use the more positive term, "craftsmen and craftswomen", on the team.

So, don't forget to promote craftsmanship just as you would promote pragmatism, especially if you are the leader of your software team.

Cloud computing, Infrastructure

Cloud vendor migration

Recently we migrated our cloud infrastructure from Amazon to a different cloud vendor. I won’t get into the details of why we had to do it, but the experience of the migration itself was interesting and I want to provide some guidelines around the things you should consider, particularly around infrastructure automation, if you find yourself in a similar situation.

Going into this migration discussion, we knew clearly that Amazon was a better cloud vendor than the new one. We looked at a comparison site[1] to compare the features of the two cloud vendors, and Amazon was the clear winner. This comparison gave us some pointers as to where the new cloud vendor would be lacking. But rather than focusing on individual features, we decided to come up with our own "requirements spec" for our infrastructure, and then see how the new cloud vendor fared against it. We knew we would have to make some compromises, but most importantly, we understood where we would not compromise at any cost.

Our application is a fairly straightforward Rails app backed by a Postgres database and memcached, hosted on Amazon virtual machines. We use a lot of Amazon services like S3 (storage), ELB (load balancing), Route 53 (DNS), SES (email), et al.

One of the big things we were concerned about from the get-go was the "ease of automation" of setting up our infrastructure with the new cloud vendor. Our existing infrastructure setup is automated to a large extent using Puppet. Our infrastructure setup falls into three steps: provisioning, configuration and deployment, which I will explain in a bit. These steps have different degrees of tolerance for automation, and we decided early on which of them were "should-be-automated" versus "must-be-automated". Let's talk about the steps:

1) Provisioning
This is the first step in infrastructure setup, which involves creating virtual machines, or in technical parlance, "provisioning" them. Once you have provisioned an instance, you get a virtual machine with a base OS installed on it, an IP address and credentials to access it. For our new cloud provider, this would be a manual step, whereas Amazon lets you automate this piece very nicely if you use the AWS API. We decided this falls under the "should-be-automated" category because we did not see ourselves spinning up new machines frequently. Admittedly, we were giving up the capability of "auto-scaling" our infrastructure, but we were OK with that. The way auto-scaling works is that Amazon monitors the load on your machines and automatically creates new machines to handle extra load. It is actually a pretty cool feature, but we decided we did not need it, at least not in the near term.
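To make the "should-be-automated" idea concrete, here is a rough sketch of what scripted provisioning against the AWS API could look like using the aws-sdk gem. The region, AMI id, instance type and role tag are placeholders, not our actual setup.

```ruby
# A minimal sketch of scripted provisioning via the AWS API, using the
# aws-sdk gem. AMI id, instance type and tag values are placeholders.
require 'aws-sdk-ec2'

ec2 = Aws::EC2::Resource.new(region: 'us-east-1')

instances = ec2.create_instances(
  image_id:      'ami-12345678', # hypothetical base image
  instance_type: 't2.micro',
  min_count:     1,
  max_count:     1
)

instances.each do |instance|
  # Tag the machine so the configuration step knows what role it plays.
  instance.create_tags(tags: [{ key: 'role', value: 'web' }])
  instance.wait_until_running
  puts "Provisioned #{instance.id} at #{instance.reload.public_ip_address}"
end
```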

2) Configuration
This is the step where a raw virtual machine is transformed into what it is meant to be. So, for example, a virtual machine that is supposed to be the database server would have the database server software and all the other pieces it needs installed on it. This part is probably the most complicated, or rather the most time-consuming, to set up, because it involves configuring a virtual machine to be an application server, web server, database server, cache server, load balancer, email server, router, et al. This step falls somewhere between "should-be-automated" and "must-be-automated". Things like the email server and router are one-time setup activities, so we did not automate them to start with and were OK with that, since it was not worth our time. But the web server and database server fall under the "must-be-automated" category, because we set them up and tear them down frequently, not just in production but in all the downstream environments like staging, integration and development. The other advantage is that if we had to bring up new servers (web or database) in response to a scaling or outage situation, we would be able to do it fairly quickly and, most importantly, as an exact replica of what we had before.
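For a flavour of what the configuration step looks like when automated, here is a hypothetical Ruby wrapper that applies a role-specific Puppet manifest to a machine. The roles, manifest paths and the use of puppet apply are illustrative; the post does not describe our actual Puppet layout.

```ruby
# A rough sketch of role-based configuration: each machine gets a role,
# and the role decides which Puppet manifest is applied to it.
# Roles and manifest paths below are made up for illustration.
ROLE_MANIFESTS = {
  'web'   => 'manifests/web_server.pp',
  'db'    => 'manifests/database_server.pp',
  'cache' => 'manifests/cache_server.pp'
}.freeze

role     = ARGV.fetch(0) { abort 'usage: configure.rb <role>' }
manifest = ROLE_MANIFESTS.fetch(role) { abort "unknown role: #{role}" }

# Applying the manifest should always converge the machine to the same
# state, which is what makes tearing servers down and rebuilding them cheap.
system('puppet', 'apply', manifest) or abort 'puppet apply failed'
```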

3) Deployment
The last step in the process of infrastructure setup is application deployment, which falls under the "must-be-automated" category. Deployment means that every time we make a change to our code base, an automated process builds the code, runs the tests, and deploys it to all the different machines like the web server, application server, database server, et al. Having this step automated is the cornerstone of continuous delivery, which is something we value highly. Continuous delivery means being able to deploy changes to an environment quickly and with minimal manual intervention. This gives us the ability to make rapid changes to the production environment, get feedback quickly from users and make changes accordingly. Luckily for us, this step was going to be fully automated with our new vendor, else that would have been a showstopper.
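As a sketch of what the automated deployment step might look like for a Rails app, here is a minimal Capistrano-style configuration. I am using Capistrano purely for illustration; the application name, repository URL and host names are made up.

```ruby
# config/deploy.rb: a minimal Capistrano-style deployment sketch with
# placeholder application, repository and hosts.
set :application, 'our_rails_app'
set :repo_url,    'git@example.com:team/our_rails_app.git'
set :deploy_to,   '/var/www/our_rails_app'

# Roles mirror the machines brought up in the provisioning and
# configuration steps above.
server 'web1.example.com', roles: %w[web app]
server 'db1.example.com',  roles: %w[db]

namespace :deploy do
  desc 'Restart the application once the new release is in place'
  task :restart do
    on roles(:app) do
      execute :touch, release_path.join('tmp/restart.txt')
    end
  end

  after :publishing, :restart
end
```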

The other things that we considered when moving to the new cloud vendor were:

  • How do we migrate data to the new cloud infrastructure?
  • What are the data backup solutions available with the new cloud provider?
  • Is the new cloud PCI compliant?
  • What are the SLAs (Service Level Agreements) for the new cloud? What are the escalation routes? Who will the development team have access to when an emergency arises?
  • Does the new cloud use OpenStack?
  • Does it provide services like email server, load balancer, router, et al or do we have to build these ourselves?
  • Does it support encrypted backups?
  • What kind of file storage does it provide? Does it provide streaming capability for say video, audio, et al?
  • Does it provide identity and access management solution?
  • What kind of monitoring solutions does the cloud vendor provide?

[1] http://cloud-computing.findthebest.com

Agile, Programming

Tasking

I have been following this seemingly innocuous practice of “tasking” when programming on a story, which I find very useful and I recommend you try it. Here are the what, when and whys of tasking.

What: Breaking the story into small tasks that could be worked on sequentially.

When: Before writing the first line of code for the implementation of the story.

Why:
Understanding: Breaking the story into smaller chunks gives you a good grasp of the entire scope of the story. It helps define the boundary of the story and gives a good idea of what is in scope and what is out of scope.
Completeness: Tasking makes you think of edge-case scenarios and is a conversation starter for missed business requirements or even cross-functional requirements like security, performance, et al.
Estimation: Tasking right at the beginning of the story gives a sense of the story's size. On my current project we are following the Kanban process. Ideally we would like to have "right-sized" stories, not too big, not too small. Tasking helps me decide if the story is too big and, if it is, how it could be split into smaller ones.
Orientation: This has been a big thing for me. I feel I go much faster when I have a sense of direction. I like to know what is coming next and then just keep knocking off those items one by one.
Talking point: If you have one task per sticky, which I recommend, it serves as a good talking point for, say, business tasks vs refactoring/tech tasks, prioritizing the tasks, et al.
Pair switch: If you are doing pair programming, like we do, then you could be switching pairs midway through the story. Having a task list helps retain the original thought process when pairs switch. Stickies are transferable and travel with you if you change locations.
Small commits: Another big one. Each task should roughly correspond to a commit. Each task completion should prompt the question "Should we commit now?". If you are committing even more often, even better.
Minimize distraction: There is a tendency as a developer to fix things outside the scope of the story, like refactoring or styling changes to adjacent pieces of code. If you find yourself in that situation, just create a new task for it and play it last.

Thanks for reading and feel free to share your experiences.

Design, Programming

Simplicity via Redundancy

Typically when we do a time versus space tradeoff analysis for solving a computational problem, we have to give up one for the sake of the other. By definition, you trade reduced memory usage for slower execution or vice versa, faster execution for increased memory usage. Caching responses from a web server is a good example of space traded for time efficiency. You can store the response of a web request in a cache store hoping that you could reuse it, if the same request needs to be served again. On the other hand, if you wanted to be space efficient, you would not cache anything and perform the necessary computations on every request.
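As a concrete illustration of that space-for-time trade, here is a minimal Rails-style sketch of caching a computed response. The controller, cache key and computation are made up for the example.

```ruby
# A minimal sketch of trading space for time: cache the result of an
# expensive computation keyed by the request parameters, so repeated
# requests skip the work. Names here are illustrative.
class QuotesController < ApplicationController
  def show
    symbol = params[:symbol]

    # Space cost: the computed payload sits in the cache store.
    # Time gain: repeated requests for the same symbol skip the computation.
    @quote = Rails.cache.fetch("quote/#{symbol}", expires_in: 10.minutes) do
      expensive_quote_calculation(symbol)
    end
  end

  private

  # Stand-in for whatever heavy computation the request actually needs.
  def expensive_quote_calculation(symbol)
    { symbol: symbol, value: rand(100) }
  end
end
```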

When I am having a discussion with my colleagues, this time versus space tradeoff comes up quite frequently, ending in either a time-efficient solution or a space-efficient solution based on the problem requirements. Recently I got into a discussion which gave me a new way of thinking about the problem: how about including simplicity of the solution as another dimension, one which directly contributes to development efficiency? In an industrial setting, simplicity is just as important, and the lack of it costs money just as time- or space-inefficient solutions would.

The recent discussion I had was about a problem that involved some serious computation at the end of a workflow that spanned multiple web requests in a web application. This computation would have been considerably simpler if we stored some extra state on a model in the middle of the workflow. It started to turn into a classic time versus space efficiency debate. Time was not an issue in this case, and hence giving up that extra space was considered unnecessary. But I was arguing for the sake of simplicity. Quite understandably, there was some resistance to this approach. The main argument was "We don't have to store that intermediate result, so why are we storing it?". I can understand that. If there is no need, then why do it. But if it makes the solution simple, why not?

I admit there might be certain pitfalls in storing redundant information in the database, because it is not classic normalized data. There could be inconsistencies in the data if one piece changes and the other piece is not updated along with it. It might also be a little weird storing random pieces of information on a domain model. Luckily, none of these applied in my case. The data made sense on the model and could be made immutable, as it had no reason to change once set, which guaranteed data consistency.
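To make this concrete, here is a hypothetical sketch of what storing that redundant, immutable intermediate result on an ActiveRecord model might look like. The model, column and association names are invented for illustration.

```ruby
# A hypothetical sketch of storing a redundant intermediate result on the
# model and making it effectively immutable once set. Model, column and
# association names are made up.
class Enrollment < ActiveRecord::Base
  # Ignore any later attempts to update the stored intermediate result.
  attr_readonly :intermediate_total

  before_create :capture_intermediate_total

  private

  # Denormalized on purpose: recomputing this later would mean replaying
  # several steps of the workflow.
  def capture_intermediate_total
    self.intermediate_total ||= compute_intermediate_total
  end

  def compute_intermediate_total
    line_items.sum(:amount) # stand-in for the mid-workflow computation
  end
end
```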

Extending this principle to code, I sometimes prefer redundancy over DRYness (Don't Repeat Yourself) if it buys me simplicity. Some developers obsess over DRYness and in the process make the code harder to read. They will put unrelated code in some shared location just to avoid duplication. Trying to reduce the lines of code by using meta-programming to generate similar-looking methods on the fly takes this to a whole new level. It might be justified in some cases, but I am not sure how many developers think about the added complexity this approach brings.

A good way to think about reusability is to think about a common reason for change. If two classes have a similar-looking method and you want to create a reusable method shared between the two, then you should also ask whether those two classes have a common reason for that method to change. If yes, then very clearly that method should be shared; if not, maybe not so much, especially if it makes the code hard to understand.

I feel redundancy has value sometimes and simplicity should get precedence over eliminating redundancy. Thank you for reading. Comments welcome.

Architecture, Programming, Scala

To graph or not to graph?

I have recently been working on the Credit Union Findr application and I think it is a pretty interesting problem to solve. To give you some idea of the functional requirements, the application intends to help people find credit unions they are "eligible" for. If you don't know what a credit union is, it is a non-profit financial institution that is owned and operated entirely by its members. Credit unions provide a variety of financial services, including lending and saving money at much better rates than regular banks. They can only accept members who satisfy certain eligibility criteria. The eligibility criteria are based on where you live, work, worship or volunteer. They are also based on who you work for, your religious and professional affiliations, et al.

So on the face of it, it seems like a simple enough problem: take the eligibility criteria for credit unions, take the user's personal details and then match the two. Sounds simple, right? Not so much. When we started building the application, there were some interesting challenges on the domain side (non-software) as well as on the implementation side (the software bits). Although the focus of this blog is the software side of the application, I will quickly run through the domain challenges.

First of all, gathering the credit union eligibility data is a monumental task. Thankfully, the product owner team, Alternate Banking System (ABS), handled that for us. Second, we were not sure which questions to ask the user, because we could ask them everything under the sun and still not have any qualifying credit unions as a result, which is rare, but still a possibility. Moreover, there were concerns around how much personal information the user would be comfortable sharing with us. All these challenges were tackled by our UX (user experience) team when building the prototype, and they made the suggestion that we should treat this application like a dating site: ask the user a minimum set of relevant personal questions, ask them what they are looking for in a credit union and then match the two. The analogy made it much simpler to visualize the system.

The matchmaking analogy stuck with us developers. But the big question from an implementation standpoint was how to model the data. Max Lincoln, one of the brilliant engineers on our team, suggested that we use a graph database. Frankly, at first I found that a bit odd. I was thinking more along the lines of a document store: throw the credit union eligibility criteria on one document, the user details on another document and match the two. A graph database? Really? But anyway, what do we do when we have two approaches to solving a problem? We spike them out! Both spikes were quick and revealed the merits and demerits of each solution. Mustafa Sezgin led the document store spike and used ElasticSearch. As I thought more and more about the problem, I became convinced that the graph solution was the way to go. Let me explain why.

So let's start with an example to see how we could model this problem as a graph. Let's say you have a credit union that accepts people working in Manhattan, and you have a user who works in Manhattan. On a graph, you would have a node for the credit union which has a relationship called "acceptsWorkingIn", represented by an edge, to the Manhattan node. And on the user side, you would have a user node that has a relationship called "worksIn" to the same Manhattan node that connects to the credit union. Now let's say you had another credit union that accepted people working in Manhattan. Then you would have a new credit union node to represent it, with a relationship "acceptsWorkingIn" to the same Manhattan node. To find the credit unions that the user is eligible for, you would simply traverse the graph starting from the user node until you find all the leaf credit union nodes.
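To make the traversal idea concrete without tying it to any particular graph database, here is a tiny plain-Ruby sketch with made-up nodes and edges.

```ruby
# A plain-Ruby sketch of the graph idea, independent of any graph database.
# Nodes are symbols, edges are labelled triples; the data is made up.
edges = [
  [:cu_alpha, :acceptsWorkingIn, :manhattan],
  [:cu_beta,  :acceptsWorkingIn, :manhattan],
  [:user_1,   :worksIn,          :manhattan]
]

# Find credit unions reachable from the user through a shared node.
def eligible_credit_unions(user, edges)
  user_nodes = edges.select { |from, _, _| from == user }.map { |_, _, to| to }
  edges.select { |from, _, to| from.to_s.start_with?('cu_') && user_nodes.include?(to) }
       .map(&:first)
       .uniq
end

puts eligible_credit_unions(:user_1, edges).inspect
# => [:cu_alpha, :cu_beta]
```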

One of the reasons why the graph solution really shines is "hotspot questions". Let me first explain what a hotspot is. In our example, a hotspot is a node to which a lot of credit union nodes are attached. If we keep adding credit union eligibility data for the "acceptsWorkingIn" relationship and find out that 90% of the credit union nodes are attached to the Manhattan node, then the Manhattan node clearly becomes a "hotspot". Remember what I mentioned earlier, that we had difficulty coming up with the questions we should be asking the user? Hotspot questions! The questions that lead us to a hotspot quickly are hotspot questions. So a hotspot question in this example would be "Do you work in Manhattan?". The UX folks were delighted, because if we could ask a hotspot question as our first question and get a positive response, the user becomes eligible for 90% of the credit unions right away. More results quickly equals a big win for usability! Now, there is another side to this hotspot questioning. Let's say all credit unions accept people who live on Ellis Island, but nobody actually lives on Ellis Island; you would get a negative response from every user, which doesn't make it a great question, does it? On the other hand, suppose we had a healthy database of users, analyzed the data from the user side and found that a large percentage of users work for the NYC Metropolitan government and end up qualifying for some credit union. Then you can find a user an eligible credit union if they also happen to work for the NYC Metropolitan government, and the hotspot question would be "Do you work for the NYC Metropolitan government?".
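Continuing the same plain-Ruby sketch, a hotspot can be found by counting what share of the credit union nodes attach to each node; the threshold here is arbitrary.

```ruby
# Continuing the plain-Ruby sketch: a node is a "hotspot" when a large
# share of credit union nodes point at it. The 0.5 threshold is arbitrary.
def hotspots(edges, threshold: 0.5)
  cu_edges  = edges.select { |from, _, _| from.to_s.start_with?('cu_') }
  total_cus = cu_edges.map(&:first).uniq.size

  cu_edges
    .group_by { |_, _, to| to }
    .map { |node, es| [node, es.map(&:first).uniq.size.to_f / total_cus] }
    .select { |_, share| share >= threshold }
end

puts hotspots(edges).inspect
# => [[:manhattan, 1.0]]  (both credit unions attach to Manhattan)
```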

The other reason I liked the graph model is that it is flexible and easy to extend. Let's say you have a credit union that accepts folks who work for the NYC Metropolitan government and you have a user who works for NYC Transit. In a graph model, you could easily have a connection between NYC Metropolitan government and NYC Transit via the relationship "belongsTo". This way, over time, as you gather more information, you can have multiple levels of relationships between the user and the credit union. The technical implementation remains exactly the same: all you have to do is traverse the graph from the user node to the leaf credit union nodes.

What I have realized is that what would be a value in a document store is a first-class entity in a graph model. For example, in a document store you would have an attribute "worksIn" on a user document with the value Manhattan. In a graph database, the value Manhattan gets its own node, which can have its own properties and its own relationships to any other node in the system. More importantly, there is one unique node to represent Manhattan, giving rise to the concept of a hotspot, which has been immensely valuable in our case.

Anyway, we have just started building this application and I am really excited about it. For the technology stack we are using Scala, Play! and Neo4j, and we will be deploying to Heroku.

Thanks for reading and comments welcome!

Design, User Experience

Design Studio

My dream job would be a place where I get to play with, interact with, and build physical objects. It is incredibly satisfying to hold something in your hand and say "I built this". It gives you that instant connection with the fruit of your labor. Now, given the business that I am in, of building software (which I love, by the way), there are no physical objects in a software system. The closest we get is drawing some diagrams to represent the components in a software system or to show how data flows between those components. Physical objects are great not only from a gratification point of view, but they also provide valuable feedback on the design and usability of the system.

Enter the idea of a design studio for software. The idea is to have entities in a software system represented by physical objects that you can touch and feel. When you interact with this physical world of software, you get a much better feel for the system you are building and it accelerates your learning about the system under construction. It also levels the playing field between the designer of the software system and the innocent user. Let me explain how.

Let's say you are building a website that comprises multiple screens. In this design studio, these screens would be represented by wooden blocks placed on the floor, large enough for a person to stand on. A user navigating these screens in the real world would be represented by a person jumping from one wooden block to the other. In addition to moving around on these blocks, the person would have to carry weights. These weights would be representative of the "effort" that the user has to put in to use the website. Examples of such "effort" would be the presence of too many UI (user interface) elements on the screen, like buttons, drop-downs, et al (more UI elements means more effort required on the part of the user to understand what they do), waiting time spent on the screen, information the user has to hold across screens (did I click that checkbox two screens ago, or am I shipping this to the right address on a checkout page with no shipping address on it), etc. All these things negatively affect the user experience on the website and hence, in the physical world, weights would be added as and when the user is subjected to such "effort" while jumping from block to block.

So the first justification for building such a design studio would be to reduce the user interface complexity of a system. When professionals design a website, they often do not realize what a user really goes through when navigating it. This is partly because websites are built by professionals for people who have little or no experience working with a website. If you had to click a couple of extra links to get somewhere on a website, you might not think of it as a big deal. But if you had to hop around a few extra blocks carrying all those weights, it would seem a bit too much. When the designer is feeling the pain (literally) that the user goes through on a real website, we are, in a sense, leveling the playing field between the designer and the user.

For usability testing, you could bring in some real users and have them walk through this physical world of the software system. You could introduce them to this "game", hand them a task and see how they perform. You could have the person leave a breadcrumb when she/he jumps over the wooden blocks. This way you could study the paths they take through the system. With the help of the physical trace, you might notice that people, when performing apparently simple tasks, jump through a lot of "hoops" (in our case blocks), when the path should have been rather obvious to begin with. This could be because of bad layout or just incorrect assumptions about user behavior. The solution could be as simple as placing related blocks adjacent to each other or providing appropriate directions to the user on the blocks. This would translate to providing a better workflow to the user in the software system, placing related screens closer to each other and linking them in an easy-to-find way.

The other justification for the design studio would be to get feedback on the system architecture. Same situation: you are building a website with some screens. Let's say each screen is backed by a combination of databases and services. As per the previous analogy, each screen would be represented by a wooden block and each external service/database would be represented by a physical box. There would be ropes connecting each screen to all the external systems it talks to. What do I get out of this?

First of all, you get the system view of things. You can easily tell if a screen is backed by too many external connections and infer that the screen might be slow to respond to user input. Maybe you should think of consolidating the external system endpoints or moving some of the functionality to a different screen. Going back to the earlier example, when you jumped on the wooden block representing that screen, you would add additional weights proportional to the connections the screen makes, to account for the latency introduced by the system integration. Typically this kind of system integration information is left out of initial prototyping and becomes cumbersome to incorporate into the user experience later. User experience is not just about how the software looks but also how it behaves. Integration points can have a huge bearing on the design of the user experience and it is better to identify them sooner rather than later.

One other thing I would like to throw in there is that the lengths of the ropes connecting the screens to the external systems should be proportional to the integration effort. Say a database is quick and easy to work with, so it gets a shorter rope, whereas an external, uncontrolled service gets a relatively longer rope. A crazy idea: just by measuring the lengths of these ropes, you should be able to put some estimates around the actual integration effort. Yeah, how about putting some science into estimation!

All in all, it feels like this experiment might be worth a shot. I hope I can try it out soon on a real project. Thanks for reading and let me know how you feel about this idea.

Design

Design Thinking

I just finished reading Donald Norman’s ‘The Design Of Everyday Things’ and here are my thoughts on it. The author talks about a lot of enlightening concepts and I will summarize them as I understand them. There is plenty of other interesting stuff in the book that I have not covered here, including great pictures, and I would strongly urge you to read the book.

Natural mapping: The author says that when designing systems, you should make use of natural mapping. What does that mean? For example, let's say you were designing the switchboard for the electrical appliances in a room. One way, and the common way, to do it would be to place all the switches in one straight line. This means that the user has to remember which switch does what. If you use natural mapping, you would design the switchboard in such a way that each switch's position on the board mirrors, or maps to, the physical location of the appliance it operates in the room. This way it is very intuitive which switch goes with which appliance in the room. Another great example is gas burner switches, which I struggle with almost every day. I can never remember which switch is for which burner. The author suggests some beautiful designs to map the actual burner position to its switch. Doors are a constant source of pain to operate. Wouldn't it be nice to have a flat plate on the side of a door that needs to be pushed to open, and a vertical bar on ones that need to be pulled? Why is it so hard? Anyway, another nice example of natural mapping that I have encountered is on a Mac, where the Caps Lock light is located on the key itself. Sweet, no more searching!

Knowledge in the head versus knowledge in the world: This concept is complementary to natural mapping. If you make use of natural mapping, like having a switchboard that mirrors the physical layout of the room, it reduces the burden on the person of holding the knowledge of which switch does what in their head. On the contrary, you are putting that information out in the world! A great benefit of this is a reduced learning curve for everybody when dealing with new switches.

Visibility: When designing systems, people sometimes get too caught up with aesthetics and don't care much about usability. You might have come across numerous faucets or doors that look awesome but are not very friendly to operate. Every time I see one of these new faucets, I spend at least a minute or two trying to figure out how to make it work. I once rented a car in which I could not figure out, for a long time, how to lower the windows. I approached a toll booth and actually had to get out of the car to pay the toll. Funny! Later I figured out there was a small knob above the radio to lower the windows. Who thought it would be a good idea to put the window opener on the dashboard? Another funny incident happened to me on a British Rail train, which I think the author mentions in the book too. The train stopped at my destination station and I was waiting patiently for the doors to open automatically, as there was no handle on the door to open it. Somebody walked up from behind me, lowered the window on the door, put his hand out and turned the handle on the outside to open the door. Holy cow! Had I not seen that, I would have easily missed my stop. Coming to the point of visibility, the author argues it is extremely important to provide visual clues to help the user operate the system. A long hanging chain with a handle at the bottom provides a visual clue that the device can be operated by pulling it. Sure, aesthetics are important, but usability comes foremost. And by the way, I don't think the British Rail train door was designed that way for aesthetic purposes. 😉

While I was reading this book, I had my own moment of slight ingenuity. I was thinking about how I have had a hard time opening cabinet doors that have a flat plate on the corner that needs to be pushed, whereupon a spring is released and the door pops out. I always try to put my finger in the small gap between the door and the cabinet and pull it out. I thought it might be nice to make a small depression on the plate, about the size of a finger, to indicate that the plate needs to be pushed to open the door. 🙂

Feedback: The author talks about how important it is for the system to give the user feedback on the action they have performed, to assure them that they have completed the task successfully. You do not want people sitting there thinking the machine is doing something when it is doing nothing, or vice versa. Feedback can be provided by sound (a buzz when the microwave door opens), lights (Caps Lock is on), touch (pressing a button on a telephone), et al. These are examples of feedback provided after an action has been performed. One of the examples the author gives is of providing feedback even before an action is performed. It is about designing aircraft switches for the landing gear and wing controls, which I thought was ingenious given the risk involved in flying an airplane. He suggests shaping the landing gear switch like a tire and the switch that operates the wings like a flat plate. When the pilot is working with these switches, the tactile feedback of touching a flat plate versus a round knob has a much bigger chance of reducing error than having both switches feel the same, especially in pressure situations.

Constraints and forcing functions: Constraints can be used very effectively to design better products. For example, keys for locks are designed in such a way that they go in only one way, a constraint built into the system to avoid errors. Computer programs do this well these days by disabling certain options on the screen when they are not applicable to the user. Washing machines do not let you open the door while they are running, to avoid possible mishaps. The author talks about other types of constraints, like cultural constraints. An example of this would be that when creating a LEGO model of a motorcycle and rider, the user knows to build the model with the rider facing forward, because it is the only way the rider could drive safely. The author issues a word of caution about using constraints, because sometimes they can be a source of annoyance. You would know what I am talking about if you have ever carried stuff in your car, placed a heavy bag on the passenger seat and had the seat belt buzzer go off when you started driving. It would be dangerous if the driver permanently disabled the passenger seat belt warning. What I have seen happen is that the seat belt warning beeps for a while and then displays a warning on the dashboard to indicate the non-compliance, which is not so annoying.

So what do I think of the book? I think it is great. It is very insightful. I got a little lost in the middle where it starts to talk a lot about psychology, but there was always something that piqued my interest. The other thing I liked about the book is the book itself: it is nice and concise at 200-odd pages. I think books of about that size are perfect. They are easy to carry around, not too long, yet enough to give a good understanding of just about anything in the universe.

When I first started reading the book, I thought all these issues were minor, trivial, first-world problems[1]. Then as I started thinking more about it, I realized how important the problem is of fixing the usability of everyday things like train doors, aircraft switches, car windows, gas burner switches, et al. The problem only gets worse when you think of operating these devices in a panic situation. We have been producing fantastic devices, but we have not been paying a lot of attention to usability, and I think that should be one of our top priorities going forward.

[1] First-world problem

Performance, Programming, Ruby

Ruby is not all that slow

On one of my Ruby on Rails projects, I worked on a story that involved performing calculations on some reference data stored in a table in a database. The table was like a dimension table (in data warehousing parlance) and was non-trivial in size, say about 100 columns and 1000 rows. We had to calculate something similar to a median of a column, across all the rows, for all 100 columns. And the challenge was to perform this calculation at request time, without it being too much of a performance overhead.

My first response to any such calculation-intensive activity is to push it offline, to avoid the runtime calculation penalty. But in this case, the calculation involved inputs from the user and hence did not lend itself very well to pre-processing. We could have done something like what an Oracle cube does, where it builds aggregates of fact data beforehand to reduce the processing time required. But in our case, pre-computing the results for all the possible user input options would have been extremely wasteful in terms of storage. The other concern with this approach is that there would be some time delay between the data being available and being usable, due to the pre-processing.

Then came the suggestion of manipulating the data in a stored procedure. It made me shudder! Having recently read Pramod Sadalage's blog[1] on the pain points of stored procedures, I certainly was not in the mood to accept this solution. The pain points outlined in the blog that particularly resonated with me were: no modern IDEs to support refactoring, requiring a database to compile a stored procedure, the immaturity of the unit testing frameworks, and vertical scaling being the only option to scale a database engine.

As we all know, Ruby is perceived to be slow and incapable of performing any computationally intensive tasks, and consequently it was not an option on the table. I thought to myself, Ruby can't be that slow. The calculation we were doing was non-trivial but not super involved either. I decided to give it a shot.

We started by doing all these calculations using ActiveRecord objects and found that the performance was not good at all. ActiveRecord was the culprit, because it was creating all these objects in memory and considerably slowing down the calculation process. We ditched it, opted for straight SQL instead, stored the results in arrays and performed the calculations on those arrays. Better! But not good enough. We found that we were performing operations like max, min and order-by on a given column's values in Ruby, which wasn't particularly performant. We delegated those to the database engine, since database engines are typically good at such things, and saw quite a hefty performance gain. By doing these simple tricks, we could pretty much get the performance we were looking for.

Even though we had solved the performance problem, our design had an unintended side effect. Given that we were processing data in arrays, outside of the objects they were fetched from, the code looked very procedural. To solve this problem, we created some meaningful abstractions to hold the data and operate on it. These weren't at the same granularity as the objects ActiveRecord would have created in the first place, but at a much higher level. This was a good compromise between having too many objects on the one hand and procedural code on the other, while still getting the performance we desired.
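To give a feel for the final shape of the solution, here is a rough sketch: ordering is delegated to the database, ActiveRecord object creation is skipped in favour of raw values, and a small abstraction wraps the result. The table and column names are made up.

```ruby
# A rough sketch of the shape we ended up with: let the database do the
# ordering it is good at, skip ActiveRecord object creation, and wrap the
# raw values in a small abstraction. Table/column names are made up and
# come from code, not user input.
class ColumnStats
  def initialize(column)
    @column = column
  end

  # Ordering is delegated to the database; Ruby only walks a plain array.
  def ordered_values
    @ordered_values ||= ActiveRecord::Base.connection.select_values(
      "SELECT #{@column} FROM reference_data ORDER BY #{@column}"
    ).map(&:to_f)
  end

  def median
    vals = ordered_values
    mid  = vals.size / 2
    vals.size.odd? ? vals[mid] : (vals[mid - 1] + vals[mid]) / 2.0
  end
end

# Usage (hypothetical column name):
#   ColumnStats.new('score').median
```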

The biggest win for me in doing all these calculations in Ruby was keeping the business logic in one place, in the app, and exhaustively unit testing the calculation logic.

None of this is revolutionary in any way, but the next time I face a similar situation, I will have the conviction that Ruby is not all that slow.

[1] With so much pain, why are stored procedures used so much

Architecture, Integration, Programming

Integration-first development

Yay, yay, I know, it's another one of those X-first development ideas, but this one has been a bit of a revelation to me. I have become convinced of its value this late in my career, when it should have been obvious from day one. It's not that I did not know about it, but I never realized its paramount importance or, conversely, the havoc it can cause if not adhered to. Oh well, it's never too late to do the right thing :). Now that I have talked so much about it, let me say a few words about what it means to me. It means that when you are starting a new piece of work, always start development at your integration boundaries. Yes, it is that simple, and yet it was so elusive, at least to me until now.

We have had numerous heartburns in the past over not being able to successfully integrate with external systems, leading to delays and frustration. Examples of integration points in a typical software system are web services, databases, dropping files via FTP into obscure locations, and GUIs. The number one reason for our failure to integrate has been our inability to fully understand, firstly, the complexity of the data being transferred across the boundary and, secondly, the medium of transfer itself. We think we understand these two things and then proceed with development with mostly the sunny-day scenario in mind. We happily work on the story until we hit the integration boundary and then realize, damn, why is this thing working differently than we expected? Not completely differently from what we thought, but differently enough to warrant a redesign of our system and/or having to renegotiate some integration contract. By then we have spent a lot of time working on things which are very much under our control and could have been done later.

Now, what are the possibilities that we could not have thought about in the first place? Let me start with a real-world example that I have come across. We were integrating with a third-party vendor that delivered us data via XML files. We had to ingest these files, translate them to HTML and display them on the screen. Easy enough? We thought so too. We got a sample file from them, made sure we were parsing the XML using the schema provided and translating it into HTML that made sense for us. We were pretty good at exception handling and making sure that we did not ingest bad data. The problem started when we were handed a file with over 10 MB of data. Parsing such a huge file at request time and displaying the data on the screen in a reasonable amount of time was impossible. At this point we were forced to rethink our initial "just-in-time" parsing strategy. We moved a lot of processing offline, as in, pre-processing all the XML into HTML and storing it in the database. This alone did not solve the problem, for it was still not possible to render the raw HTML on the screen in a reasonable amount of time, since the HTML had to be further processed to be displayed correctly with all the styling. Obviously, the solution was to cache the rendered HTML in memcached. We hit another roadblock there: memcached is unable to store items greater than 1 MB in size. Sure, we could have used a gzip-compressed memcached store or some other caching library, but that wouldn't have solved our problem either. We chose to create another cache store in the database. After doing all this, some of the pages were still taking too long to load because of their sheer size, and as a result the business had to compromise by not showing those pages as links at all. But this meant that we had to know the page sizes beforehand, to determine where to show links and where not to. All this led to a lot of back and forth, which is fine, but had we known about this size issue earlier, it would have saved us a lot of cycles. By the way, we could have incrementally loaded the page using Ajax, but it was too late and wasn't trivial for our use case. We could have known about the size issue if we had integrated first and worked with a realistic data set rather than a sample file.
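A hypothetical sketch of the "pre-process offline" idea: parse each vendor XML file ahead of time, render it to HTML and store the result in the database, so the request path only has to read it back. The class, model, tag names and size-limit handling are illustrative, not our actual code.

```ruby
# A hypothetical sketch of pre-processing vendor XML offline: each file is
# turned into HTML at ingestion time and stored, instead of being parsed at
# request time. Class, model and tag names are made up.
require 'nokogiri'

class ArticlePreprocessor
  MEMCACHED_LIMIT = 1_000_000 # bytes; items above this had to live in the DB cache

  def process(xml_path)
    doc  = Nokogiri::XML(File.read(xml_path))
    html = render_html(doc)

    # Rendered HTML is stored in the database during ingestion, so the
    # request path only reads it back.
    RenderedPage.create!(                       # hypothetical ActiveRecord model
      source_file: File.basename(xml_path),
      body:        html,
      oversized:   html.bytesize > MEMCACHED_LIMIT
    )
  end

  private

  def render_html(doc)
    doc.xpath('//section').map { |s| "<div>#{s.text}</div>" }.join
  end
end
```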

So, as per the example above, our integration point dictated our architecture in a huge way and also, more importantly, the business functionality, which is another reason to practice integration-first development. In an ideal world, you would like to insulate yourself from all the idiosyncrasies of your integration points, but that rarely happens in the real world.

Other issues around integration points that have influenced our design are web services going down often, and hence having to code defensively by caching the previous response from the service and setting an appropriate expiration time on it based on the time sensitivity of the data. In some cases, you might want to raise an alarm if the service goes down. If you know that it happens too often, then you might want to raise it less frequently and be more intelligent about it.
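Here is a sketch of that defensive pattern: cache the last good response from a flaky service and fall back to it when the service is down. The client, cache key and expiry are illustrative.

```ruby
# A sketch of coding defensively around a flaky web service: keep the last
# good response around and fall back to it when the service is down.
# The client call, cache key and expiry are illustrative.
class RatesGateway
  CACHE_KEY = 'rates/last_good_response'

  def fetch_rates
    response = RatesClient.get_latest            # hypothetical HTTP client call
    Rails.cache.write(CACHE_KEY, response, expires_in: 30.minutes)
    response
  rescue StandardError => e
    Rails.logger.warn("Rates service unavailable: #{e.message}")
    stale = Rails.cache.read(CACHE_KEY)
    raise if stale.nil?                          # nothing to fall back to: alert instead
    stale
  end
end
```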

There have been cases when we had no test environment in which to test a service and hence had to test against the production version of the service, which could be potentially dangerous. We had to do it in such a way as to not expose our test data in production, which meant our responses had to change based on which environment they were being sent from. Certain services charged us a fee per request, hence we had to reduce the number of service calls to be more economical.

Some vendors offered data via database replication and occasionally sent us bad data. In this case, we had a blue-green database setup, where we had two identical databases, blue and green. Production would point to one of them, say blue. We would load the new data into the green database, run some sanity tests on the new data, and only when the tests passed, point production to green. We would then update blue to keep it in sync.
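A rough sketch of the blue-green switch, with placeholder connection settings, sanity check and production pointer; the real mechanics of repointing production are not described here, so the marker file below is purely illustrative.

```ruby
# A rough sketch of the blue-green switch: load replicated data into the
# idle database, sanity check it, and only then repoint production.
# Hosts, table name, threshold and the marker file are placeholders.
DATABASES = {
  blue:  { host: 'db-blue.internal',  database: 'app' },
  green: { host: 'db-green.internal', database: 'app' }
}.freeze

def promote(candidate)
  ActiveRecord::Base.establish_connection(
    DATABASES.fetch(candidate).merge(adapter: 'postgresql')
  )

  # Sanity check the freshly loaded data before anyone sees it.
  row_count = ActiveRecord::Base.connection
                                .select_value('SELECT COUNT(*) FROM reference_data').to_i
  raise "Refusing to promote #{candidate}: suspiciously few rows" if row_count < 1_000

  # One way to represent "pointing production": a marker file the app reads
  # its active database from (purely illustrative).
  File.write('/etc/app/active_database', candidate.to_s)
end

promote(:green)
```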

Some of the vendors offered data via flat files. Sometimes they would not send us files on time or would skip them completely. We updated our UI to reflect the latest updated time, so as to be transparent. In addition to the file size episode I mentioned above, we had another problem with large file sizes: our process would blow up when processing large amounts of data. It was because we were saving large amounts of data to the database in one go, which wasn't apparent initially.

The performance of external systems was an issue and obviously affected the performance of our app. We had to stress/load test the external systems beforehand and build the right checks into our system in case they did not function as expected.

We have tried to mitigate these integration risks in a number of ways. We implement a technical spike before integrating with an external system, to understand its interactions or, in some cases, even its feasibility. For data-focused stories, what I have found most useful is, as a first step, striving to get the entire (minimum releasable) dataset into the system, without worrying even slightly about the UI* jazz. Don't bother the developers with UI, since it can be distracting. Have the raw data displayed on the screen and then "progressively enhance" it by adding sorting, searching, pagination, etc. This is a popular concept in UI view rendering, where you load the most significant bits of the page first and then add layers of UI (JavaScript, CSS) to spruce it up, so that the user gets the maximum value first.

On a recent refactoring exercise, we moved a tiny sliver of our application to a new architecture to fully test out the integration points and have a working model first. After that it was all about migrating the rest of the functionality to the new architecture.

There are so many things that can go wrong when integrating, and it is always a good idea to iron out the good and bad scenarios before you start working on anything that you know you have full control over. I am not saying you can predict all these cases beforehand, but if you start integrating first you have a much better chance of coming up with a better design and delivering the functionality in a reasonable time.

I would like to leave you with two thoughts: integrate first and test with real data.

* UI can have exacting requirements too, in which case it should be looked at along with the data requirements. Some UI requirements that could influence your architecture are loading data in a certain amount of time or in a certain way, like all data having to be available on page load.

Architecture, JavaScript, Programming

MVC: desktop application vs web application

Here is a post to compare and contrast the two styles of MVC I have worked with: web application MVC and desktop application MVC. As I understand it, desktop application MVC came first and then we tried to fit that idea to web applications.

Let's start with a quick example of the two application types: desktop and web. An example of a web application would be the ubiquitous shopping website, where the user interacts with the website via a browser. An example of a desktop application would be a thick client for trading stocks, or even a rich client-side UI interaction in a web application using JavaScript. When contrasting web application MVC with desktop application MVC, I will consider purely the HTTP/network communication aspect of MVC in a web application, devoid of JavaScript, Ajax or client-side templating.

To talk a bit about the architecture, the basic components of the two styles are the same: Model, View, Controller, hence the name. Sparing the details of the pattern itself, a subtle yet important difference between the two styles of MVC is that in a desktop application all three components live in the same memory space on the same machine, and this has some significant implications which we will talk about later. In a web application, though, the controller and model live in the server's memory space, but the view lives partly on the server and partly on the client: the view is built on the server but interacted with on the client (through the browser).

Coming to the control flow, in a desktop application the user interacts with the view to generate events. These events are intercepted by a controller action, which uses the model to update/retrieve data. There are multiple ways in which the view can be updated to reflect the model state change: either the controller updates the view directly, or the observer pattern is employed. In the observer pattern, all the components interested in a model change register with the model. When the model changes, it informs all the observers of the change. This is the interesting bit of communication that you do not get to see in a web application. Since all three components, M, V and C, are objects in the same memory space, the communication between them is richer, hence a model can notify all the interested models/views about its changes. Another interesting pattern of communication is direct communication between the view and the model on a user event. Given that the controller has bound the correct model action to a view event as a callback, the view can directly invoke the model action when the event is triggered. Let's put this in perspective with an example. In the desktop trading application, let's say the user has the ability to change the trading currency. This currency is being used for transactions in multiple widgets/views on the same page. In MVC land, this currency change event is tied to some update action on the trading currency model. When the user changes the currency, the model is updated directly. The model then notifies all the registered observers (predominantly models) about the currency change, and they subsequently update their respective views. This communication seems very natural in desktop-style MVC.
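The trading currency example can be sketched with Ruby's standard library Observable module, purely to illustrate the desktop-MVC communication pattern (the actual desktop application would not be Ruby; this is just the shape of the interaction, with made-up class names).

```ruby
# The trading-currency example sketched with Ruby's stdlib Observable
# module: the model changes and pushes notifications to registered
# observers. Class names are illustrative.
require 'observer'

class TradingCurrencyModel
  include Observable

  def update_currency(currency)
    @currency = currency
    changed                       # mark the model as dirty
    notify_observers(currency)    # push the change to every registered observer
  end
end

class PositionsView
  def update(currency)            # called by notify_observers
    puts "Re-rendering positions in #{currency}"
  end
end

model = TradingCurrencyModel.new
model.add_observer(PositionsView.new)

# The view event handler (wired up by the controller) invokes the model
# directly; every interested component hears about it via the observers.
model.update_currency('EUR')
```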

Let's look at the web application control flow. The user interacts with the view via the browser. The view is built on the server with two pieces of information: first, the actual view code and second, a mapping of user events to controller actions. Each user action is converted into an HTTP request by the browser. On the server side, the web application framework invokes the controller action. The controller action uses the model to retrieve/update data, builds the next view and sends it back to the browser as an HTTP response. The browser renders the view code and then the user is free to interact with the view again. In this style of communication, all the communication between the view and the model has to be channeled through the controller. Going back to our example of updating the trading currency, in a web application, updating the currency would mean having a currency update controller action that updates the necessary models and then builds the entire page, with updates to all the affected views while retaining the unchanged ones. This seems like an inelegant approach compared to the desktop style of MVC.
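For contrast, here is a minimal Rails-style sketch of the same currency change as a web MVC interaction: the event arrives as an HTTP request, a controller action updates the model and the whole page is rebuilt. The controller, model and route names are illustrative.

```ruby
# A minimal Rails-style sketch of the web MVC flow for the same example:
# the currency change arrives as an HTTP request, a controller action
# updates the model, and the full page is rebuilt and sent back.
# Controller, model and route names are made up.
class TradingPreferencesController < ApplicationController
  def update_currency
    preference = TradingPreference.find(params[:id])
    preference.update!(currency: params[:currency])

    # Unlike the desktop case, nothing is "notified": every widget on the
    # page is simply re-rendered as part of building the full response.
    redirect_to trading_dashboard_path
  end
end
```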

The web application seems fit for a one-page-one-user-event model, where you put all the information on the page, post it to the server and get back some results. It makes the web application single-tasked and slow to respond to user events. But life is rarely simple enough to warrant such single-threaded communication, especially in the world of fancy UI interactions. True to the saying "a layer of indirection can solve every problem in computer science", it seems JavaScript can solve some of these problems. It provides rich user interaction to a web application and can fetch and update selective parts of the view using Ajax and client-side templating. It is still a pull mechanism, where the JavaScript pulls all the necessary information and updates the relevant bits, as opposed to a desktop application, where you can update the model and have it publish its changes to interested components, which update themselves as they see fit.
