Using Metrics and KPIs to Refactor and Improve Your Enterprise Software

Abstract
Refactoring exercises are sometimes “a shot in the dark”: the depth of the problem isn’t always measured or well understood, and the success of the initiative can be hard to measure. This article describes an approach that uses data collected over time to identify and prioritize refactoring exercises that give the biggest bang for the buck, and to provide projections and justification to business stakeholders. KPIs are then established to measure the success or failure of the initiative.

 

In both greenfield and brownfield development, one of the more negotiable items is refactoring, or taking on large refactoring exercises to improve the health of the software. In greenfield projects these initiatives may come up at any time, and the risk and benefit are typically weighed against project timelines, launch dates, milestones, and feature scope. Brownfield projects, on the other hand, have typically been running in production for a while; at this stage, refactoring exercises may focus on areas that have had noticeable issues in production, or on exercises deferred until after launch for a variety of reasons. In either case, it’s sometimes hard to justify the exercise even when we think we know it’s the right thing to do.

Even the best Agile experts will tell you that, in the best of Agile environments, there is always refactoring to be done. The ultimate Agile purists may counter that in their perfect world large refactoring exercises are unnecessary because refactoring is done as we go. It just doesn’t work like that; that’s clever marketing, though! I won’t go into detail about how software gets fragmented even when the entire team has the best of intentions, but it does. Fragmentation happens, and moving too far one way makes it difficult and risky to move another way without significant refactoring efforts that cannot always be justified at the time of development.

Refactoring can have many purposes: reducing complexity, improving performance, improving reliability or scalability, because it’s cool, or simply enabling new features. But how do you weigh the benefit of these initiatives against the risk? One approach is to use data, establishing data points, metrics, and KPIs to justify the effort and then validate it. Take application performance: we measure it, we see the numbers, and we realize we are way off. We need to improve performance to meet our NFR timings; maybe it’s a quick fix, or maybe it requires a larger refactoring exercise. The justification is in the numbers. We create a PoC to confirm the expected performance gain, we implement, we measure again, and voila, we have met our goal. Elementary stuff.
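
To make that measure-and-verify loop concrete, here is a minimal sketch in Python (used throughout this article purely for illustration): it times an operation and compares the result against an NFR target. The measured_operation function and the 500 ms threshold are placeholders, not part of any real system.

    import time

    NFR_TARGET_SECONDS = 0.5  # hypothetical NFR: the operation must complete within 500 ms

    def measured_operation():
        time.sleep(0.2)  # stand-in for the real work being profiled

    start = time.perf_counter()
    measured_operation()
    elapsed = time.perf_counter() - start

    verdict = "within" if elapsed <= NFR_TARGET_SECONDS else "exceeds"
    print(f"Elapsed: {elapsed * 1000:.0f} ms ({verdict} the NFR target)")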

Quite often, our refactoring efforts are not that cut and dried. There may simply not be enough justification for teams to consider the refactoring exercise or for the stakeholders to approve it. Applications may also have a series of different problems, maintenance headaches, and technical debt. How do we know what to prioritize, where the plumbing needs a serious makeover, or which areas are of biggest concern? Look at the data.

Tracking the health of your software through data points on an ongoing basis can go a long way toward identifying the big problems, or catching the small problems before they become big ones. We may have an async service that we’ve been itching to rewrite, but the metrics may indicate that we have far more important complexity issues in some of our MVC controllers, which have led to a large number of functional defects and page crashes. With data, we can not only justify our refactoring efforts but prioritize them. My experience is that if you want to justify something, especially to business stakeholders, it’s easier when you have data to back it up.

Good candidate data points include information from exception logs, such as the components, classes, and methods with the most exceptions, or data from your ALM system, whether it’s TFS or something else: query the defect fixes and their related file check-ins to determine which files have had the most fixes applied. This can be further categorized or aggregated as necessary, but it’s important to understand which data points matter to you so you can measure the problematic areas. Another example is the number (or severity) of defects per functional area. How you get that data will depend on your environment, but it’s important to be proactive about it. New software initiatives should always include data-point capture: ensure your logging captures data in a meaningful way, and that you track adequate data in your ALM system, so you can determine your biggest problem areas. Determine, prepare, and track the data points you need well before you need to analyze them.
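
As a rough sketch of this kind of aggregation, suppose the exception log can be exported as a CSV with “class” and “method” columns (the file name and column names here are hypothetical, not a real log format). Counting exceptions per class and method then highlights the hottest spots:

    import csv
    from collections import Counter

    exception_counts = Counter()
    with open("exception_log_export.csv", newline="") as f:  # hypothetical export of the exception log
        for row in csv.DictReader(f):
            exception_counts[(row["class"], row["method"])] += 1

    # The classes/methods that throw most often are candidates for refactoring.
    for (cls, method), count in exception_counts.most_common(10):
        print(f"{cls}.{method}: {count} logged exceptions")

The same pattern applies to defect-fix check-ins pulled from TFS or another ALM system: export them to a flat file and count fixes per source file.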

Code complexity is another type of measurement, giving a numeric ranking to pieces of code that indicates how complex they are. There are many tools on the market that can help determine and rank your code complexity. Complexity often maps directly to stability and functional problems, and it generally increases the maintenance cost of your software. A large enterprise SOA application I have recently been consulting on has tens of millions of lines of code and up to 100 people engaged on the project at any given time. As an exercise to determine where our biggest software problems were, we pulled together two sets of data. One set was aggregated and compiled from our ALM functional defects and our crash logs, to see where our biggest issues had historically been and which types of issues we were seeing repeatedly; this data let us see the trends. The other set came from complexity reports produced by tools that measure the cyclomatic complexity of our software. When we aligned the two, we found that in many cases there was a direct correlation between code complexity and both the number of unique issues and the number of recurring issues in our crash and ALM data. This gave us hard data to justify refactoring and software improvement based on code complexity as well as the actual numbers of issues and crashes we were seeing in the code.
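
Here is a minimal sketch of lining up those two data sets, assuming each has been exported per source file: complexity.csv with a cyclomatic_complexity column and defects.csv with a defect_count column (all file and column names are illustrative):

    import csv
    from statistics import correlation  # Python 3.10+

    def load(path, value_col):
        with open(path, newline="") as f:
            return {row["file"]: float(row[value_col]) for row in csv.DictReader(f)}

    complexity = load("complexity.csv", "cyclomatic_complexity")
    defects = load("defects.csv", "defect_count")

    # Only compare files that appear in both reports.
    common = sorted(set(complexity) & set(defects))
    r = correlation([complexity[f] for f in common],
                    [defects[f] for f in common])
    print(f"Correlation between complexity and defect count: {r:.2f}")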

Metrics, data points, and KPIs can help you determine the state of your software on an ongoing basis. Data can prove vital in helping meet your software’s requirements, nonfunctional requirements, feature set, and release goals. Justification can be provided to business stakeholders using data, with the promise that the results of the initiative can also be measured and reacted upon. Suppose we have a custom data translation layer with tens of thousands of lines of code where we are repeatedly opening new defects and seeing multiple crashes per day or week. We know this code is overly complex, and the trends we have compiled show 40 crashes a month here, with a regression or defect rate in this part of the code of 1:1, meaning that for every defect we close in this area, one more opens. Because we have analyzed the data, we can see that this is one of a few areas we can target for the biggest bang for the buck, and we can now use all of this data to provide justification to our business stakeholders.

Why does this even have to be justified to business stakeholders? We all know we can’t always just write the code we think will make the system better. Project team members typically have an idea of what the pain points of the system are and know which code is overly complex or a PITA (pain in the ass). Some project teams will simply say, “Go ahead, we have to rewrite X, so let’s just start doing it,” with no justification and no measurements. That might work, and if we get it right, good. Of course, we’ll have no way to really measure it without data. We may think it’s the right thing to do, and we may have the best of intentions, but we don’t always get it right. Part of compiling data and creating KPIs is not only to justify our actions to business stakeholders when we need to; it’s also about justifying them to ourselves, the project team.

Let’s go back to the example I cited before. We have our justification for refactoring or rewriting our data translation components. We have a new solution that will use an ORM mapping tool to replace all of our fragmented data translators. This will take two months of redevelopment and introduce some risk, but that risk is offset by the negative data trends we are seeing in this area: a 1:1 defect ratio and 40 crashes a month. Terrible, in comparison to the other components in our application.

Now let’s take this a step further. Not only do we have the data and trends to identify the rewrites that will give us the biggest bang for the buck, we have the data and trends to create and measure KPIs. After analyzing a sample of our biggest issues in the data translators, we see inherent problems that repeat month after month. Say these inherent problems represent 80% of the 40 issues we are seeing each month; based on the stability record of the new (and fully tested) third-party ORM framework we will be introducing, we can expect to eliminate up to that 80%. The other 20% are random one-off issues: some may stem from our complex code, but some may still recur due to other external factors. Part of the justification is establishing KPIs to measure whether we met our goal or not.
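
The arithmetic behind that expectation is simple; here it is spelled out as a sketch, with the 80/20 split being the assumption described above:

    monthly_issues = 40        # observed crashes/defects per month in the data translators
    inherent_share = 0.80      # share of issues assumed to stem from the complex translation code

    eliminated = monthly_issues * inherent_share   # expected to disappear with the ORM rewrite
    residual = monthly_issues - eliminated         # one-off issues that may still recur

    print(f"Expected residual issues per month: {residual:.0f}")   # roughly 8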

After analyzing all of the data, we can establish justification and KPIs that look something like this example:

Justification and Approach

  • We see 40 crashes a month in our data translator code.
  • The defect rate is 1:1 in this area, specifically.
  • Replace the existing data translators with the ORM XYZ component, a third-party Object Relational Mapping tool.
  • Cost of 1,200 development hours and an ETA of 2 months.
  • As the rest of the application sees a defect rate of 1:0.75, the roughly 8 residual defects per month expected after the rewrite should continue to decline at that rate (a simplified projection sketch follows the KPI below). Further initiatives can reduce our defect and regression rates.
  • The milestone date will be impacted by 1 month, in exchange for a much more stable data layer and a large, measurable reduction in crashes per month.

Measurable KPI

  • Whereas we originally averaged 40 issues per month in this area, we will maintain an average of 8 or fewer per month (declining further as our defect rate improves), starting one month after the new solution is implemented.
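
The projection referenced in the justification above can be sketched the same way. Under a deliberately simplified model, where the open issues in this area are worked off each month and the 1:0.75 rate means each fix results in 0.75 new defects on average, the residual count decays month over month:

    def project_defects(initial, reopen_ratio, months):
        counts = [initial]
        for _ in range(months):
            counts.append(counts[-1] * reopen_ratio)
        return counts

    # Starting from the ~8 residual issues per month expected after the ORM rewrite:
    projection = project_defects(8, 0.75, 6)
    print([round(c, 1) for c in projection])   # roughly [8, 6.0, 4.5, 3.4, 2.5, 1.9, 1.4]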

Proposing the improvements, backing them up with hard data, and committing to measurable KPIs helps move these types of initiatives forward and gets everyone on board and moving in the same direction. It provides justification to everyone from development team members to business analysts, project managers, and other business stakeholders. Establishing and committing to KPIs for the initiative may be scary for some, because every time we measure the success of an initiative there is a chance we miscalculated or made a mistake that led to no improvement, or less improvement than we had hoped. Done right, though, it shows confidence to the team and business stakeholders and drives the team to think smartly about the goal we are trying to achieve. “We’re not just making the software better, we’re not just making it cooler, we have a measurable goal, so let’s ensure we all work hard to achieve it!”

Let’s say the initiative was a success. If our data was right and our projections were right, it should be! But maybe we didn’t get as far as we wanted. Well, we step up, we retrospect, we introspect, and we improve. We show that the initiative wasn’t as successful as we had hoped, but we also show what we learned and how we will apply it next time. I’m not saying to expect failure, but failure is inevitable at some point. Take it gracefully, and be even better next time. We will still be in a much better position than if we had taken a shot in the dark based on suspicion rather than data; in that case we would have had no way to know whether the initiative succeeded, because we couldn’t measure the situation well beforehand, and we couldn’t easily measure its success as an isolated contribution to the overall big picture.

The beauty of using data is that it cuts both ways. Say we had an idea to replace our translators with an ORM mapping framework or another data management framework; the data might have shown this code to be surprisingly stable, with few issues. It may be a little more complex than we would like, but it works well and it’s stable. The data could have shown that we absolutely didn’t need to spend two months rewriting this component and introducing risk that wasn’t there before, that the best bang for the buck was in other areas, and that there was little to no risk in keeping our existing data translation code intact.

Aggregating data points to gain the insight we need is very valuable for justifying key software improvements. It is an important and often overlooked way to make sure refactoring and improvement exercises are worthwhile, and it provides KPIs that give confidence to business stakeholders and help confirm our initiatives were successful. In addition to using data to measure and prioritize the potential impact of initiatives, other methods may be needed to reduce overall defect rates, keep the team moving in the right direction, and ensure that overall software quality is improving.