Refactoring and Performance

In the first portion of the book Refactoring, Martin Fowler presents an example where he whittles a verbose method into an improved design. Here’s the original, more or less:

   public String statement() {
      double totalAmount = 0;
      int frequentRenterPoints = 0;
      Enumeration rentals = this.rentals.elements();
      String result = "Rental Record for " + getName() + "\n";

      while (rentals.hasMoreElements()) {
         double thisAmount = 0;
         Rental each = (Rental)rentals.nextElement();
         switch (each.getMovie().getPriceCode()) {
            case Movie.REGULAR:
               thisAmount += 2;
               if (each.getDaysRented() > 2)
                  thisAmount += (each.getDaysRented() - 2) * 1.5;
               break;
            case Movie.NEW_RELEASE:
               thisAmount += each.getDaysRented() * 3;
               break;
            case Movie.CHILDRENS:
               thisAmount += 1.5;
               if (each.getDaysRented() > 3)
                  thisAmount += (each.getDaysRented() - 3) * 1.5;
               break;
         }

         frequentRenterPoints++;

         if (each.getMovie().getPriceCode() == Movie.NEW_RELEASE && 
             each.getDaysRented() > 1)
            frequentRenterPoints++;

         result += "t" + each.getMovie().getTitle() + 
                   "t" + String.valueOf(thisAmount) + "\n";
         totalAmount += thisAmount;
      }
   }

And here’s the (relevant portion of the) factored code. I think this is as far as Fowler takes the example:

   public void addRental(Rental rental) {
      rentals.add(rental);
   }

   public String statement() {
      StringBuilder result = new StringBuilder();
      appendHeader(result);
      for (Rental rental: rentals)
         appendDetail(result, rental);
      appendSummary(result);
      return result.toString();
   }

   private int calculateFrequentRenterPoints() {
      int totalFrequentRenterPoints = 0;
      for (Rental rental: rentals)
         totalFrequentRenterPoints += rental.getFrequentRenterPoints();
      return totalFrequentRenterPoints;
   }

   private double calculateTotalCost() {
      double totalCost = 0;
      for (Rental rental: rentals)
         totalCost += rental.getCost();
      return totalCost;
   }

The while loop in the original statement method intertwines three things: construction of a text statement, addition into a total cost, and addition into a total for frequent renter points. The refactored version separates and isolates each of these three behaviors.

At the point of presenting this code, Fowler discusses the performance implications: Instead of a single set of iterations, the code now requires three iterations over the same collection. Many developers object strongly to this refactoring.

I’ve done quick (and rough, but relatively accurate) performance measurements for this example. Things aren’t three times worse, as a naive guess might suggest; instead we’re talking about a 7% degradation in performance. That’s still a significant amount in many systems, maybe even too much to accept. In many other systems, the degradation is negligible, given the context.

Some conclusions: First, always measure, never guess. Second, understand what the real performance requirements are. Create end-to-end performance tests early on. Run these regularly (nightly?), so that you know the day someone introduces underperforming code.

The refactored, better design provides the stepping stone to many other good things. The small, cohesive methods can each be read and understood in a few seconds. You can glance at each, and almost instantly know they must be correct. They’re also extremely easily tested if you aren’t so trustworthy of glances.

Looking further at the code, you might also spot an abstraction waiting to come into existence–the collection of rentals itself. Producing a statement is a responsibility separate from managing this collection. You could create another class named RentalSet or RentalCollection. Moving the calculation operations into the new class is a trivial operation, given that the calculation logic is now completely divorced from statement generation.

Other benefits accrue. Fowler talks about the ability to provide support for an HTML statement, in addition to existing support for a plaintext statement. Migration to a polymorphic hierarchy is simple and safe with the refactored design.

Following the mantra of “make it run, make it [the design] right, make it fast,” you can return to the performance issue now that you have an improved design. Suppose you must restore performance and eliminate the 7% degradation. You could un-factor the loop extract and again intertwine three elements in the body of one for loop.

But since you have a simpler design, the optimization doesn’t need to impact the design so negatively:

   public void addRental(Rental rental) {
      rentals.add(rental);

      // optimization--cache totals:
      totalFrequentRenterPoints += rental.getFrequentRenterPoints();
      totalCost += rental.getCost();
   }

   private int calculateFrequentRenterPoints() {
      return totalFrequentRenterPoints;
   }

   private double calculateTotalCost() {
      return totalCost;
   }

(Note the rare, debatably worthwhile comment, which I could eliminate by another method extract.)

Small methods are about many things, ranging from comprehension to reuse. In this case, having an optimized design allowed for an even simpler solution to the optimization of performance.

Performance Debates

I’m always amused at the number of posts I see in Java forums about various performance concerns. Yesterday I read a sequence of anxious messages discussing whether Java’s import pkg.* performs worse than importing explicit classes. Once people were informed that it makes no difference at runtime, the concern moved to compile time.

Java compilation is slow, but it’s not that slow, and it’s also not C++, where concern over such minute detail can be rewarded handsomely. I think we’re talking close to picoseconds here. Someone did manage to run a time test and found a difference that equated to probably one minute over the course of a development year.

I find it odd that people are willing to waste hours debating this nonsense. Worse, they’re willing to sacrifice far more hours by sacrificing readability for execution efficiency.

I figure the anxiety over performance is due to two things: one, Java is one of the slower compiled languages out there (scripting languages such as Python typically fare much worse). Two, it’s something you can quantify easily.

With respect to the first concern, if you have that much angst over performance, perhaps you shouldn’t be programming in Java. Honestly, it’s not *that* bad, and plenty of systems execute with high performance while ignoring some of the minute optimizations that Java affords (for example, making all possible methods final). And usually, the concerns are misdirected. Some people even worry about performance when there are no known performance issues. But all it takes is a profiler (such as JProbe or OptimizeIt) and a bit of education to get the seance aspect out of performance optimization.

It’s the quantification that’s most alluring, however. Geeks like to speak numbers, or more importantly, they like to rank things. “Star Trek is better than Star Wars,” and “this construct executes faster than that.” Quantifying good programming style, however, isn’t possible. It’s subjective and debatable, so it doesn’t provide such an easy answer.

Get over it. Code for cost. The cost is in maintenance, and usually not in degraded performance. That means coming up with an optimal design–one where you can easily make performance tweaks if and only if you find performance problems. A bad design with bad performance will cost far more to fix than a good design with bad performance.

Not to say that you should ignore performance entirely, either. But right now my major concerns over performance aren’t about the production system’s performance. They instead are about the performance of my unit tests–something that directly ties to the cost of maintenance.

Atom