TDD Is Not Mindless

Claims that TDD is a mindless or tedious activity come only from those without a full understanding of the technique. You can’t succeed with TDD if you don’t think. We want you to not think about the steps in the technique itself–those become habitual after a short while–but instead think about the many things that are involved with growing software incrementally. You’ll need to answer many questions at each step in the TDD cycle.
  • Write a small test. What’s the smallest possible piece of behavior you can increment the system by? Does the system already have that behavior? How can you concisely describe the behavior in a test name?
  • Ensure the new test fails. If it doesn’t, why not? Does the behavior already
    exist? Should you add this as a new test? Did you forget to compile? Did
    you take too large a step in the prior test?
  • Write the code you think makes the test pass. Are you writing no more
    code than needed to meet the current set of behaviors specified by the
    tests? Are you recognizing where the code you just wrote will need to be
    cleaned up? Are you following your team’s standards?
  • Ensure all the tests pass. If not, did you code things incorrectly, or is your
    specification incorrect?
  • Clean up the code changes you just made. What do you need to do in order
    to get your code to follow team standards? What do you know about design
    and clean code that you should apply here? Does your new code duplicate
    other code in the system that you should clean up, too? What other
    changes to system design should you make due to the impact of your code
    change?
  • Ensure the tests all still pass. Do you trust that you have adequate unit
    test coverage? Should you run a slower set of tests to have the confidence
    to move on? What’s the next test?

(The preceding list of questions comes from the forthcoming book Test-Driven Development in C++ by Jeff Langr, targeted to be published by PragProg in early 2013.)

Why We Create Unnecessary Complexity

Q. What’s “unnecessary complexity”?
A. Any code that doesn’t need to be there, or that is more difficult to understand than necessary.

Your system might exhibit unnecessary complexity if it contains duplicate code, abstractions that aren’t needed, or convoluted algorithms. Unnecessary complexity permanently increases the cost to understand and maintain code. We create it for various reasons, most of which we can begin to eliminate.

Time pressure. “We just have to ship and move on–we don’t have time to make it look nice!” You’ll regret this sadly typical choice later (not even that much later) when everything takes longer. Learn to push back when you know short-term time-saving measures will cost everyone dearly later.

Lack of education. To create quality designs that you can maintain at low cost, you have to know what that looks like. Most novices have little clue about how they are degrading their system. The concepts of cohesion and coupling (or SRP and DIP) are essential, but most everything you can do to learn about good design will pay off. Consider starting by learning and living the concepts of Simple Design.

Existing complexity. The uglier your system is, the more likely that the average programmer will force-fit a solution into it and move on to the next thing. Long methods beget longer methods, and over-coupled systems beget more coupling. Incremental application of simple techniques (such as those in WELC) to get the codebase under control will ultimately pay off.

Premature conjecture. “We might need a many-to-many relationship between customers and users down the road… let’s build it now.” Sometimes you never need to travel down that road. Even if you do, you pay for the complexity in the interim. Deferring the introduction of complexity usually increases the cost only marginally at the time it’s actually required. But when you build the wrong unnecessary abstractions up front, changing to the right ones down the road can cost significantly more.

Fear of changing code. “If it ain’t broke, don’t fix it.” Maybe we have the education we need, but we often resist using it. Why? Imagine you must add a new feature. You’ve found a similar feature implemented in a 200-line method. The new feature must be a little bit different–five lines worth of different code. The right thing would of course be to reuse the 195 lines of code that aren’t different.

But most developers blanch at the thought–that would mean making changes to the code that supports an existing feature, code that many feel they shouldn’t touch. “If I break it, I’ll be asked why I was mucking with something that was already working.”

Good tests, of course, provide a means to minimize fear. In the absence of good tests, however, we tend to do the wrong things in the code. We duplicate the 200 lines and change the 5 we need, and make life a bit worse for everybody over time.

Code Socialization

The first rule of Fouled-up-code* Club is you don’t talk about the code.
The second rule of …

We don’t talk about the code, and so most of us are members of F’d-up-code Club. We’re not socializing the code enough. At shop after shop I visit, I ask how often the programmers discuss what’s in the code with each other. The usual answer? “Once in a while.”

If you’re not regularly talking about the code as a team, it’s getting worse. And “it” includes the time to understand the code, the time to fix the code, the pain the code causes you, the defect count, and ultimately the extent to which you have a real team.

We create standards, but they sit idle, adherence falls off, the code shows it, and attempts to get back on track fall short. We pair-program, but don’t switch pairs mid-story, and the solutions show it. (Yes, two heads produce a better-than-one solution, but a pair deep into understanding of their feature can easily produce a solution that makes little sense to others.) We hold tepid brown-bags that bore attendees and ultimately taper off. We hold retrospectives, sometimes, but the process crud dominates, and the audience isn’t usually right for talking about code problems. We sit in the same area (well, a small number of us do), but rarely call others over to our monitor to look at some cool or challenging code.

Try this (the opposite of the prior paragraph, duh):

  • Revisit your standards at least once a quarter, or simply whenever someone wants to discuss a specific standards issue.
  • Pair program and take advantage of the two-heads effect. If you can’t pair, incorporate a streamlined inspection process. If you can pair, switch pairs at least once mid-feature. See if the third party can make sense of your tests and solution with minimal assistance from you.
  • Run end-of-day code sharing sessions on the LCD projector. Limit them to 15 minutes, and shoot for 5 minutes. “Hey, can we stop doing this in the code?” Or, “Here’s a new reusable construct that we should all be using.” Or, “Here’s an effective tool tip I learned.”
  • Find 60 minutes for a programmer-only retrospective each iteration. Make sure what comes out of them are commitments to improve that are actually met (otherwise don’t bother).
  • Don’t wait for the end of the day or a pair switch to discuss something important in the code. People should feel comfortable about standing up and finding an audience for a 5-minute ad hoc code discussion.

* More blunt folks can substitute other words for “fouled.”

A Smackdown Tool for Overeager TDDers

smackdown!
Image source: https://commons.wikimedia.org/wiki/File:Cross_rhodes_on_gabriel.jpg

I’ve always prefaced my first test-driven development (TDD) exercises by saying something like, “Make sure you write no more code than necessary to pass your test. Don’t put in data structures you don’t need, for example.” This pleading typically comes on the tail of a short demo where I’ve mentioned the word incremental numerous times.

But most people don’t listen well, and do instead what they’ve been habituated to do.

With students in shu mode, it’s ok for instructors to be emphatic and dogmatic, smacking students upside the head when they break the rules for an exercise. It’s impossible to properly learn TDD if you don’t follow the sufficiency rule, whether deliberately or not. Trouble is, it’s tough for me to smack the heads of a half-dozen pairs all at once, and some people tend to call in HR when you hit them.

The whole issue of incrementalism is such an important concept that I’ve introduced a new starting exercise to provide me with one more opportunity to push the idea. The natural tendency of students to jump to an end solution is one of the harder habits to break (and a frequent cause of students’ negative first reaction when they actually try TDD).

I present a meaty first example (latest: the Soundex algorithm) where all the tests are marked as ignored or disabled, a great idea I learned from James Grenning. In Java, the students are to un-@Ignore tests one-by-one, simply getting them to pass, until they’ve gotten all tests to pass. The few required instructions are in the test file, meaning they can be hitting this exercise about two minutes after class begins.

Problem is, students have a hard time not breaking rules, and always tend to implement too much. As I walk around, I catch them, but it’s often a little too late. Telling them that they need to scrap their code and back up isn’t what they want to hear.

So, I built a custom test-runner that will instead fail their tests if they code too much, acting as a virtual head-smacking Jeff. (I built a similar tool for C++ that I’ve used successfully in a couple C++ classes.)

Here’s the (hastily built) code:

import org.junit.*;
import org.junit.internal.*;
import org.junit.internal.runners.model.*;
import org.junit.runner.*;
import org.junit.runner.notification.*;
import org.junit.runners.*;
import org.junit.runners.model.*;

public class IncrementalRunner extends BlockJUnit4ClassRunner {

   public IncrementalRunner(Class<?> klass) 
         throws InitializationError {
      super(klass);
   }

   @Override
   protected void runChild(
         FrameworkMethod method, RunNotifier notifier) {
      EachTestNotifier eachNotifier = 
         derivedMakeNotifier(method, notifier);
      if (method.getAnnotation(Ignore.class) != null) {
         runIgnoredTest(method, eachNotifier);
         return;
      }

      eachNotifier.fireTestStarted();
      try {
         methodBlock(method).evaluate();
      } catch (AssumptionViolatedException e) {
         eachNotifier.addFailedAssumption(e);
      } catch (Throwable e) {
         eachNotifier.addFailure(e);
      } finally {
         eachNotifier.fireTestFinished();
      }
   }

   private void runIgnoredTest(
         FrameworkMethod method, EachTestNotifier eachNotifier) {
      eachNotifier.fireTestStarted();
      runExpectingFailure(method, eachNotifier);
      eachNotifier.fireTestFinished();
   }

   private EachTestNotifier derivedMakeNotifier(
         FrameworkMethod method, RunNotifier notifier) {
      Description description = describeChild(method);
      return new EachTestNotifier(notifier, description);
   }

   private void runExpectingFailure(
         final FrameworkMethod method, EachTestNotifier notifier) {
      if (runsSuccessfully(method)) 
         notifier.addFailure(
            new RuntimeException("You've built too much, causing " + 
                                 "this ignored test to pass."));
   }

   private boolean runsSuccessfully(final FrameworkMethod method) {
      try {
         methodBlock(method).evaluate();
         return true;
      } catch (Throwable e) {
         return false;
      }
   }
}

(Note: this code is written for JUnit 4.5 due to client version constraints.)

All the custom runner does is run tests that were previously @Ignored, and expect them to fail. (I think I was forced into completely overriding runChild to add my behavior in runIgnoredTest, but I could be wrong. Please let me know if you’re aware of a simpler way.) To use the runner, you simply annotate your test class with @RunWith(IncrementalRunner.class).
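
Here’s a rough sketch of how a student-facing test class might be wired to the runner. (The Soundex class, its encode method, and the specific expectations are illustrative assumptions, not the actual exercise file.)

import org.junit.*;
import org.junit.runner.RunWith;
import static org.junit.Assert.*;
import static org.hamcrest.CoreMatchers.*;

// Illustrative only: students un-@Ignore these tests one at a time.
// With IncrementalRunner, any still-@Ignore'd test that passes is
// reported as a failure--evidence of having built too much.
@RunWith(IncrementalRunner.class)
public class SoundexTest {
   @Test
   public void retainsSoleLetterOfOneLetterWord() {
      assertThat(new Soundex().encode("A"), is("A000"));
   }

   @Ignore
   @Test
   public void replacesConsonantsWithAppropriateDigits() {
      assertThat(new Soundex().encode("Ab"), is("A100"));
   }
}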

To effectively use the tool, you must provide students with a complete set of tests that supply a definitive means of incrementally building a solution. For any given test, there must be a possible implementation that doesn’t cause any later test to pass. It took me a couple tries to create a good sequence for the Soundex solution.

The tool is neither fool-proof nor clever-proof; a small bit of monkeying about and a willingness to deliberately cheat will get around it quite easily. (There are probably a half-dozen ways to defeat the mechanism: For example, students could un-ignore tests prematurely, or they could simply turn off the custom test-runner.) But as long as they are not devious, the test failure from building too much gets in their face and smacks them when I’m not around.

If you choose to try this technique, please drop me a line and let me know how it went!

Bad Code Examples Promote Bad Habits


Image source: http://upload.wikimedia.org/wikipedia/commons/thumb/c/c7/Ugly.svg/120px-Ugly.svg.png

Why is there so much bad code? Developers offer myriad excuses:

  • We just have to ship (i.e. we don’t have enough time to do it right).
  • We disagree as to what “bad” is.
  • We know the code is not clean but we don’t think that it’s a real problem (i.e. we don’t care).
  • The existing design makes it tough to do the right thing.
  • We don’t have enough tests to clean the code up safely (“if it ain’t broke don’t fix it”).

Missing from that list is inadequate education. Most programmers learn from looking at other people’s code–whether in delivered systems, open source, internet examples, or book examples. A quick probe of the beginner’s forum at JavaRanch demonstrates how horribly constructed novice programs can get. Many novices simply won’t know better–they copy working code snippets from elsewhere and modify them to their needs.

Let’s shift at least part of the blame where it belongs–to the author of the bad code that the novice copied. Most beginning programming language tutorials and books don’t discuss or teach proper design. Examples frequently demonstrate rampant duplication, intertwining of console output with business logic, clever constructs, and cryptic naming.

Even authors who should know better present examples with severe design flaws. They could at least add a caveat (“cleaning up the obvious duplication in this example is left as an exercise for the reader”). The better solution would be to find a way to code it properly and not show the bad code at all.

I’ve been working through examples in a couple Erlang books. Here’s a paraphrasing of one unfortunate theme presented in both:

loop() ->
    receive
        {Client, "smelt"} ->
            Client ! "a small silvery fish",
            loop();
        {Client, "agile"} ->
            Client ! "lithe",
            loop();
        {Client, "lean"} ->
            Client ! "trim",
            loop();
        {Client, _} ->
            Client ! "no definition found",
            loop()
     end.

You don’t have to know Erlang to see the obvious duplication in this short example. Sure, it’s just an example, designed to show a construct, don’t worry about the design, yadda yadda. Yet it’s at the heart of why most of our code sucks. I know little about Erlang so far, but it sure makes a lot more sense to separate handling a server request from domain logic:

loop() ->
    receive
        {Client, Word} ->
            Client ! definition(Word),
            loop()
    end.

(If we’ve gotten far enough to learn about spawning processes, we’ve certainly learned how to build a clean definition function.) Presented properly, the duplication is eliminated, and the example may more readily suggest that the server loop is a potentially reusable construct, or at least a common idiom.

Because the example is presented poorly, I’d bet a large amount of Erlang code out there is similarly difficult.

Why allow beginners to learn poor coding practice, then have to re-train them?

TDD for C++ Programmers

Recently I’ve been getting a good number of calls from C++ shops interested in doing TDD, despite my heavy Java background. I took on some of the business and had to turn away some to avoid being swamped. Many other folks I know (name dropping time!)–Tim Ottinger, James Grenning, JB Rainsberger, others–have also reported doing C++ work recently.

Is TDD finally taking hold in C++ shops? Does TDD even make sense for C++? I think so, and two current customers believe they’ve been seeing great benefits come from applying it. Building and delivering a C++ TDD course recently helped me come back up to speed in the language to the point where I was comfortably familiar with all of the things I hated about it. 🙂 It makes no sense to take such a difficult language and stab at it without the protection of tests.

I’ve been simultaneously writing more (after a typical winter writing freeze) and looking at Erlang–a much cooler language, challenging in a different kind of way. Meanwhile, my editor at PragProg has been asking for new book ideas. Here were some of my thoughts:

  • Refactoring 2012
  • Modern OO Design (not template metaprogramming!) / Simple Design
  • Object-Oriented Design “In a Flash” (card deck, like Agile in a Flash)

No matter how hard I try to run screaming from C++, there it is right behind me. It’s indeed a powerful language, there are gobs and gobs of code written in it, and it’s about time we started trying to figure out how to make the best of it. It’s not going away in my lifetime. I also think C++ programmers are not well served by the existing writings on TDD.

So… I decided it was going to be TDD in C++. Tim Ottinger and I put together and just sent out a proposal for a book tentatively named TDD for C++ Programmers (with a catchy subtitle, no doubt). We hope there’s enough demand and interest to get the proposal accepted. If all goes well, we’ll be soliciting reviewers in a few weeks.

I look forward to writing again with Tim! More in an upcoming blog post about our collaborative writing experience.

Legacy Quadrants for Increasing Confidence Coverage

Your legacy system presents you with a choice: Put a stake in the ground and attempt to do something about the daily pain it causes, or let it get incrementally and certainly worse. Often its challenges seem insurmountable–small improvements appear like ripples in the ocean. Perhaps an investment to improve the software product is not worth the return in value, given its future.

Chances are that most of the time, though, you’ll be stuck with your legacy system for some time coming, and it makes sense to put a foot down to stem the degradation. Working Effectively With Legacy Code provides you with primarily tactical guidance for preventing entropy, focusing mostly on individual code-level challenges. The Mikado Method provides you with primarily strategic guidance, focusing mostly on managing large-scale refactorings across a code base (legacy or not).

In terms of where to put your effort, the best strategy is always to begin with the areas of current impact–what needs to change for the feature you’re currently tackling? But if you can afford additional effort to add characterization tests–i.e. increase “confidence coverage”–consider the quadrants below, which pit rate of change against defect count for a given class.

Legacy Quadrants

How might you determine which classes belong in which quadrants? Tim Ottinger devised a heat map while at one shop–a simple Python tool that cross-referenced trouble tickets in JIRA with change history in Git. (I got to help build it!) You might consider filtering the defects to ignore those with low severity.

The Trusty Old Engine. “If it ain’t broke, don’t fix it!” Classes that work and aren’t changing should be the last thing you worry about. Sometimes code in these areas is tempting: it may exhibit obvious duplication, for example, that a refactoring weenie might view as a quick-win opportunity. Don’t waste your time–there are bigger fish to fry.

The Can of Worms. You should have very few classes in this quadrant–having lots of stable classes that exhibit many defects suggests a reset on how your team does triage. Fixing these troubled classes might seem like an opportunity to make your customer happy. Open the code up, however, and you’re likely to find fear-inspiring code. You might view this quadrant as a sandbox for ramping up on legacy code techniques: Classes here will provide you the time to experiment without having to worry as much about contention issues with the rest of the team.

The Mystery. Getting all the tiny bits correct in the face of rapid change is a significant challenge. A single, low-defect class with lots of change could suggest at least a couple of things: The changes are generally trivial in nature, or the class rampantly violates the single responsibility principle (for example, a large controller class that doesn’t simply delegate but also does work). With respect to the latter, perhaps you’ve been fortunate so far, or perhaps you have good test coverage.

It might be worth a small amount of effort to prevent any future problems in such an SRP-violating class. The nice thing is that splintering off new classes can be a fairly quick and risk-free effort, one that should begin to isolate areas of risk. You might then be able to code a few key characterization tests without extreme difficulty.

The Garbage Disposal. Usually there are at least a few classes in your system that are very large and fraught with problems, and they change all the time. This is the obvious place to improve your confidence coverage.

Good design, which begins with applying the SRP, starts to solve many of the problems the garbage disposal represents. Small, single-purpose classes provide more opportunities for reuse, make performance profiling easier, expose defects more readily, and can dramatically improve comprehension of the code, to name a few benefits. A garbage disposal is a dangerous place–with lots of hands in one big sinkhole, contention is frequent and merges can be painful. Design that adheres to the SRP thus also makes continuous integration easier.

My First TDD Exercise

Finding a good first exercise for TDD learners can be challenging. Some of the things I look for:

  • Not so simple that it’s trivial, e.g. Stack, lest students dismiss it as pointless. There should be opportunities for students to make a few simple mistakes, but…
  • …not so complex that students mire in the algorithm itself instead of focusing on technique.
  • Something students can relate to–not so obscure that students struggle with understanding the domain or problem.
  • Games and fluff (e.g. bowling, fizz-buzz) can create a first impression–and first impressions can be very hard to break–that TDD is for trivialities only, despite what you say. These things can be fine as later exercises.
  • Not something that already exists in the library. (“Why am I building a multi-map when C++ already has one?”)
  • Can be test-driven in 10 to 15 minutes by an experienced TDDer, meaning that it’s probably a 30-to-45-minute exercise for newbies pairing up.
  • Provides a good demonstration of the ability to incrementally grow a solution, including opportunities to spot and remove duplication.
  • Provides an opportunity to show how to deal with at least one exceptional case.

I’ve been demoing a stock portfolio tracker for some time–a simple collection class that allows purchases of stock symbols. With Java students, I follow up with a multi-map, a class that would be useful in most shops (though a similar Jakarta implementation exists). Both have worked well.
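
The opening tests for the portfolio exercise might look something like this sketch (Portfolio and its method names are illustrative assumptions, not the exact interface from the class exercise):

import org.junit.*;
import static org.junit.Assert.*;
import static org.hamcrest.CoreMatchers.*;

// Illustrative sketch of a first-exercise test class; the Portfolio
// interface shown here is an assumption.
public class PortfolioTest {
   private Portfolio portfolio;

   @Before
   public void create() {
      portfolio = new Portfolio();
   }

   @Test
   public void isEmptyWhenCreated() {
      assertThat(portfolio.isEmpty(), is(true));
   }

   @Test
   public void isNoLongerEmptyAfterPurchase() {
      portfolio.purchase("IBM", 10);

      assertThat(portfolio.isEmpty(), is(false));
   }

   @Test
   public void answersSharesForPurchasedSymbol() {
      portfolio.purchase("IBM", 10);

      assertThat(portfolio.shares("IBM"), is(10));
   }
}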

The message an exercise sends can be damaging. As a second or third exercise, lately I’ve been using the Roman numeral converter kata. I think it’s a cool exercise that can show how you can build an elegant algorithm by following a simple, incremental approach. It’s had 99% effectiveness: Out of the past ~100 students, one guy–an algorithm specialist–took a completely negative view of TDD after it. His stance was that he could have derived the algorithm much more quickly using a top-down, non-test-driven approach. From that he dismissed TDD completely. During subsequent pairings, I think he saw some of its benefits (we talk a lot about algorithms and TDD), but it’s an uphill battle and it might have been too late.

Currently I’m experimenting with an exercise to implement the soundex algorithm. More on that in an upcoming blog post.

Dave Thomas’s PragProg CodeKata site is a great place to get ideas. You might also check out the TDD Problems site (to which I contributed a handful of problems).

Refactoring and Performance

In the first portion of the book Refactoring, Martin Fowler presents an example where he whittles a verbose method into an improved design. Here’s the original, more or less:

   public String statement() {
      double totalAmount = 0;
      int frequentRenterPoints = 0;
      Enumeration rentals = this.rentals.elements();
      String result = "Rental Record for " + getName() + "\n";

      while (rentals.hasMoreElements()) {
         double thisAmount = 0;
         Rental each = (Rental)rentals.nextElement();
         switch (each.getMovie().getPriceCode()) {
            case Movie.REGULAR:
               thisAmount += 2;
               if (each.getDaysRented() > 2)
                  thisAmount += (each.getDaysRented() - 2) * 1.5;
               break;
            case Movie.NEW_RELEASE:
               thisAmount += each.getDaysRented() * 3;
               break;
            case Movie.CHILDRENS:
               thisAmount += 1.5;
               if (each.getDaysRented() > 3)
                  thisAmount += (each.getDaysRented() - 3) * 1.5;
               break;
         }

         frequentRenterPoints++;

         if (each.getMovie().getPriceCode() == Movie.NEW_RELEASE && 
             each.getDaysRented() > 1)
            frequentRenterPoints++;

         result += "t" + each.getMovie().getTitle() + 
                   "t" + String.valueOf(thisAmount) + "\n";
         totalAmount += thisAmount;
      }
   }

And here’s the (relevant portion of the) factored code. I think this is as far as Fowler takes the example:

   public void addRental(Rental rental) {
      rentals.add(rental);
   }

   public String statement() {
      StringBuilder result = new StringBuilder();
      appendHeader(result);
      for (Rental rental: rentals)
         appendDetail(result, rental);
      appendSummary(result);
      return result.toString();
   }

   private int calculateFrequentRenterPoints() {
      int totalFrequentRenterPoints = 0;
      for (Rental rental: rentals)
         totalFrequentRenterPoints += rental.getFrequentRenterPoints();
      return totalFrequentRenterPoints;
   }

   private double calculateTotalCost() {
      double totalCost = 0;
      for (Rental rental: rentals)
         totalCost += rental.getCost();
      return totalCost;
   }

The while loop in the original statement method intertwines three things: construction of a text statement, addition into a total cost, and addition into a total for frequent renter points. The refactored version separates and isolates each of these three behaviors.

At the point of presenting this code, Fowler discusses the performance implications: Instead of a single set of iterations, the code now requires three iterations over the same collection. Many developers object strongly to this refactoring.

I’ve done quick (and rough, but relatively accurate) performance measurements for this example. Things aren’t three times worse, as a naive guess might suggest; instead we’re talking about a 7% degradation in performance. That’s still a significant amount in many systems, maybe even too much to accept. In many other systems, the degradation is negligible, given the context.

Some conclusions: First, always measure, never guess. Second, understand what the real performance requirements are. Create end-to-end performance tests early on. Run these regularly (nightly?), so that you know the day someone introduces underperforming code.
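
Here’s a rough sketch of what such an end-to-end timing test might look like. The rental count, the 50-millisecond budget, and the Customer/Movie/Rental constructor signatures are all assumptions you’d calibrate against your own code and requirements:

import org.junit.*;
import static org.junit.Assert.*;

// A sketch of a nightly, end-to-end timing test. The budget and data
// volume are placeholders; the point is to notice the day someone
// introduces underperforming code.
public class StatementPerformanceTest {
   private static final int RENTAL_COUNT = 100000;
   private static final long BUDGET_MILLIS = 50;

   @Test
   public void statementCompletesWithinBudget() {
      Customer customer = new Customer("perf");
      for (int i = 0; i < RENTAL_COUNT; i++)
         customer.addRental(new Rental(new Movie("movie " + i, Movie.REGULAR), 3));

      long start = System.currentTimeMillis();
      customer.statement();
      long elapsed = System.currentTimeMillis() - start;

      assertTrue("statement() took " + elapsed + "ms", elapsed <= BUDGET_MILLIS);
   }
}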

The refactored, better design provides the stepping stone to many other good things. The small, cohesive methods can each be read and understood in a few seconds. You can glance at each, and almost instantly know it must be correct. They’re also extremely easy to test if you’re not so trusting of glances.

Looking further at the code, you might also spot an abstraction waiting to come into existence–the collection of rentals itself. Producing a statement is a responsibility separate from managing this collection. You could create another class named RentalSet or RentalCollection. Moving the calculations into the new class is trivial, given that the calculation logic is now completely divorced from statement generation.
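
Here’s a sketch of what that extraction might look like (the class and method names are illustrative; the two calculations move over essentially unchanged, using the getCost and getFrequentRenterPoints methods shown above):

import java.util.*;

// Sketch of the extracted abstraction; Customer would hold one of these
// and ask it for totals rather than computing them itself.
public class RentalCollection implements Iterable<Rental> {
   private List<Rental> rentals = new ArrayList<Rental>();

   public void add(Rental rental) {
      rentals.add(rental);
   }

   public Iterator<Rental> iterator() {
      return rentals.iterator();
   }

   public int totalFrequentRenterPoints() {
      int total = 0;
      for (Rental rental: rentals)
         total += rental.getFrequentRenterPoints();
      return total;
   }

   public double totalCost() {
      double total = 0;
      for (Rental rental: rentals)
         total += rental.getCost();
      return total;
   }
}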

Other benefits accrue. Fowler talks about the ability to provide support for an HTML statement, in addition to existing support for a plaintext statement. Migration to a polymorphic hierarchy is simple and safe with the refactored design.
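
One rough sketch of what that might look like–my names, not Fowler’s exact example, and the getRentals() accessor on Customer is an assumption:

// A template method over the refactored steps; TextStatementFormatter and
// HtmlStatementFormatter would each implement the three append methods.
public abstract class StatementFormatter {
   public String statement(Customer customer) {
      StringBuilder result = new StringBuilder();
      appendHeader(result, customer);
      for (Rental rental: customer.getRentals())
         appendDetail(result, rental);
      appendSummary(result, customer);
      return result.toString();
   }

   protected abstract void appendHeader(StringBuilder result, Customer customer);
   protected abstract void appendDetail(StringBuilder result, Rental rental);
   protected abstract void appendSummary(StringBuilder result, Customer customer);
}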

Following the mantra of “make it run, make it [the design] right, make it fast,” you can return to the performance issue now that you have an improved design. Suppose you must restore performance and eliminate the 7% degradation. You could undo the extractions and again intertwine the three concerns in the body of one loop.

But since you have a simpler design, the optimization doesn’t need to impact the design so negatively:

   public void addRental(Rental rental) {
      rentals.add(rental);

      // optimization--cache totals:
      totalFrequentRenterPoints += rental.getFrequentRenterPoints();
      totalCost += rental.getCost();
   }

   private int calculateFrequentRenterPoints() {
      return totalFrequentRenterPoints;
   }

   private double calculateTotalCost() {
      return totalCost;
   }

(Note the rare, debatably worthwhile comment, which I could eliminate by extracting another method.)

Small methods are about many things, ranging from comprehension to reuse. In this case, having an optimized design allowed for an even simpler solution to the optimization of performance.

Test-Driving a Heap-Based Priority Queue

It’s been over 25 years since I had to code a priority queue. Should I really need to? Java now supplies a priority queue, and an important lesson of software development is that you should first seek to reuse what you can. But it’s worthwhile to know how to build such things yourself, in case you’re in an environment where one doesn’t exist.

As usual, though, I’m mostly interested in honing my ability to incrementally grow things, including interesting data structures. What can I learn from test-driving a priority queue?

All a priority queue does is return elements from a queue in priority order. Per the Wikipedia page, there are many ways to implement a priority queue, including a couple bad ways. Right off the bat, there’s no way I could force a priority queue to use a binary heap–one of the better choices–given only the client needs. Test-driving a solution and building the “simplest” possible implementation would result in the worst-performing solution.

So how to force the issue? Well, the nice thing about a binary heap is that it has a very concisely defined structure, one that I can verify easily. (I’ll refer to a binary heap as simply a heap for the rest of this article.) A binary heap is a tree where:

  • The key of each node is greater than or equal to the keys of its children.
  • The tree is complete–the last level is filled left-to-right, and all other levels are filled.

I know I want to end up with a heap–I need the performance–and will use that as a guiding factor in test-driving a solution. Does this violate YAGNI? Maybe. Other options? I could introduce a performance-related test, one where I tried to verify the big-O of each of the three operations (adding a new item, peeking at the highest priority item, and removing the highest priority item–the last two of which are often combined into a poll method). That would be tough to do well, and still might not guarantee an implementation that used a binary heap.

No, I need to be able to peek at the implementation. TDD will still be a great help. I still want to build the heap incrementally–there’s a decent amount of code involved, and plenty of opportunities to get it wrong. I’ll start simple, however, and not worry about heap characteristics until I get basic queue functionality working:

import org.junit.*;
import static org.junit.Assert.*;
import static org.hamcrest.CoreMatchers.*;

public class PriorityQueueTest {
   private static final Object ITEM_50 = "50";
   private static final Object ITEM_100 = "100";

   PriorityQueue queue;

   @Before
   public void create() {
      queue = new PriorityQueue();
   }

   @Test
   public void emptyWhenCreated() {
      assertThat(queue.isEmpty(), is(true));
   }

   @Test
   public void noLongerEmptyAfterAdd() {
      queue.add(ITEM_50, 50);

      assertThat(queue.isEmpty(), is(false));
   }

   @Test
   public void singletonQueueReturnsSoleItemOnPoll() {
      queue.add(ITEM_50, 50);

      assertThat(queue.poll(), is(ITEM_50));
   }

   @Test
   public void isEmptyAfterSoleElementRemoved() {
      queue.add(ITEM_50, 50);
      queue.poll();

      assertThat(queue.isEmpty(), is(true));
   }

   @Test
   public void returnsFIFOWithTwoAlreadyOrderedItems() {
      queue.add(ITEM_100, 100);
      queue.add(ITEM_50, 50);

      assertThat(queue.poll(), is(ITEM_100));
      assertThat(queue.poll(), is(ITEM_50));
      assertThat(queue.isEmpty(), is(true));
   }
}

I won’t bore you with the rote, incremental steps used to get to this point. For now the solution is simply a class that wraps an ordered list. There’s no support for pulling things out in other than FIFO order, since there are as yet no tests that demand the capability.

import java.util.*;

public class PriorityQueue {
   private List<Object> heap = new ArrayList<Object>();

   public boolean isEmpty() {
      return heap.isEmpty();
   }

   public void add(Object item, int priority) {
      heap.add(item);
   }

   public Object poll() {
      return heap.remove(0);
   }
}

All those tests to get to this trivial bit of code? No big deal. About five minutes of work, and I’ve gotten some good cases documented and clamped down.

The next test makes things a little more interesting:

   @Test
   public void returnsInPriorityOrder() {
      queue.add(ITEM_50, 50);
      queue.add(ITEM_100, 100);

      assertThat(queue.poll(), is(ITEM_100));
      assertThat(queue.poll(), is(ITEM_50));
      assertThat(queue.isEmpty(), is(true));
   }

The implementation now requires actually using the priority, associating it with each item, hence the introduction of the Node class. It also means I need a little bit of “real heap logic” in the add method: If the second item is higher priority than the first, sift it to the front. Since there are only two items, it’s a hard-coded swap. The way I coded it also happens to work for single-element heaps.

import java.util.*;

public class PriorityQueue {
   private List<Node> heap = new ArrayList<Node>();

   class Node {
      final Object object;
      final int priority;

      Node(Object object, int priority) {
         this.object = object;
         this.priority = priority;
      }
   }

   public boolean isEmpty() {
      return heap.isEmpty();
   }

   public void add(Object item, int priority) {
      Node node = new Node(item, priority);
      heap.add(node);
      if (node.priority > heap.get(0).priority)
         siftUp();
   }

   private void siftUp() {
      Node temp = heap.get(0);
      heap.set(0, heap.get(1));
      heap.set(1, temp);
   }

   public Object poll() {
      return heap.remove(0).object;
   }
}

Interestingly, I still don’t have a proper tree with parents and children–just a list with a couple slots filled in. In looking about at ways to implement a binary tree, the most appealing is to simply build it in-place in an array–something easily done given the constraint that the tree must be complete. Using an array will make it easy (and fast) to navigate from parent to child and vice versa.

The tests could use a bit of clarity. I add a helper method to simplify my assertion (and distill it to one line):

   @Test
   public void returnsFIFOWithTwoAlreadyOrderedItems() {
      queue.add(ITEM_100, 100);
      queue.add(ITEM_50, 50);

      assertQueuePollingOrder(ITEM_100, ITEM_50);
   }

   @Test
   public void returnsInPriorityOrder() {
      queue.add(ITEM_50, 50);
      queue.add(ITEM_100, 100);

      assertQueuePollingOrder(ITEM_100, ITEM_50);
   }

   private void assertQueuePollingOrder(Object... expectedItems) {
      List<Object> items = new ArrayList<Object>();
      while (!queue.isEmpty())
         items.add(queue.poll());
      assertThat(items, is(Arrays.asList(expectedItems)));
   }

Time to move on to the more interesting cases.

Thinking a bit about the next test (thinking is allowed, in fact encouraged, in TDD, particularly when determining what to do next)… Adding a third element to the heap will require a bit more generalized logic, and a fourth element (which would represent the third level of the heap’s tree) even more so. All complex enough without worrying about the complexity of removal, which requires re-heapifying.

One thing at a time. Rather than solve both challenges at once, I’ll first build out support for adding elements. Instead of using polling to verify that adding worked properly, I’ll verify that the priority queue’s underlying heap is valid.

Instead of creating a new test, I introduce a new assertion in each of my two existing tests (abusing the guideline of one assert per test).

   @Test
   public void returnsFIFOWithTwoAlreadyOrderedItems() {
      queue.add(ITEM_100, 100);
      queue.add(ITEM_50, 50);

      assertQueuePollingOrder(ITEM_100, ITEM_50);
      assertValid(queue.heap());
   }

   @Test
   public void returnsInPriorityOrder() {
      queue.add(ITEM_50, 50);
      queue.add(ITEM_100, 100);

      assertQueuePollingOrder(ITEM_100, ITEM_50);
      assertValid(queue.heap());
   }

   private void assertValid(List<PriorityQueue.Node> heap) {
      assertSubheap(heap, Integer.MAX_VALUE, 0);
   }

   private void assertSubheap(
         List<PriorityQueue.Node> heap, int priority, int root) {
      if (root > heap.size() - 1)
         return;

      PriorityQueue.Node node = heap.get(root);
      assertThat(priority > node.priority, is(true));

      assertSubheap(heap, node.priority, leftChild(root));
      assertSubheap(heap, node.priority, rightChild(root));
   }

   private int rightChild(int index) {
      return index * 2 + 2;
   }

   private int leftChild(int index) {
      return index * 2 + 1;
   }

Now I can quickly write the next failing test (with a lame name):

   @Test
   public void heapifiesThreeElements() {
      queue.add(ITEM_50, 50);
      queue.add(ITEM_75, 75);
      queue.add(ITEM_100, 100);

      assertValid(queue.heap());
   }

… and get it to pass quickly too:

   public void add(Object item, int priority) {
      Node node = new Node(item, priority);
      heap.add(node);
      int rightmost = heap.size() - 1;
      if (node.priority > heap.get(0).priority)
         siftUp(rightmost);
   }

   private void siftUp(int rightmost) {
      Node temp = heap.get(0);
      heap.set(0, heap.get(rightmost));
      heap.set(rightmost, temp);
   }

(The relevant changes: capture the index of the rightmost element, heap.size() - 1, and pass it to siftUp, which now swaps that element with the root.)

Another test:

   @Test
   public void heapifiesMultipleLevels() {
      add(50, 75, 100, 125);
      assertValid(queue.heap());
   }

This time it requires a bit more manipulation of existing code. I rename siftUp to simply swap, because that’s all it does, and the “sift up” algorithm is in the body of the add method.

   public void add(Object item, int priority) {
      heap.add(new Node(item, priority));
      int current = heap.size() - 1;
      while (priority > parentPriority(current)) {
         swap(current, parent(current));
         current = parent(current);
      }
   }

   private int parentPriority(int index) {
      return heap.get(parent(index)).priority;
   }

   private int parent(int index) {
      return (index - 1) / 2;
   }

   private void swap(int x, int y) {
      Node temp = heap.get(x);
      heap.set(x, heap.get(y));
      heap.set(y, temp);
   }

It’s not bad, but I want each method to be at one level of abstraction, so I modify the add method:

   public void add(Object item, int priority) {
      addNewNode(item, priority);
      siftUpFromLast(priority);
   }

   private void addNewNode(Object item, int priority) {
      heap.add(new Node(item, priority));
   }

   private void siftUpFromLast(int priority) {
      int current = heap.size() - 1;
      while (priority > parentPriority(current)) {
         swap(current, parent(current));
         current = parent(current);
      }
   }

I think that’s it for add! To confirm, I use the example from Wikipedia:

   @Test
   public void integrationTest() {
      add(1, 2, 3, 7, 17, 19, 25, 36, 100);
      assertValid(queue.heap());
   }

Oh yeah–all those test lines required to add elements were getting annoying, so I added the following helper in the test (and updated other tests correspondingly):

   private void add(int... priorities) {
      for (int priority: priorities)
         queue.add(String.valueOf(priority), priority);
   }

On to polling. The general algorithm: capture value at front of heap (the highest priority per add‘s post-condition), move the rightmost item to the front, and re-heapify. To re-heapify: compare the item at the root with both its children, swap with the larger if necessary, and re-heapify the subtree.

From a test perspective, all we need to verify is that the highest priority item was returned and that the heap-validity post-condition still holds.

   @Test
   public void removeAnswersHighest() {
      add(20, 30, 40, 100, 50, 60);
      assertThat(queue.poll(), is((Object)"100"));
   }

   @Test
   public void removeReheapifies() {
      add(20, 30, 40);
      queue.poll();
      assertValid(queue.heap());
   }

After incrementally introducing changes based on those two test failures, here’s the explicit bit of code that covers lists up to three elements:

   public Object poll() {
      Object result = heap.remove(0).object;
      if (heap.size() == 2) {
         if (heap.get(0).priority < heap.get(1).priority)
            swap(0, 1);
      }
      return result;
   }

The addition of another item generates the next test failure:

   @Test
   public void removeReheapifyMustConsiderBothChildren() {
      add(40, 20, 30, 10);
      queue.poll();
      assertValid(queue.heap());
   }

I use programming by intention to express the algorithm clearly, starting with the poll method:

   public Object poll() {
      if (heap.size() == 1)
         return heap.remove(0).object;

      Object result = heap.get(0).object;
      reheapify();
      return result;
   }

   private void reheapify() {
      moveLastToFirstSlot();
      if (priorityLessThanEitherChild(0))
         swap(0, highestPriorityChild(0));
   }

The reheapify method has the odd side effect of eliminating the first (highest) element by overwriting it. That this is not explicit in the poll method suggests there’s probably a better expression of the entire algorithm.

I flesh out the trivial supporting methods:

   private void moveLastToFirstSlot() {
      Node rightmost = heap.remove(heap.size() - 1);
      heap.set(0, rightmost);
   }

   private boolean priorityLessThanEitherChild(int index) {
      int priority = heap.get(index).priority;
      return priority < leftPriority(index) ||
             priority < rightPriority(index);
   }

   private int highestPriorityChild(int index) {
      return leftPriority(index) > rightPriority(index)
             ? leftChild(index)
             : rightChild(index);
   }

   private int rightPriority(int index) {
      return priority(rightChild(index));
   }

   private int leftPriority(int index) {
      return priority(leftChild(index));
   }

   private int priority(int index) {
      if (index > heap.size() - 1)
         return Integer.MIN_VALUE;
      return heap.get(index).priority;
   }

   private int leftChild(int index) {
      return index * 2 + 1;
   }

   private int rightChild(int index) {
      return index * 2 + 2;
   }

It’s a good chunk of code, but it still fit comfortably within my five-minute cycle limit.

Hey–I recognize those leftChild and rightChild methods from my test! I’ll go make ’em static so the test can re-use them.

About the only other interesting logic is rooted in the priority method, which returns Integer.MIN_VALUE if there’s no item at an index. That allows comparisons in priorityLessThanEitherChild to quietly fail when no child exists.

Almost there. The final push is to force repeated swaps from the root down through multiple subtrees:

   @Test
   public void removeMustReheapifyMultipleLevels() {
      add(50, 40, 20, 30, 10);
      queue.poll();
      assertValid(queue.heap());
   }

Getting a passing test requires a nice, small increment of code, plus a TPP transformation from if to while. (Recursion is one up in the priority list–should I refactor this method? See the sketch after the code below.)

   private void reheapify() {
      moveLastToFirstSlot();
      int i = 0;
      while (priorityLessThanEitherChild(i)) {
         int child = highestPriorityChild(i);
         swap(i, child);
         i = child;
      }
   }
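
For what it’s worth, here’s a sketch of the recursive variant that question points at–my own speculation, not part of the exercise. The behavior matches the while loop:

   private void reheapify() {
      moveLastToFirstSlot();
      siftDown(0);
   }

   // Recursively sift the new root down until neither child outranks it.
   private void siftDown(int index) {
      if (!priorityLessThanEitherChild(index))
         return;
      int child = highestPriorityChild(index);
      swap(index, child);
      siftDown(child);
   }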

Both add and poll methods appear to be complete. Here’s a passing test that fires end-to-end:

   @Test
   public void pollsInPriorityOrder() {
      add(40, 20, 60, 80, 10, 30, 90, 70, 50);
      assertQueuePollingOrder(
         "90", "80", "70", "60", "50", "40", "30", "20", "10");
   }

Boy, do those quotes suck. Well, it’s time to do some global clean-up, so I’ll fix that along with some other things. Here’s the test class after various cleanups for expressiveness:

import static org.hamcrest.CoreMatchers.*;
import static org.junit.Assert.*;
import java.util.*;
import org.junit.*;

public class PriorityQueueTest {
   PriorityQueue queue;

   @Before
   public void create() {
      queue = new PriorityQueue();
   }

   @Test
   public void emptyWhenCreated() {
      assertThat(queue.isEmpty(), is(true));
   }

   @Test
   public void noLongerEmptyAfterAdd() {
      queue.add("", 50);

      assertThat(queue.isEmpty(), is(false));
   }

   @Test
   public void singletonQueueReturnsSoleItemOnPoll() {
      queue.add("abc", 50);

      assertThat(queue.poll(), is((Object)"abc"));
   }

   @Test
   public void isEmptyAfterSoleItemRemoved() {
      queue.add("", 50);
      queue.poll();

      assertThat(queue.isEmpty(), is(true));
   }

   @Test
   public void returnsFIFOWithTwoAlreadyOrderedItems() {
      add(100, 50);
      assertQueuePollingOrder(100, 50);
      assertValid(queue.heap());
   }

   @Test
   public void returnsTwoUnorderedItemsInPriorityOrder() {
      add(50, 100);
      assertQueuePollingOrder(100, 50);
      assertValid(queue.heap());
   }

   @Test
   public void heapifiesThreeItems() {
      add(50, 75, 100);
      assertValid(queue.heap());
   }

   @Test
   public void heapifiesMultipleLevels() {
      add(50, 75, 100, 125);
      assertValid(queue.heap());
   }

   @Test
   public void removeAnswersHighest() {
      add(20, 30, 40, 100, 50, 60);
      assertThat(queue.poll(), is((Object)"100"));
   }

   @Test
   public void removeReheapifiesTwoItems() {
      add(20, 30, 40);
      queue.poll();
      assertValid(queue.heap());
   }

   @Test
   public void removeReheapifyMustConsiderBothChildren() {
      add(40, 20, 30, 10);
      queue.poll();
      assertValid(queue.heap());
   }

   @Test
   public void removeMustReheapifyMultipleLevels() {
      add(50, 40, 20, 30, 10);
      queue.poll();
      assertValid(queue.heap());
   }

   @Test
   public void pollExhaustsHeapInPriorityOrder() {
      add(40, 20, 60, 80, 10, 30, 90, 70, 50);
      assertQueuePollingOrder(90, 80, 70, 60, 50, 40, 30, 20, 10);
   }

   private void assertQueuePollingOrder(Integer... expectedItems) {
      List<Integer> items = new ArrayList<Integer>();
      while (!queue.isEmpty())
         items.add(Integer.parseInt((String)queue.poll()));
      assertThat(items, is(Arrays.asList(expectedItems)));
   }

   private void add(int... priorities) {
      for (int priority: priorities)
         queue.add(String.valueOf(priority), priority);
   }

   private void assertValid(List<PriorityQueue.Node> heap) {
      assertSubheap(heap, Integer.MAX_VALUE, 0);
   }

   private void assertSubheap(List<PriorityQueue.Node> heap, int priority, int root) {
      if (root > heap.size() - 1)
         return;

      PriorityQueue.Node node = heap.get(root);
      assertThat(priority > node.priority, is(true));

      assertSubheap(
         heap, node.priority, PriorityQueue.leftChild(root));
      assertSubheap(
         heap, node.priority, PriorityQueue.rightChild(root));
   }
}

And here’s the implementation:

import java.util.*;

public class PriorityQueue {
   private List<Node> heap = new ArrayList<Node>();

   static class Node {
      final Object object;
      final int priority;

      Node(Object object, int priority) {
         this.object = object;
         this.priority = priority;
      }

      public String toString() {
         return String.valueOf(priority);
      }
   }

   public boolean isEmpty() {
      return heap.isEmpty();
   }

   public void add(Object item, int priority) {
      addNewNode(item, priority);
      siftUpFromLast(priority);
   }

   private void addNewNode(Object item, int priority) {
      heap.add(new Node(item, priority));
   }

   private void siftUpFromLast(int priority) {
      int current = heap.size() - 1;
      while (priority > parentPriority(current)) {
         swap(current, parent(current));
         current = parent(current);
      }
   }

   private int parentPriority(int index) {
      return heap.get(parent(index)).priority;
   }

   private int parent(int index) {
      return (index - 1) / 2;
   }

   private void swap(int x, int y) {
      Node temp = heap.get(x);
      heap.set(x, heap.get(y));
      heap.set(y, temp);
   }

   public Object poll() {
      if (heap.size() == 1)
         return heap.remove(0).object;

      Object result = heap.get(0).object;
      reheapify();
      return result;
   }

   private void reheapify() {
      moveLastToFirstSlot();
      int i = 0;
      while (priorityLessThanEitherChild(i)) {
         int child = highestPriorityChild(i);
         swap(i, child);
         i = child;
      }
   }

   private void moveLastToFirstSlot() {
      Node rightmost = heap.remove(heap.size() - 1);
      heap.set(0, rightmost);
   }

   private boolean priorityLessThanEitherChild(int index) {
      int priority = heap.get(index).priority;
      return priority < leftPriority(index)
          || priority < rightPriority(index);
   }

   private int highestPriorityChild(int index) {
      return leftPriority(index) > rightPriority(index)
         ? PriorityQueue.leftChild(index)
         : PriorityQueue.rightChild(index);
   }

   int rightPriority(int index) {
      return priority(rightChild(index));
   }

   int leftPriority(int index) {
      return priority(leftChild(index));
   }

   private int priority(int index) {
      if (index > heap.size() - 1)
         return Integer.MIN_VALUE;
      return heap.get(index).priority;
   }

   List<Node> heap() {
      return heap;
   }

   static int rightChild(int index) {
      return index * 2 + 2;
   }

   static int leftChild(int index) {
      return index * 2 + 1;
   }
}

What else is there to complain about?

  • My test names could use some work. It’s hard to derive good test names when you’re driving incremental implementation of an algorithm, instead of an interface.
  • I probably have some tests I can get rid of.
  • The tests could be composed better. Worse, the test knows about the Node class (which the client need not know about). Maybe binary heap verification should be built into the heap implementation itself.
  • Should there be a PriorityQueue class and a separate Heap class? I thought about this a while and didn’t see the value.
  • I don’t use the Comparable interface to “simplify” and externalize comparisons (i.e. allow the client to specify how priorities compare).
  • I don’t have any tests that show what happens when priorities are equal.
  • I don’t use safety braces (i.e. braces around single-line if statements)–that’s why I have tests. I don’t make everything private that could be private (package access is fine by me, particularly in tests since I don’t have to type the access modifier).

What do I like?

  • I was able to test-drive the implementation, sticking to the five-minute rule for getting green bars. Things were a bit close a couple times (was there a more incremental way?).
  • I think the algorithm as implemented is very easy to follow (read each of the add and poll methods), particularly when compared to what is typically built (here’s one implementation; see, for example, the percolateDown method).

What do you think? Is test-driving a priority queue a good kata? I lay down the challenge flag for a better progression and implementation (shouldn’t be hard).
