Test-Driving a Heap-Based Priority Queue

It’s been over 25 years since I had to code a priority queue. Should I really need to? Java now supplies a priority queue, and an important lesson of software development is that you should first seek to reuse what you can. But it’s worthwhile to know how to build such things yourself, in case you’re in an environment where one doesn’t exist.

As usual, though, I’m mostly interested in honing my ability to incrementally grow things, including interesting data structures. What can I learn from test-driving a priority queue?

All a priority queue does is return elements from a queue in priority order. Per the Wikipedia page, there are many ways to implement a priority queue, including a couple bad ways. Right off the bat, there’s no way I could force a priority queue to use a binary heap–one of the better choices–given only the client needs. Test-driving a solution and building the “simplest” possible implementation would result in the worst-performing solution.

So how to force the issue? Well, the nice thing about a binary heap is that it has a very concisely defined structure, one that I can verify easily. (I’ll refer to a binary heap as simply a heap for the rest of this article.) A binary heap is a tree where:

  • The key for each node is greater than or equal to each of its children.
  • The tree is complete–the last level is filled left-to-right, and all other levels are filled.

I know I want to end up with a heap–I need the performance–and will use that as a guiding factor in test-driving a solution. Does this violate YAGNI? Maybe. Other options? I could introduce a performance-related test, one where I tried to verify the big-O of each of the three operations (adding a new item, peeking at the highest priority item, and removing the highest priority item–the last two of which are often combined into a poll method). That would be tough to do well, and still might not guarantee an implementation that used a binary heap.

No, I need to be able to peek at the implementation. TDD will still be a great help. I still want to build the heap incrementally–there’s a decent amount of code involved, and plenty of opportunities to get it wrong. I’ll start simple, however, and not worry about heap characteristics until I get basic queue functionality working:

import org.junit.*;
import static org.junit.Assert.*;
import static org.hamcrest.CoreMatchers.*;

public class PriorityQueueTest {
   private static final Object ITEM_50 = "50";
   private static final Object ITEM_100 = "100";

   PriorityQueue queue;

   @Before
   public void create() {
      queue = new PriorityQueue();
   }

   @Test
   public void emptyWhenCreated() {
      assertThat(queue.isEmpty(), is(true));
   }

   @Test
   public void noLongerEmptyAfterAdd() {
      queue.add(ITEM_50, 50);

      assertThat(queue.isEmpty(), is(false));
   }

   @Test
   public void singletonQueueReturnsSoleItemOnPoll() {
      queue.add(ITEM_50, 50);

      assertThat(queue.poll(), is(ITEM_50));
   }

   @Test
   public void isEmptyAfterSoleElementRemoved() {
      queue.add(ITEM_50, 50);
      queue.poll();

      assertThat(queue.isEmpty(), is(true));
   }

   @Test
   public void returnsFIFOWithTwoAlreadyOrderedItems() {
      queue.add(ITEM_100, 100);
      queue.add(ITEM_50, 50);

      assertThat(queue.poll(), is(ITEM_100));
      assertThat(queue.poll(), is(ITEM_50));
      assertThat(queue.isEmpty(), is(true));
   }
}

I won’t bore you with the rote, incremental steps used to get to this point. For now the solution is simply a class that wraps an ordered list. There’s no support for pulling things out in other than FIFO order, since there are yet no tests that demand the capability.

import java.util.*;

public class PriorityQueue {
   private List<Object> heap = new ArrayList<Object>();

   public boolean isEmpty() {
      return heap.isEmpty();
   }

   public void add(Object item, int priority) {
      heap.add(item);
   }

   public Object poll() {
      return heap.remove(0);
   }
}

All those tests to get to this trivial bit of code? No big deal. About five minutes of work, and I’ve gotten some good cases documented and clamped down.

The next test makes things a little more interesting:

   @Test
   public void returnsInPriorityOrder() {
      queue.add(ITEM_50, 50);
      queue.add(ITEM_100, 100);

      assertThat(queue.poll(), is(ITEM_100));
      assertThat(queue.poll(), is(ITEM_50));
      assertThat(queue.isEmpty(), is(true));
   }

The implementation now requires actually using the priority, associating it with each item, hence the introduction of the Node class. It also means I need a little bit of “real heap logic” in the add method: If the second item is higher priority than the first, sift it to the front. Since there are only two items, it’s a hard-coded swap. The way I coded it also happens to work for single-element heaps.

import java.util.*;

public class PriorityQueue {
   private List<Node> heap = new ArrayList<Node>();

   class Node {
   final Object object;
   final int priority;

   Node(Object object, int priority) {
      this.object = object;
      this.priority = priority;
   }
   

   public boolean isEmpty() {
      return heap.isEmpty();
   }

   public void add(Object item, int priority) {
      Node node = new Node(item, priority);
      heap.add(node);
      if (node.priority > heap.get(0).priority)
         siftUp();
   }

   private void siftUp() {
      Node temp = heap.get(0);
      heap.set(0, heap.get(1));
      heap.set(1, temp);
   }

   public Object poll() {
      return heap.remove(0).object;
   }
}

Interestingly, I still don’t have a proper tree with parents and children–just a list with a couple slots filled in. In looking about at ways to implement a binary tree, the most appealing is to simply build it in-place in an array–something easily done given the constraint that the tree must be complete. Using an array will make it easy (and fast) to navigate from parent to child and vice versa.

The tests could use a bit of clarity. I add a helper method to simplify my assertion (and distill it to one line):

   @Test
   public void returnsFIFOWithTwoAlreadyOrderedItems() {
      queue.add(ITEM_100, 100);
      queue.add(ITEM_50, 50);

      assertQueuePollingOrder(ITEM_100, ITEM_50);
   }

   @Test
   public void returnsInPriorityOrder() {
      queue.add(ITEM_50, 50);
      queue.add(ITEM_100, 100);

      assertQueuePollingOrder(ITEM_100, ITEM_50);
   }

   private void assertQueuePollingOrder(Object... expectedItems) {
      List<Object> items = new ArrayList<Object>();
      while (!queue.isEmpty())
         items.add(queue.poll());
      assertThat(items, is(Arrays.asList(expectedItems)));
   }

Time to move on to the more interesting cases.

Thinking a bit about the next test (thinking is allowed, in fact encouraged, in TDD, particularly when determining what to do next)… Adding a third element to the heap will require a bit more generalized logic, and a fourth element (which would represent the third level of the heap’s tree) even more so. All complex enough without worrying about the complexity of removal, which requires re-heapifying.

One thing at a time. Rather than solve both challenges at once, I’ll first build out support for adding elements. Instead of using polling to verify that adding worked properly, I’ll verify that the priority queue’s underlying heap is valid.

Instead of creating a new test, I introduce a new assertion in each of my two existing tests (abusing the guideline of one assert per test).

   @Test
   public void returnsFIFOWithTwoAlreadyOrderedItems() {
      queue.add(ITEM_100, 100);
      queue.add(ITEM_50, 50);

      assertQueuePollingOrder(ITEM_100, ITEM_50);
      assertValid(queue.heap());
   }

   @Test
   public void returnsInPriorityOrder() {
      queue.add(ITEM_50, 50);
      queue.add(ITEM_100, 100);

      assertQueuePollingOrder(ITEM_100, ITEM_50);
      assertValid(queue.heap());
   }

   private void assertValid(List<PriorityQueue.Node> heap) {
      assertSubheap(heap, Integer.MAX_VALUE, 0);
   }

   private void assertSubheap(
         List<PriorityQueue.Node> heap, int priority, int root) {
      if (root > heap.size() - 1)
         return;

      PriorityQueue.Node node = heap.get(root);
      assertThat(priority > node.priority, is(true));

      assertSubheap(heap, node.priority, leftChild(root));
      assertSubheap(heap, node.priority, rightChild(root));
   }

   private int rightChild(int index) {
      return index * 2 + 2;
   }

   private int leftChild(int index) {
      return index * 2 + 1;
   }

Now I can quickly write the next failing test (with a lame name):

   @Test
   public void heapifiesThreeElements() {
      queue.add(ITEM_50, 50);
      queue.add(ITEM_75, 75);
      queue.add(ITEM_100, 100);

      assertValid(queue.heap());
   }

… and get it to pass quickly too:

   public void add(Object item, int priority) {
      Node node = new Node(item, priority);
      heap.add(node);
      int rightmost = heap.size() - 1;
      if (node.priority > heap.get(0).priority)
         siftUp(rightmost);
   }

   private void siftUp(int rightmost) {
      Node temp = heap.get(0);
      heap.set(0, heap.get(rightmost));
      heap.set(rightmost, temp);
   }

relevant lines: 4,6,9,11,12

Another test:

   @Test
   public void heapifiesMultipleLevels() {
      add(50, 75, 100, 125);
      assertValid(queue.heap());
   }

This time it requires a bit more manipulation of existing code. I rename siftUp to simply swap, because that’s all it does, and the “sift up” algorithm is in the body of the add method.

   public void add(Object item, int priority) {
      heap.add(new Node(item, priority));
      int current = heap.size() - 1;
      while (priority > parentPriority(current)) {
         swap(current, parent(current));
         current = parent(current);
      }
   }

   private int parentPriority(int index) {
      return heap.get(parent(index)).priority;
   }

   private int parent(int index) {
      return (index - 1) / 2;
   }

   private void swap(int x, int y) {
      Node temp = heap.get(x);
      heap.set(x, heap.get(y));
      heap.set(y, temp);
   }

It’s not bad, but I want each method to be at one level of abstraction, so I modify the add method:

   public void add(Object item, int priority) {
      addNewNode(item, priority);
      siftUpFromLast(priority);
   }

   private void addNewNode(Object item, int priority) {
      heap.add(new Node(item, priority));
   }

   private void siftUpFromLast(int priority) {
      int current = heap.size() - 1;
      while (priority > parentPriority(current)) {
         swap(current, parent(current));
         current = parent(current);
      }
   }

I think that’s it for add! To confirm, I use the example from Wikipedia:

   @Test
   public void integrationTest() {
      add(1, 2, 3, 7, 17, 19, 25, 36, 100);
      assertValid(queue.heap());
   }

Oh yeah–all those test lines required to add elements were getting annoying, so I added the following helper in the test (and updated other tests correspondingly):

   private void add(int... priorities) {
      for (int priority: priorities)
         queue.add(String.valueOf(priority), priority);
   }

On to polling. The general algorithm: capture value at front of heap (the highest priority per add‘s post-condition), move the rightmost item to the front, and re-heapify. To re-heapify: compare the item at the root with both its children, swap with the larger if necessary, and re-heapify the subtree.

From a test perspective, all we need to verify is that the highest priority item was returned and that the completeness post-condition held.

   @Test
   public void removeAnswersHighest() {
      add(20, 30, 40, 100, 50, 60);
      assertThat(queue.poll(), is((Object)"100"));
   }
   @Test
   public void removeReheapifies() {
      add(20, 30, 40);
      queue.poll();
      assertValid(queue.heap());
   }

After incrementally introducing changes based on those two test failures, here’s the explicit bit of code that covers lists up to three elements:

   public Object poll() {
      Object result = heap.remove(0).object;
      if (heap.size() == 2) {
         if (heap.get(0).priority < heap.get(1).priority)
            swap(0, 1);
      }
      return result;
   }

The addition of another item generates the next test failure:

   @Test
   public void removeReheapifyMustConsiderBothChildren() {
      add(40, 20, 30, 10);
      queue.poll();
      assertValid(queue.heap());
   }

I use programming by intention to express the algorithm clearly, starting with the poll method:

   public Object poll() {
      if (heap.size() == 1)
         return heap.remove(0).object;

      Object result = heap.get(0).object;
      reheapify();
      return result;
   }

   private void reheapify() {
      moveLastToFirstSlot();
      if (priorityLessThanEitherChild(0))
         swap(0, highestPriorityChild(0));
   }

The reheapify method has the odd side effect of eliminating the first (highest) element by overwriting it. That this is not explicit in the poll method suggests there’s probably a better expression of the entire algorithm.

I flesh out the trivial supporting methods:

   private void moveLastToFirstSlot() {
      Node rightmost = heap.remove(heap.size() - 1);
      heap.set(0, rightmost);
   }
private boolean priorityLessThanEitherChild(int index) {
      int priority = heap.get(index).priority;
      return priority < leftPriority(index) ||
             priority < rightPriority(index);
   }

   private int highestPriorityChild(int index) {
      return leftPriority(index) > rightPriority(index)
             ? leftChild(index)
             : rightChild(index);
   }

   private int rightPriority(int index) {
      return priority(rightChild(index));
   }

   private int leftPriority(int index) {
      return priority(leftChild(index));
   }

   private int priority(int index) {
      if (index > heap.size() - 1)
         return Integer.MIN_VALUE;
      return heap.get(index).priority;
   }

   private int leftChild(int index) {
      return index * 2 + 1;
   }

   private int rightChild(int index) {
      return index * 2 + 2;
   }

It’s a good chunk of code, but it still fit comfortably within my five-minute cycle limit.

Hey–I recognize those leftChild and rightChild methods from my test! I’ll go make ’em static so the test can re-use them.

About the only other interesting logic is rooted in the priority method, which returns Integer.MIN_VALUE if there’s no item at an index. That allows comparisons in priorityLessThanEitherChild to quietly fail when no child exists.

Almost there.  The final push is to force repeated swaps from root down through multiple subtrees:

   @Test
   public void removeMustReheapifyMultipleLevels() {
      add(50, 40, 20, 30, 10);
      queue.poll();
      assertValid(queue.heap());
   }

Getting a passing test requires a nice, small increment of code (and a TPP transformation from if=>while. Recursion is one up in the priority list–should I refactor this method?).

   private void reheapify() {
      moveLastToFirstSlot();
      int i = 0;
      while (priorityLessThanEitherChild(i)) {
         int child = highestPriorityChild(i);
         swap(i, child);
         i = child;
      }
   }

Both add and poll methods appear to be complete. Here’s a passing test that fires end-to-end:

   @Test
   public void pollsInPriorityOrder() {
      add(40, 20, 60, 80, 10, 30, 90, 70, 50);
      assertQueuePollingOrder(
         "90", "80", "70", "60", "50", "40", "30", "20", "10");
   }

Boy, do those quotes suck. Well, it’s time to do some global clean-up, so I’ll fix that along with some other things. Here’s the test class after some various cleanup for expressiveness:

import static org.hamcrest.CoreMatchers.*;
import static org.junit.Assert.*;
import java.util.*;
import org.junit.*;

public class PriorityQueueTest {
   PriorityQueue queue;

   @Before
   public void create() {
      queue = new PriorityQueue();
   }

   @Test
   public void emptyWhenCreated() {
      assertThat(queue.isEmpty(), is(true));
   }

   @Test
   public void noLongerEmptyAfterAdd() {
      queue.add("", 50);

      assertThat(queue.isEmpty(), is(false));
   }

   @Test
   public void singletonQueueReturnsSoleItemOnPoll() {
      queue.add("abc", 50);

      assertThat(queue.poll(), is((Object)"abc"));
   }

   @Test
   public void isEmptyAfterSoleItemRemoved() {
      queue.add("", 50);
      queue.poll();

      assertThat(queue.isEmpty(), is(true));
   }

   @Test
   public void returnsFIFOWithTwoAlreadyOrderedItems() {
      add(100, 50);
      assertQueuePollingOrder(100, 50);
      assertValid(queue.heap());
   }

   @Test
   public void returnsTwoUnorderedItemsInPriorityOrder() {
      add(50, 100);
      assertQueuePollingOrder(100, 50);
      assertValid(queue.heap());
   }

   @Test
   public void heapifiesThreeItems() {
      add(50, 75, 100);
      assertValid(queue.heap());
   }

   @Test
   public void heapifiesMultipleLevels() {
      add(50, 75, 100, 125);
      assertValid(queue.heap());
   }

   @Test
   public void removeAnswersHighest() {
      add(20, 30, 40, 100, 50, 60);
      assertThat(queue.poll(), is((Object)"100"));
   }

   @Test
   public void removeReheapifiesTwoItems() {
      add(20, 30, 40);
      queue.poll();
      assertValid(queue.heap());
   }

   @Test
   public void removeReheapifyMustConsiderBothChildren() {
      add(40, 20, 30, 10);
      queue.poll();
      assertValid(queue.heap());
   }

   @Test
   public void removeMustReheapifyMultipleLevels() {
      add(50, 40, 20, 30, 10);
      queue.poll();
      assertValid(queue.heap());
   }

   @Test
   public void pollExhaustsHeapInPriorityOrder() {
      add(40, 20, 60, 80, 10, 30, 90, 70, 50);
      assertQueuePollingOrder(90, 80, 70, 60, 50, 40, 30, 20, 10);
   }

   private void assertQueuePollingOrder(Integer... expectedItems) {
      List<Integer> items = new ArrayList<Integer>();
      while (!queue.isEmpty())
         items.add(Integer.parseInt((String)queue.poll()));
      assertThat(items, is(Arrays.asList(expectedItems)));
   }

   private void add(int... priorities) {
      for (int priority: priorities)
         queue.add(String.valueOf(priority), priority);
   }

   private void assertValid(List<PriorityQueue.Node> heap) {
      assertSubheap(heap, Integer.MAX_VALUE, 0);
   }

   private void assertSubheap(List<PriorityQueue.Node> heap, int priority, int root) {
      if (root > heap.size() - 1)
         return;

      PriorityQueue.Node node = heap.get(root);
      assertThat(priority > node.priority, is(true));

      assertSubheap(
         heap, node.priority, PriorityQueue.leftChild(root));
      assertSubheap(
         heap, node.priority, PriorityQueue.rightChild(root));
   }
}

And here’s the implementation:

import java.util.*;

public class PriorityQueue {
   private List<Node> heap = new ArrayList<Node>();

   static class Node {
      final Object object;
      final int priority;

      Node(Object object, int priority) {
         this.object = object;
         this.priority = priority;
      }

      public String toString() {
         return String.valueOf(priority);
      }
   }

   public boolean isEmpty() {
      return heap.isEmpty();
   }

   public void add(Object item, int priority) {
      addNewNode(item, priority);
      siftUpFromLast(priority);
   }

   private void addNewNode(Object item, int priority) {
      heap.add(new Node(item, priority));
   }

   private void siftUpFromLast(int priority) {
      int current = heap.size() - 1;
      while (priority > parentPriority(current)) {
         swap(current, parent(current));
         current = parent(current);
      }
   }

   private int parentPriority(int index) {
      return heap.get(parent(index)).priority;
   }

   private int parent(int index) {
      return (index - 1) / 2;
   }

   private void swap(int x, int y) {
      Node temp = heap.get(x);
      heap.set(x, heap.get(y));
      heap.set(y, temp);
   }

   public Object poll() {
      if (heap.size() == 1)
         return heap.remove(0).object;

      Object result = heap.get(0).object;
      reheapify();
      return result;
   }

   private void reheapify() {
      moveLastToFirstSlot();
      int i = 0;
      while (priorityLessThanEitherChild(i)) {
         int child = highestPriorityChild(i);
         swap(i, child);
         i = child;
      }
   }

   private void moveLastToFirstSlot() {
      Node rightmost = heap.remove(heap.size() - 1);
      heap.set(0, rightmost);
   }

   private boolean priorityLessThanEitherChild(int index) {
      int priority = heap.get(index).priority;
      return priority < leftPriority(index)
          || priority < rightPriority(index);
   }

   private int highestPriorityChild(int index) {
      return leftPriority(index) > rightPriority(index)
         ? PriorityQueue.leftChild(index)
         : PriorityQueue.rightChild(index);
   }

   int rightPriority(int index) {
      return priority(rightChild(index));
   }

   int leftPriority(int index) {
      return priority(leftChild(index));
   }

   private int priority(int index) {
      if (index > heap.size() - 1)
         return Integer.MIN_VALUE;
      return heap.get(index).priority;
   }

   List<Node> heap() {
      return heap;
   }

   static int rightChild(int index) {
      return index * 2 + 2;
   }

   static int leftChild(int index) {
      return index * 2 + 1;
   }
}

What else is there to complain about?

  • My test names could use some work. It’s hard to derive good test names when you’re driving incremental implementation of an algorithm, instead of an interface.
  • I probably have some tests I can get rid of.
  • The tests could be composed better. Worse, the test knows about the Node class (which the client need not know about). Maybe binary heap verification should be built into the heap implementation itself.
  • Should there be a PriorityQueue class and a separate Heap class? I thought about this a while and didn’t see the value.
  • I don’t use the Comparable interface to “simplify” and externalize comparisons (i.e. allow the client to specify how priorities compare).
  • I don’t have any tests that show what happens when priorities are equal.
  • I don’t use safety braces (i.e. braces around single-line if statements)–that’s why I have tests. I don’t make everything private that could be private (package access is fine by me, particularly in tests since I don’t have to type the access modifier).

What do I like?

  • I was able to test-drive the implementation, sticking to the five-minute rule for getting green bars. Things were a bit close a couple times (was there a more incremental way?).
  • I think the algorithm as implemented is very easy to follow (read each of the add and poll methods), particularly when compared to what is typically built (here’s one implementation; see, for example, the percolateDown method).

What do you think? Is test-driving a priority queue a good kata? I lay down the challenge flag for a better progression and implementation (shouldn’t be hard).

Don’t Stop Reading…

When writing, be careful of presenting an opposing point first with the notion that you’ll make your counterpoint in a subsequent paragraph or page. That first pitch is often as far as some readers get. Royce’s article “Managing the Development of Large Software Systems” led to decades of waterfall for this reason.

Recently, a pair-developer expressed concern over the use of multiple returns in a short method. I said I knew about the long-standing rule but that it created little value for very short methods. Two days later, I heard the same concern during another pairing session. Twice in one week? Curious!

I asked where they’d read about the rule. “Uncle Bob’s Clean Code” was the surprising response. “Hand me your copy!” I insisted, a bit worried, and quickly located the text in the functions chapter. The very last paragraph on page 48, starting a new section, states: “Dijkstra rules of structured programming… [say] …every function … should have one entry and one exit. Following these rules means that there should only be one return statement in a function…” Oh! Yeah, I learned that 30 years ago. Really, Bob?

Page 49, top paragraph: “”While we are sympathetic to the goals and disciplines of structured programming, these rules serve little benefit when functions are very small. It is only in larger functions that such rules provide significant benefits.”

Oh, ok. Whew! Crisis of faith averted.

Readers: Inattentive reading can be embarrassing!
Writers: Don’t assume readers will get to your point. Make it first.

Cute Little Abstractions

I’m refactoring several classes of over 2000 lines each. These are classic god-classes that mediate a handful of stubby little data-centric classes around them. I reduced the first, primary domain class by over 1100 lines, to under 1500, by simply pushing responsibilities around. Adding tests after the fact (using WELC techniques, of course) gave me the confidence to refactor. In just a few hours, I’ve been able to completely eliminate about 700 lines so far out of the 1100 that were shifted elsewhere. I’m loving it!

After cleaning up some really ugly code, I ended up with better (not great) code:

private boolean containsThreeOptionsSameType() {
   Map<String, Integer> types = new HashMap<String, Integer>();
   for (Option option : options)
      increment(types, option.getType());
   return count(types, Option.ALPHA) >= 3 || 
          count(types, Option.BETA) >= 3 || 
          count(types, Option.GAMMA) >= 3 || 
          count(types, Option.DELTA) >= 3;
}

private int count(Map<String, Integer> map, String key) {
   if (!map.containsKey(key))
      return 0;
   return map.get(key);
}

private void increment(Map<String, Integer> map, String key) {
   if (!map.containsKey(key))
      map.put(key, 1);
   else
      map.put(key, map.get(key) + 1);
}

I could clean up the complex conditional on the return statement (perhaps calling a method to loop through all types). Also, the count and increment methods contain a bit of duplication.

More importantly, the count and increment methods represent a missed abstraction. This is very common: We tend to leave little, impertinent, SRP-snubbing, OCP-violating methods lying about, cluttering up our classes.

(Aside: I’m often challenged during reviews about exposing things so I can test them. Long-term, entrenched developers are usually very good about blindly following the simple rule about making things as private as possible. But they’re perfectly willing to throw OO concept #1, cohesion, completely out the window.)

I decided to test-drive the soon-to-be-no-longer-missing abstraction, CountingSet. I ended up with an improved implementation of increment by eliminating the redundancy I spotted above:

import java.util.*;

public class CountingSet<T> {
   private Map<T, Integer> map = new HashMap<T, Integer>();

   public int count(T key) {
      if (!map.containsKey(key))
         return 0;
      return map.get(key);
   }

   public void add(T key) {
      map.put(key, count(key) + 1);
   }
}

Some developers would be appalled: “You built a cruddy little class with just four lines of real code? And for only one client?” And further: “It’s only a subset of a counting set, where are all the other methods?”

Sure thing, buddy. Here’s my client:

private boolean containsThreeOptionsSameType() {
   CountingSet<String> types = new CountingSet<String>();
   for (Option option : options)
      types.add(option.getType());
   return types.count(Option.ALPHA) >= 3 || 
          types.count(Option.BETA) >= 3 || 
          types.count(Option.GAMMA) >= 3 || 
          types.count(Option.DELTA) >= 3;
}

My client now only contains intent. The small ugliness with the parameterized map and boxing is now “information hidden.” I also have a simple reusable construct–I know I’ve had a similar need several times before. As for building and/or using a fully-functional CountingSet, there’s something to be said for creating “adapter” classes with the smallest possible, and most meaningful, interface.

Good enough for now. Don’t hesitate to create your own cute little abstractions!

Convention and Case

I’m learning Grails. I found a good tutorial written by Jason Rudolph. It does a great job of covering a broad range of the typical things you’d want to do in putting up a quick web site.

Unfortunately, it’s for an older version of Grails, but so far that hasn’t been a barrier; most of the changes I need to make are self-explanatory. I’m almost finished, and feel I have a good grasp on using Grails.

Still, I struggled for about two hours last night on a problem in going through the tutorial. Here’s a form I defined.

<g:form controller="user" method="post">
    ...
    <g:actionSubmit value="Log In"/>
    ...
</g:form>

And here’s some of the controller:

class UserController extends BaseController {
  def beforeInterceptor =
 [action:this.&auth, except: ['login', 'logout']]

  def index = { redirect(action:list,params:params) }

  def allowedMethods = [delete:'POST', save:'POST', update:'POST']

  def login = {
   // ... 
  }
  // ...
}

I think that’s all that’s relevant to show. Those of you who know Grails know how the form knows which controller method to call; I wasn’t paying enough attention and missed the simple cause of my problem, which was that a simple form submit from the login page kept generating a 404, URL not found. It showed me the request URI:

RequestURI=/racetrack/user/index

Yet that is a valid URI (index redirects, of course). In fact, if I copied the URL grails generated, and pasted it into the address bar, the proper page came up. I tried a few things, including explicitly specifying the action:

<g:form controller="user" method="post" action="login">

No dice, same problem:

HTTP ERROR: 404

NOT_FOUND

RequestURI=/racetrack/user/login

Frustrated, I tried a few other things. Still no dice. I looked at the HTML provided in the book and looked to ensure I typed it correctly (I almost always type my own sample code rather than paste it, I learn better that way). Looks pretty much the same. Web search, no exact match on my same problem; found a couple odd things that it might be, tried ’em, not the problem.

When in doubt, really make sure you have exactly the same thing. Whitespace and case. Case couldn’t possibly matter on button text, could it? Well, yes, especially when you follow this notion of programming by convention or programming by default or whatever you want to call it. An actionSubmit defines the controller method, either explicitly or implicitly. Explicitly, you can simply say:

<g:actionSubmit value="Log In" action="login"/>

In the absence of the action attribute, it uses the value attribute. Apparently, it lowercases the first letter for you, but then simply removes spaces, not lower-casing any subsequent letters. Thus it was looking for a controller action named “logIn,” apparently. The tutorial code has it typed as “Log in.” I typed it “Log In” (I thought it looked nicer). Harumph.

Dumb on my part, for not paying enough attention to how the form was supposed to figure out which controller method to call (I hadn’t figured this out yet, and thought this was another “by convention” element, perhaps from somewhere else). Dumb on Groovy’s part, for not showing me the URI with the improper casing.

Lesson for me: Pay more attention. Lesson for tools that play by convention: If you’re going to do something as clever as use button text to define an action, make sure you are picky about case everywhere, and make it clear from whence that action name came (i.e. better error messages). Lesson for tutorial writers: It’s hard work to write a great tutorial. A good one gets you through all the happy path circumstances. A great one helps you out with all the dumb mistakes you’ll inevitably make.

Violating Standards in Tests

Should your test code be subject to the same standards as your production code? I believe there should be different sets of standards. I am collecting interesting idioms that are normally shunned in good production code, but are acceptable and advantageous in test code.

An obvious example is the use of macros in tools like James Grenning’s CppUTest. The testing framework (an updated version of CppUnitLite) requires programmers to use macros to identify test functions:

TEST(HelloWorld, PrintOk)
{
   printHelloWorld();
   STRCMP_EQUAL("Hello World!n", buffer); 
}

No self-respecting C++ programmer uses macros anymore to replace functions; per Scott Meyers and many others, it’s fraught with all sorts of problems. But in CppUTest, the use of macros greatly simplifies the work of a developer by eliminating their need to manually register the name of a new test with a suite.

Another example is the use of import static in Java. The general rule of thumb, suggested by Sun themselves, is to not overuse import static. Deleting the type information for statically scoped methods and fields can obscure understanding of code. It’s considered appropriate only when use of static elements is pervasive. For example, most developers faced with coding any real math do an import static on the java.lang.Math functions.

However, I use static import frequently from my tests:

import static org.junit.Assert.*;
import org.junit.*;
import static util.StringUtil.*;

public class StringUtilCommaDelimitTest {
   @Test public void degenerate() {
      assertEquals("", commaDelimit(new String[] {}));
   }

   @Test public void oneEntry() {
      assertEquals("a", commaDelimit(new String[] {"a"}));
   }
   ...
}

Developers unquestioningly use import static for JUnit 4 assertions, as they are pervasive. But the additional use here is for the method commaDelimit, which is defined as static in the target class StringUtil. More frequently, I’ll have the test refer to (statically defined) constants on the target class. For a test reader, it becomes obvious where that referenced constant would be defined.

What other coding standards are appropriate for tests only, and not for production code?

Test Abstraction

I’m staring at a single CppUnit test function spanning hundreds of source lines. The test developer inserted visual indicators to help me pick out the eight test cases it covers:

//++++++++++++++++++++++++++++++++++++++++++++++++++

Each of these cases is brief: four to eight lines of data setup, followed by a execution statement enclosed in a CPPUNIT_ASSERT. Of course they could be broken up into eight separate test functions, but otherwise they are reasonable.

Prior to the eight tests there are two hundred lines of setup code. Most of the initialization sets data to reasonable default values so that the application code won’t crash and burn while being exercised.

I don’t know enough about the test to judge it in terms of its appropriateness as a “unit” test. It seems more integration test than anything. But perhaps all I would need to do is cleverly divorce the target function from all of those data setup dependencies, and break it up into eight separate test functions.

The aggregation of tests is typical, and no doubt comes from a compulsion to not waste all those 200 lines of work! The bigger problem I have is the function’s lack of abstraction. Uncle Bob always says, “abstraction is elimination of the irrelevant and amplification of the essential.” When it comes down to understanding tests, it is usually a matter of how good a job the developer was at abstracting intent. Two hundreds of lines of detailed setup does not exhibit abstraction!

For a given unit test, I always want to know why a given assertion should hold true, based on the setup context. The lengthy object construction and initialization should be encapsulated in another method, perhaps createDefaultMarket(). Relevant pieces of data can be layered atop the Market object: applyGroupDiscountRate(0.10), applyRestrictionCode(), etc. Not only does it help explain the data differences and correlate the setup with the result, it makes it easier to read the test, and easier to write new tests (reuse!).

I often get blank stares when I ask developers to make their tests more readable. Would they respond better to requests to improve their use of abstraction?

Code to Discuss

A good software team knows to talk openly about the code. One suggestion is to get together once a day, maybe at day end, to just talk about or look at some code, even if you are already pairing. The get-together need not take long nor be formal in any manner. Fifteen to thirty minutes should be sufficient. Toss some code up on the wall using an LCD projector and have at it.

Initially, such discussions can be touchy, as critique can be mistakenly viewed as insults or attacks. Over time, the regular exercise of discussing code should form a group that better understands how to communicate effectively with each other.

What to talk about? I brainstormed some ideas for a specific set of teams, things I knew they needed to discuss, and provide that starter list here. Given the almost limitless number of reasons to talk about the code, there is little excuse to forego the forum.

  • “here’s something cool that I did today!”
  • “how could I better express this test?”
  • “here is a mocking technique I found valuable”
  • “this test didn’t make sense to me”
  • “this area of the code concerns me”
  • test granularity
  • test naming
  • BDD
  • how TDD changed my design
  • naming standards
  • coding standards
  • warning standards
  • JUnit 4.4+: Hamcrest matchers, theories, etc.
  • “we’re not writing enough tests”
  • code coverage (taking a look at the actual lines exercised by tests)
  • WTFs in the code or test. Can be amusing, but watch the toes!
  • test execution speed. Splitting into slow/fast suites.
  • identifying and speeding up slow tests
  • test smells
  • various forms of test double (see XUnit Patterns)
  • simple design vs SOLID
  • “we’re not refactoring our production code enough”
  • “we’re not refactoring our tests enough”
  • object mothers for tests
  • “the build is a pain in the rear”
  • looking at a new mock tool like Mockito

And so on. Perhaps get a volunteer (or two) to “open” a topic each week. They talk for a few minutes, and then you open things up for group discussion.

Scaring You Straight: Legacy Madness

legacyMadness

I saw this classic comic book cover and couldn’t resist. There could be a whole True Coding Crime series.

Comments and Footnotes

Will I ever stop railing on about comments? I don’t think so. As long as it’s possible to write code in a less than clear manner, people will always be compelled to explain their coding deficiencies. And I will always be compelled to resist.

As a writer, I try to ensure that the main text in a book or article is clear to the reader. Every rare once in a while (perhaps a couple dozen times in a book), I feel compelled to write extra information, something that supplements the text. I do so in the form of a footnote. The footnote provides non-essential information. It’s something that might help add some insights to what I just wrote, perhaps, or a humorous but otherwise distracting note.

My editors would never let me use footnotes to re-explain a passage of text. A coder attempting to salvage bad code with comments is like me adding footnotes to try and explain poorly written text. If the mainline text didn’t explain it well, why should a footnote explain it any better?

Footnotes and comments are ultimately a distraction. Ever try to read a book with footnotes on almost every page (and I’m not talking citations or references)? You may as well be reading two books, in some cases. As writers, we owe it to our readers write more clearly and to minimize these distractions.

 

Comments:

Excellent analogy Jeff.

Today I was asked what I think about assertThat for jUnit4 framework. It brings nothing more than a little bit more verbosity for tests, but won’t help people write better tests. Those , whom tests are of a poor readability and quality will remain the same with a new constructions, too.

Cryptics

In posting messages about the new for-each loop syntax in Java 5.0, many developers have complained about its crypticism. A for-each loop might look like:

for (Employee employee: department)
 terminate(employee);

You would read this for-each loop as, “for each employee (of type Employee) in department).” Many developers would have preferred something like:

for (Employee employee in department)
 terminate(employee);

But alas, Sun resists new Java keywords like “in” because they have the potential to break gobs of existing code.

It’s not that big a deal to me, since I view the loop construct as idiomatic. Once I saw it and it was explained to me, or perhaps twice, it became ingrained. Subsequent encounters required no such explanation or even translation time.

Nonetheless, the new for-each loop is a slightly cryptic construct. Not quite as bad as the ternary operator (conditional ? true-value : false-value), but still not obvious the first time. Some developers can’t stand it.

My take is that Java is a fairly cryptic language to start with, particularly if you have no background in C-based languages (C, C++, Objective C, C#, and whatever else). The old C-style for loop isn’t obvious either. Typical C code is rife with odd characters such as *, %, &, {, }, and |.

This obvious observation led me to create Jeff’s hypothesis of cryptics:

Jeff’s Definition of Cryptics: The crypticism of a language is directly proportional to the mean number of shift keystrokes required.

It’s a pretty useless definition, granted. But taking a look at a couple languages that appear on opposite ends of the spectrum shows that there is some validity. Curly braces in C-based languages { use the shift key, as does * (pointer dereference). The new generics stuff in 1.5 uses angle brackets < and >, both requiring use of shift. The tertiary operator uses both : and ?. Strings are set off by use of the ” character. And C uses lots of parentheses.

Smalltalk, in contrast, uses brackets ([ and ]) to set off blocks. No shift-combination necessary! Boolean operators are spelled out instead of using & and |. Strings are set off by use of the ‘ character. The major exception to the rule is the use of the colon to designate a selector (but this also suggests that perhaps you should pass fewer parameters to your methods).

Of course, crypticism is related to how easy it is to read text, not write it. But it’s no coincidence that the cryptics require shift-combinations on a standard keyboard or typewriter. These are the characters that make things difficult to read when used to excess.

I was curious enough to pose the idea to Dan Ingalls, who had a lot to do with the design of early Smalltalk implementations. He indicated that indeed there was a conscious effort to reduce the number of shift keystrokes required. This was due to the fact that one goal for Smalltalk was that children would be able to program in it.

Lots of shift-key combinations = lots of crypticism. I’m still amazed at the fact that modern languages retain so much of the unnecessary constructs and ugliness of a language that is 30 years old.

Atom