Clean Tests: Common Concerns, Part 1

The unit tests you produce in TDD represent an investment. You’ll want to ensure they continue to return on value and not turn into a liability.


[Image: “Magnetospheric Multiscale,” courtesy NASA Goddard Space Flight Center.]

As a first step, find the right frame of mind. Start by thinking about your unit tests as part of your production system, not as an adjunct effort or afterthought. We consider the diagnostic devices built into complex machines (e.g. automobiles) to be part of the product. Software is a complex machine. Your tests provide diagnostics on the health of your software product.

Here are some common concerns about TDD:

  • The tests will have their own defects.
  • I’ll have twice as much code to maintain.
  • The tests are handcuffs.
  • Testing everything (via TDD) results in diminishing returns.

I’ll tackle each one of these in a separate post, offering some thoughts about how to reduce your anxiety level. (No doubt I’ll add other concerns along the way.) Then I’ll move on to how to craft tests that are more sustainable and anxiety-free.

The tests will have their own defects.

It is absolutely possible for a unit test to be defective. Sometimes it’s not verifying what you think it is, sometimes you’re not executing the code you think you are, sometimes there’s not even an assert, and sometimes things aren’t in the expected state to start with.

Strictly adhering to the TDD cycle (red-green-refactor) almost always eliminates this as a real-world problem. You must always see the completed test fail at least once, and for the right reason (pay attention to the failure messages!), before making it pass. And writing just enough code to make it pass should help reinforce that it was *that code* that passed the test.
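As a hypothetical illustration (DiscountCalculator is invented for this sketch), here’s a defective test that can never fail, alongside an honest revision that the red-first discipline would have forced:

import org.junit.Test;
import static org.junit.Assert.*;

public class DiscountCalculatorTest {
   // Defective: exercises the code but asserts nothing, so it passes
   // even if discountedPrice() returns garbage.
   @Test
   public void appliesTenPercentDiscount() {
      new DiscountCalculator().discountedPrice(100.00);
   }

   // Honest: watch it fail first, for the right reason, then make it pass.
   @Test
   public void appliesTenPercentDiscountAndProvesIt() {
      assertEquals(90.00, new DiscountCalculator().discountedPrice(100.00), 0.001);
   }
}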

Other ways to ensure your tests start and stay honest:

  • Pair or mob (or review the tests carefully with some other form of review).
  • Minimize use of complex test doubles, such as fakes.
  • Avoid partial mocks at all costs.
  • Don’t mock past immediate collaborators.
  • Employ contract tests.
  • Fix the design issues that demand complex test setup.
  • Stick to single-behavior tests.
  • Design tests for clarity. Emphasize test abstraction and employ AAA (see the sketch after this list).
  • Always read the tests carefully as your point of entry to understanding.
  • Improve each test, as warranted, each time you touch it.
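Here’s a minimal sketch of a test with AAA (arrange-act-assert) structure, using a hypothetical Portfolio class; the blank lines demarcate the three parts:

@Test
public void reducesShareCountOnSell() {
   Portfolio portfolio = new Portfolio();            // arrange
   portfolio.purchase("IBM", 100);

   portfolio.sell("IBM", 40);                        // act

   assertEquals(60, portfolio.shareCount("IBM"));    // assert
}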

Next: The tests are handcuffs.

The Compulsive Coder, Episode 6: this Duplication is Killing Me

A few months ago, I got caught up in a series of contentious debates about duplication in test code. I’m the sort who believes that most of our design heuristics–simple design in this case–fall victim to the reality of the pirate’s code: “The code is more what you’d call ‘guidelines’ than actual rules.” Which is OK by me; rules are made to be broken.

[Image courtesy drivethrucafe; license.]

Unit tests are different beasts than production code. They serve a different purpose. For the most part, they can and should follow the same standards as production code. But since one of their primary purposes is to provide summary documentation, there are good reasons to adopt differing standards. (I solicited some important distinctions on Twitter a few months ago; this will be fodder for an upcoming post.)

With respect to Beck’s four rules of simple design, many have contended that expressiveness slightly outweighs elimination of duplication, irrespective of whether the code is production code or tests. With production code, I’m on the fence.

But for tests? The documentation aspect of tests is more relevant than their adherence to OO design concepts. Yes, duplication across tests can cause some future headaches, but the converse–tests that read poorly–is a more typical and bigger time-waster. Yes, get rid of duplication, absolutely, but not if it causes the readability of the tests to suffer.

My dissenting converser was adamant that eliminating duplication was king, citing Beck himself in Test-Driven Development: By Example (2002), on page one, no doubt: Step four in the rhythm of TDD says you run all the tests, and step five, “Refactor to remove duplication.” In that summary of steps, Beck suggests no other reasons to refactor! I can understand how a literalist-slash-originalist might be content with putting their foot down on this definition, never mind that “refactoring” has a broader definition anyway, never mind Beck’s simple design to the contrary, and never mind the fourteen years since, in which we’ve discussed, reflected on, and refined our understanding of TDD.

Why a contentious debate? The dissenter was promoting a framework; they had created some hooks designed to simplify the creation of tests for that framework. A laudable goal, except for the fact that I had no clue what the tests were actually proving. That just wads my undies.

Where am I going with all this? Oh yeah. So as I watched the dissenter scroll through some of their tests & prod code onscreen, I noted this repeated throughout:

public void doSomeStuff() {
   this.someField = someMethod();
   int x = this.someField + 42;
   // etc.
}

Apparently their standard is to always identify field usage by scoping it with this. (I bit my lip, because at that point I was weary of the protracted debate, and figured I could write about it later. Here I am.)

“Hey, over-using this might be a good idea, because we can more easily identify where fields are used, which is important information!”

OK, I’m offended that duplication-monists don’t view the use of this in this case (i.e. not to disambiguate) as unnecessary duplication. It is. Squash it. Instead, use the power of your sophisticated IDE, whether IntelliJ IDEA or Eclipse, and colorize fields appropriately. (I like orange.) They’ll stand out like sore thumbs.
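For contrast, the same method with the redundant qualifiers removed; with field colorization on, someField still announces itself:

public void doSomeStuff() {
   someField = someMethod();   // the field's color alone flags it as a field
   int x = someField + 42;
   // etc.
}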


Previous Compulsive Coder blog entry: Extract Method Flow
Next Compulsive Coder blog entry: Please AAA

NetBeans

I was very excited to start using Infinitest once again. But as things seem to go with me, I ended up moving on to something else that dominated my mindshare–in this case, a new project where I am using NetBeans. Why NetBeans? Because someone set up the project to use Maven, and all I’ve heard is that NetBeans is magical when it comes to integrating with Maven (and Eclipse, not so much).

As I would expect, both Maven and Maven support in NetBeans are really nice as long as they’re working well. But Maven seems to adhere to the 80/20 rule. It’s great 80% of the time, up until you have to do something a little out of the ordinary, or when things aren’t working like they’re supposed to. Then the numbers reverse–you spend 80% of your time trying to work on the remaining 20%. In the case of Maven, I’m struggling with getting coverage reports against multiple modules, and it’s also presenting me with a defect (operator error?) where NetBeans reports half the Maven modules as “badly formed Maven projects.” Google searches, and assistance from others with more Maven experience than me, have not been very fruitful.

This is the third time I’ve used Maven, and I’m warming up to it more each time, but it’s still been a frustration. Part of the frustration is always a lack of knowledge, particularly around the behind-the-scenes workings you need to understand when you hit the 20% esoterica. Thanks to those who have helped!

On the other hand, this is also the third time I’ve used NetBeans heavily. I’m not really warming to it much. Granted, part of my disappointment with NetBeans is the same problem. The best way to learn not to hate something is to pair with someone who knows it well, and I’ve not been able to get much pairing time.

Other than the Maven support, NetBeans seems much weaker than Eclipse. It’s slower (e.g. Ctrl-O to search for types takes many seconds to return class names, compared to Eclipse’s Ctrl-Shift-T, which is almost immediate). It’s flakier (e.g. half the time, Ctrl-F6 to run a test pops up its JUnit GUI; the other half I see only the console output; also, the formatting mechanism appears to be broken). It seems to be the last of the three main Java IDEs for which people write plugins (e.g. Infinitest is only available for Eclipse and IDEA, not NetBeans). And it’s just missing important things (e.g. there is no inline method refactoring until NetBeans 7.0).

I can live without every last bell and whistle. But it’s always disappointing to see other people whistle in cool little ways, like Kerievsky’s nice trick with inline method, and not get to play along.

Comments:

There is no way to read the Kerievsky trick without becoming a member of the Yahoo group. It would be helpful to cut and paste the content.
# posted by Anonymous Anonymous : 5/08/2010 12:59:00 AM
Aug 5, 2009, from Joshua K:

One of the refactoring strategies I defined and use quite a bit is called Parallel Change. You put new behavior next to old behavior (wherever it is) and then do a Gradual Cutover from old to new. When you’re done cutting over to the new code, you need to get rid of the old code.

Say you’ve got some old method that was called in 10 places in a class. The new code, which had been placed under each of the 10 calls to the old code, is now doing the real work. You don’t need the old method and the calls to it.

In the past, I would delete the method, then follow the chain of red marks to delete each call to the old method. That was before a smart student pointed out a better way:

* Delete the body of the old method
* Inline the old method – effectively inlining “nothing”

That approach is much faster and far more satisfying.

Is this something folks already do? It had not occurred to me to use inline in that way to delete code.

This does go along with an observation I’ve made that the better you get at refactoring, the more you use the Inline Method refactoring.
posted by Blogger Jeff L. : 5/10/2010 07:44:00 AM
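For illustration, a hedged sketch of the trick Joshua describes (the method name and its event-publishing body are invented here):

// Before: all ten callers of notifyLegacyListeners() have been cut over
// to the new code, so the method and its call sites need to go.
void notifyLegacyListeners(Event event) {
   legacyBus.publish(event);   // step 1: delete this body, leaving it empty
}
// Step 2: apply the IDE's Inline Method refactoring to the now-empty
// method; it deletes the method and erases all ten call sites in one
// safe, automated operation.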

Too Much Incremental Refactoring

Having reasonably comprehensive tests is great. I currently have a good-sized codebase for an application I’m working on. In one key subsystem, representing about 1/6 of the whole application so far, there are about 25 classes and 86 code-level tests. Most of the classes are small, one to three methods. Coverage is in the very high 90s. Sixteen of the 86 tests are necessarily “slow” tests (centered around persistence), so I run them in a separate suite. Combined, they take 3/4 of a second; the “fast” tests run almost instantly.

I’ve been very happy with the design and implementation of this subsystem so far, with the exception of one central class for which the tests are a bit ugly. Up until now, I’ve been able to introduce significant new features in very short order, with minimal shotgun surgery and no fear of breaking anything (the sixteen integration tests are a key part of eliminating this fear).

But I just introduced one would-be small change that introduced significant pain. There were two causes of the pain: one, I chose to use an array to represent one significant collection, and two, I chose to expose that implementation through parameter passing. Why did I choose an array? Well, it represented a fixed collection of data for a class, which means it was easiest to code as an array (think initialization). It also made for effective tests, particularly given the expressiveness of varargs.

My new change pushed me to prefer a parameterized collection instead of an array. So I bit the bullet and figured I’d just incrementally refactor my way away from the array. Worked great! In an incremental fashion–with no one step taking more than 5 minutes before getting a green bar (and most taking 30 seconds)–I got to my desired solution steady and sure. I made a couple simple mistakes along the way, and relished the way the red bar immediately informed me of my stupidity. The problem, though, was that there were far too many of these wonderfully short steps, adding up to about an hour of work. “Ok, made the change here…ohhh, there, it ripples to a half dozen other methods!”

The lesson I was reminded of was that I would have been much better off encapsulating the collection early on. Instead of passing around Widget[] or List<Widget>, this change would have been considerably simpler had I created a type named WidgetList. I would have had to expose some overloaded “bridge” methods while I fixed the tests that used varargs, but in 20/20 hindsight, the change would have taken me maybe 10 minutes.
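For illustration, a minimal sketch of that first-class collection; WidgetList and its methods are invented here, with Widget standing in for the real domain class:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class WidgetList implements Iterable<Widget> {
   private final List<Widget> widgets;

   // varargs keeps test setup as expressive as the raw array did
   public WidgetList(Widget... widgets) {
      this.widgets = new ArrayList<Widget>(Arrays.asList(widgets));
   }

   public void add(Widget widget) {
      widgets.add(widget);
   }

   public int size() {
      return widgets.size();
   }

   public Iterator<Widget> iterator() {
      return widgets.iterator();
   }
}

Because callers never learn whether an array or a List sits underneath, swapping that choice later touches only this one class.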

Of course I knew better, even in the heady moments of thinking the arrays made things a bit simpler. There’s plenty of literature that covers this, and I’ve read it. I’ve chastised others for the same thing plenty of times, even recently. But for whatever reason, I didn’t bother. Laziness, not paying enough attention, not caring enough. Of course it’s a violation of encapsulation, but technically I can also look at my problem as having allowed the duplication of my choice for the collection’s implementation to run rampant throughout the system.

A pair would probably have squashed the potential headache early on. Maybe. Take a look at your code–where are you passing around exposed collections?

Request for a Blog Pair

I encountered a bit of code recently that screams out for refactoring. Problem is, I’m not sure of the most effective way to do this.

The context: this is a tic-tac-toe game that Paul Nelson and I paired on, in order to explore mobile device programming (Paul has an iPhone, I have a BlackBerry). Paul’s not here right now to pair, unfortunately, hence the blog request (a really slow way of virtual pairing).

Here’s the relevant code:

private boolean isRowWinner(int row) {
   for (int column = 0; column < SIZE; column++)
      if (grid[row][column] != currentPlayer)
         return false;
   return true;
}

private boolean isColumnWinner(int column) {
   for (int row = 0; row < SIZE; row++)
      if (grid[row][column] != currentPlayer)
         return false;
   return true;
}

private boolean isDescendingDiagonalWinner() {
   for (int i = 0; i < SIZE; i++)
      if (grid[i][i] != currentPlayer)
         return false;
   return true;
}

private boolean isAscendingDiagonalWinner() {
   for (int i = 0; i < SIZE; i++)
      if (grid[i][SIZE - 1 - i] != currentPlayer)
         return false;
   return true;
}

The field grid is simply a 3×3 matrix of Player references; Player is an enum; currentPlayer is a Player reference.

What’s the most effective way to simplify these four methods, seemingly rampant with duplication? I’m not looking for a “clever” solution; I’m looking for one that still retains high levels of expressiveness.

Original Comments:

No test??? :-p

Here’s my refactoring. It removes duplications, but I’m not sure if it is more expressive.

private boolean isMarkedByCurrentPlayer(int row, int column) {
   return grid[row][column] == currentPlayer;
}

private boolean isWinner(int a, int b, int bb) {
   for (int i = 0; i < SIZE; i++) {
      if (!isMarkedByCurrentPlayer(i * a, i * b + bb))
         return false;
   }
   return true;
}

private boolean isWinner(int a, int b) {
   return isWinner(a, b);
}

private boolean isRowWinner(int row) {
   return isWinner(0, 1);
}

private boolean isColumnWinner(int column) {
   return isWinner(1, 0);
}

private boolean isDescendingDiagonalWinner() {
   return isWinner(1, 1);
}

private boolean isAscendingDiagonalWinner() {
   return isWinner(1, -1, SIZE - 1);
}

One mistake. Should be:

private boolean isWinner(int a, int b) {
   return isWinner(a, b, 0);
}

Ah yes, tests… Of course there are tests. 🙂 But now you’ll probably want *all* of the code. I’ll post a zip file tonight, but here are the tests in raw form (this comment mechanism doesn’t allow pre tags):

package domain;

import org.junit.*;
import static org.junit.Assert.*;
import static domain.Player.*;

public class GameTest {
   private Game game;

   @Before
   public void initialize() {
      game = new Game();
   }

   @Test
   public void create() {
      assertGrid0("   "); // using this form,
      assertGrid1("   "); // source can be formatted,
      assertGrid2("   "); // but still retain visual usefulness
      assertFalse(game.isOver());
   }

   @Test
   public void firstMoveAlwaysX() {
      game.makeAMove(1, 1);

      assertGrid0("   ");
      assertGrid1(" X ");
      assertGrid2("   ");
   }

   @Test
   public void secondMoveMadeByY() {
      game.makeAMove(1, 1);
      game.makeAMove(1, 0);

      assertGrid0("   ");
      assertGrid1("OX ");
      assertGrid2("   ");
   }

   @Test
   public void movesAlternateXandY() {
      game.makeAMove(1, 1);
      game.makeAMove(1, 0);
      game.makeAMove(2, 0);

      assertGrid0("   ");
      assertGrid1("OX ");
      assertGrid2("X  ");
   }

   @Test
   public void movesToTakenSquaresDisallowed() {
      try {
         game.makeAMove(1, 1);
         game.makeAMove(1, 1);
      } catch (IllegalMoveException expected) {
         assertTrue(expected.getMessage().contains("[1, 1]"));
         assertGrid0("   ");
         assertGrid1(" X ");
         assertGrid2("   ");
      }
   }

   @Test
   public void xWinsByFillingTopRow() {
      assertNonWinningMove(X, 0, 0);
      assertNonWinningMove(O, 1, 0);
      assertNonWinningMove(X, 0, 1);
      assertNonWinningMove(O, 1, 1);
      assertWinningMove(X, 0, 2);

      assertGrid0("XXX");
      assertGrid1("OO ");
      assertGrid2("   ");
   }

   @Test
   public void oWinsByFillingTopRow() {
      assertNonWinningMove(X, 1, 0);
      assertNonWinningMove(O, 0, 0);
      assertNonWinningMove(X, 1, 1);
      assertNonWinningMove(O, 0, 1);
      assertNonWinningMove(X, 2, 2);
      assertWinningMove(O, 0, 2);

      assertGrid0("OOO");
      assertGrid1("XX ");
      assertGrid2("  X");
   }

   @Test
   public void oWinsByFillingAnyRow() {
      assertNonWinningMove(X, 1, 0);
      assertNonWinningMove(O, 2, 0);
      assertNonWinningMove(X, 1, 1);
      assertNonWinningMove(O, 2, 1);
      assertNonWinningMove(X, 0, 2);
      assertWinningMove(O, 2, 2);

      assertGrid0("  X");
      assertGrid1("XX ");
      assertGrid2("OOO");
   }

   @Test
   public void completedMixedRowNotAWin() {
      assertNonWinningMove(X, 0, 0);
      assertNonWinningMove(O, 0, 1);
      assertNonWinningMove(X, 0, 2);
   }

   @Test
   public void xWinsByFillingFirstColumn() {
      assertNonWinningMove(X, 0, 0);
      assertNonWinningMove(O, 2, 1);
      assertNonWinningMove(X, 1, 0);
      assertNonWinningMove(O, 2, 2);
      assertWinningMove(X, 2, 0);

      assertGrid0("X  ");
      assertGrid1("X  ");
      assertGrid2("XOO");
   }

   @Test
   public void oWinsByFillingAnyColumn() {
      assertNonWinningMove(X, 0, 0);
      assertNonWinningMove(O, 2, 1);
      assertNonWinningMove(X, 1, 0);
      assertNonWinningMove(O, 1, 1);
      assertNonWinningMove(X, 1, 2);
      assertWinningMove(O, 0, 1);

      assertGrid0("XO ");
      assertGrid1("XOX");
      assertGrid2(" O ");
   }

   @Test
   public void xWinsByFillingDescendingDiagonal() {
      assertNonWinningMove(X, 0, 0);
      assertNonWinningMove(O, 0, 1);
      assertNonWinningMove(X, 1, 1);
      assertNonWinningMove(O, 0, 2);

      assertGrid0("XOO");
      assertGrid1(" X ");
      assertGrid2("   ");

      assertWinningMove(X, 2, 2);
   }

   @Test
   public void yWinsByFillingAscendingDiagonal() {
      assertNonWinningMove(X, 0, 0);
      assertNonWinningMove(O, 0, 2);
      assertNonWinningMove(X, 0, 1);
      assertNonWinningMove(O, 1, 1);
      assertNonWinningMove(X, 1, 0);

      assertGrid0("XXO");
      assertGrid1("XO ");
      assertGrid2("   ");

      assertWinningMove(O, 2, 0);
   }

   @Test
   public void draw() {
      assertNonWinningMove(X, 0, 0);
      assertNonWinningMove(O, 0, 2);
      assertNonWinningMove(X, 1, 2);
      assertNonWinningMove(O, 1, 1);
      assertNonWinningMove(X, 0, 1);
      assertNonWinningMove(O, 2, 1);
      assertNonWinningMove(X, 2, 0);
      assertNonWinningMove(O, 1, 0);

      assertGrid0("XXO");
      assertGrid1("OOX");
      assertGrid2("XO ");

      game.makeAMove(2, 2);
      assertTrue(game.isOver());
      assertEquals(Player.NO_ONE, game.winner());
   }

   private void assertNonWinningMove(Player player, int row, int col) {
      assertEquals(player, game.currentPlayer());
      game.makeAMove(row, col);
      assertFalse(moveString(row, col) + " unexpectedly completed game", game.isOver());
   }

   private void assertWinningMove(Player winner, int row, int col) {
      assertEquals(winner, game.currentPlayer());
      game.makeAMove(row, col);
      assertTrue(moveString(row, col) + " unexpectedly did not win game", game.isOver());
      assertEquals(winner, game.winner());
   }

   private void assertGrid0(String row) {
      assertRow(row, 0);
   }

   private void assertGrid1(String row) {
      assertRow(row, 1);
   }

   private void assertGrid2(String row) {
      assertRow(row, 2);
   }

   private void assertRow(String row, int rowIndex) {
      for (int colIndex = 0; colIndex < row.length(); colIndex++) {
         Player player = Player.toEnum(row.charAt(colIndex));
         assertEquals(moveString(rowIndex, colIndex), player, game.square(rowIndex, colIndex));
      }
   }

   private String moveString(int rowIndex, int colIndex) {
      return "move at [" + rowIndex + "," + colIndex + "]";
   }
}

Bartosz–I have two failing tests after applying your code. Email me and I’ll send the code back. I like where it’s headed but it needs better names for the variables, and some of the magic numbers need explanation.

So, here’s further proof that refactoring without tests is a bad idea (or refactoring with tests written after the fact, because that is what I did).

private static final int[] COLUMN = new int[] { 1, 0, 0 };
private static final int[] ROW = new int[] { 0, 1, 0 };
private static final int[] ASC_DIAGONAL = new int[] { 1, -1, SIZE - 1 };
private static final int[] DESC_DIAGONAL = new int[] { 1, 1, 0 };

private boolean isMarkedByCurrentPlayer(int row, int column) {
   return grid[row][column] == currentPlayer;
}

private boolean isWinner(int row, int column, int[] corr) {
   for (int i = 0; i < SIZE; i++) {
      if (!isMarkedByCurrentPlayer(row + i * corr[0], column + i * corr[1] + corr[2]))
         return false;
   }
   return true;
}

public boolean isRowWinner(int row) {
   return isWinner(row, 0, ROW);
}

public boolean isColumnWinner(int column) {
   return isWinner(0, column, COLUMN);
}

public boolean isDescendingDiagonalWinner() {
   return isWinner(0, 0, DESC_DIAGONAL);
}

public boolean isAscendingDiagonalWinner() {
   return isWinner(0, 0, ASC_DIAGONAL);
}

Still, some magic numbers could be removed.

Jeff, I couldn’t figure out how to leave a trackback, but I posted my solution at http://blog.gdinwiddie.com/2008/07/29/tictactoe/ rather than put into a comment here. I went for a table-driven design.

Ah, it appears that the trackback is automatic.

+1 for Adam’s solution
-1 for Adam’s variable naming
-1 for Jeff’s obscure test
===
Bunny–I think that’s Bartosz’s nice solution, not Adam’s.

As for the obscurity of the tests: what would make for less obscure tests? My pair and I thought this was about the easiest possible way to follow them–a number of one-liners to represent each move, and a visual set of assertions to show the resulting board.

George–I think Paul and I had discussed a couple of similar solutions. For tic-tac-toe, that one’s a nice idea. What if it’s a game like Connect 4?

Jeff, for larger boards like Connect 4, I don’t think I would use the table lookup, as it would be rather large. Even with automated generation of the table, there would be significant penalty from overlapping winning patterns.

I’ve not tried it, but it occurs to me that it might be worthwhile to have a table of straight-line paths, and then traverse those paths, counting the consecutive squares held by the same player. These straight-line paths could be arrays of Positions.

I think a contributing factor to the need for refactoring in the isXyzWinner() methods is that the concept of Position is implicit, rather than explicit. It’s made of separate row and column values. This requires the code that deals with positions to know how they connect with each other. By pushing these details out, the code is required to handle the variations in the way rows and columns are traversed. By expressing the locations as Positions, the code just needs to deal with sequences of positions.

The straight-line paths I describe above for Connect 4 are really the same thing as the winning paths used in my Tic Tac Toe solution.

It also occurs to me that Path could be a class, itself. It could be initialized with a starting Position and x and y increments for a step along the path. It could also provide an iterator to allow walking through the Positions of the path checking it against the win criteria.

This is another possible implementation of separating the navigation across the board from the evaluation of the winning condition.
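A rough, hypothetical sketch of that Path-as-a-class idea; Position, SIZE, and all names below are invented for illustration:

import java.util.Iterator;

class Position {
   final int row, column;
   Position(int row, int column) { this.row = row; this.column = column; }
}

class Path implements Iterable<Position> {
   private static final int SIZE = 3;       // board dimension
   private final Position start;
   private final int rowStep, columnStep;   // per-step increments

   Path(Position start, int rowStep, int columnStep) {
      this.start = start;
      this.rowStep = rowStep;
      this.columnStep = columnStep;
   }

   // Walks SIZE positions along the straight line the increments define.
   public Iterator<Position> iterator() {
      return new Iterator<Position>() {
         private int step = 0;

         public boolean hasNext() { return step < SIZE; }

         public Position next() {
            Position position = new Position(start.row + step * rowStep,
                                             start.column + step * columnStep);
            step++;
            return position;
         }

         public void remove() { throw new UnsupportedOperationException(); }
      };
   }
}

A winner check then becomes a walk along each candidate Path, counting consecutive squares held by the current player.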

I didn’t look at the other solutions, so this might be a duplicate, but I would probably do this:

1. Unroll the loops in each of the four methods.
2. Inline the four methods.
3. Tease apart the duplication in the resulting mess.

In my experience, a more incremental approach, like inlining two of the methods then removing duplication, leads me down ratholes, so I prefer the break-it-all-down-then-build-it-back-up approach.

Less Refactoring

If you could have only one refactoring, which would it be? Which refactoring could you least afford to give up?

To me the obvious answer is the rename refactoring. It’s an activity that I do very frequently. In fact, the way I code is dependent upon my ability to do a rapid rename.

I try to follow Uncle Bob’s mantra “never be blocked” at even the smallest granularities. I won’t stew for more than a few seconds to derive a name for a class, field, method, or other identifier. Instead, I type. If my brain is really frozen, I’ll surge forward by typing an “x” or the word “something.” No worries–I usually think of a better name shortly thereafter. I then do a ctrl-1/rename, type the new name, press enter, and Eclipse completes the job for me very safely and quickly.

The name isn’t final at that point. I’ll continue to update an identifier’s name as my understanding of what it represents gets refined (or changes).

Recently I committed myself to working more in-depth with Groovy. I’m happy with its improved simplicity, consistency, and expressiveness over the tedium that is Java. As always, I spent a while using Groovy’s bare-bones tools (groovyc, groovyConsole, etc.) so that I understood them well enough to survive if I had to. Done enough of that! Now I’m looking to improve my productivity by introducing an IDE, so I downloaded and installed the Groovy plugin for Eclipse.

Harumph, and that’s even after ignoring (a) a couple of bizarre errors that I received because I hadn’t forced a clean, and (b) the obtuseness of the exceptions. No, the big harumph is that there are no ctrl-1 or refactoring options anywhere to be found. I’ve heard IDEA is better (and I hope to try it out soon), but as I understand it, the rename challenge will never be completely resolved: safe rename refactorings require type information. You can provide that in Groovy, but figure on most developers not bothering.

And for now, the Eclipse plugin isn’t even trying. I created a private, typed field–something for which a refactoring operation could be very safe–but now I can’t change its name without resorting to search/replace. The fact that I may need to discard my newer programming style of “continual rename” makes me feel dirty.

Even with some forethought, programmers will always choose some names that end up less than ideal. And since developers are notorious for not changing things that might break, the poor names will remain as long as the IDE can’t do it safely. Maintenance costs will be higher.

The main point of my post is that after years of using a strongly-typed language, I am now resisting languages like Groovy and Ruby to an extent. Not because I’m worried about defects that could happen at runtime but wouldn’t otherwise–my use of TDD generally means that’s not an issue. Instead, my complaint is about the lack of productivity, and worse, the decreased amount of refactoring that will likely result.

The question is, do benefits accruing from the increased simplification of the code outweigh the negatives of more difficult refactoring?

Comments:

[Full disclosure: I may have an interview with you soon, so I may just be sucking up here with my comment.]

I’ve collected a few links and thoughts on refactoring languages like Ruby at my blogki, and sometimes share similar concerns.

Some counter-arguments I believe I’ve heard on this topic … and may or may not agree with:

– Ruby doesn’t require as many refactorings as a language like Java does. It’s easier to morph your designs without outside aid. Much like the GoF design patterns are mostly moot in Smalltalk — you don’t need a design pattern for something that is second nature.

– Refactoring was invented in Smalltalk … why worry about refactoring tools for Ruby? They’ll come.

– … (well, I’m tapped out. I guess I didn’t need a list here.)

I like your post, though. I’m very fond of my friend “Call Hierarchy” in Eclipse — esp. when working through a mess of code. It’s faster to analyze call paths than to try to make some changes and then wait for a test suite run to tell me if I went wrong (obviously, both are valuable, though). Of course, the codebase should probably make more use of coding to interfaces and not have such a long-running test suite … but … such is life.

Ramble off.

I know first hand how anal you can be about refactoring, so I understand what you’re talking about. I’ve just been able to introduce Groovy into our environment and have been running into a lot of the limitations of the Eclipse Groovy support. I’ve been pleasantly surprised at IntelliJ’s Groovy support, which includes refactoring. It’s not a perfect solution, as I still prefer Eclipse for my “normal” development, but it’s a decent stopgap until Eclipse’s Groovy support becomes more first class.

Refactoring Metaphor

I’m not overly fond of metaphors. They’re never perfect and are usually abused.

One metaphor for refactoring is dirty dishes. The person who chooses not to clean his dishes as he goes finds himself with a growing, dirty pile of dishes. At some point, presumably, the developer runs out of dishes and can no longer eat.

It’s an adequate metaphor. Refactoring is something you should do constantly, a little bit each time you touch code. Cook a meal, wash that meal’s dishes. Your likelihood of throwing up your arms in disgust at a huge stack of dishes diminishes.

There are problems, however, with the metaphor. If you don’t refactor your code, it becomes exponentially more difficult to introduce new changes without the possibility of causing problems. Also, companies know they can get away with buying cheap dishes, perhaps even paper or plastic, instead of wasting their time cleaning them. Some companies will throw away expensive china if it’s gotten too dirty. And all companies know that there are a lot of places to leave the dirty dishes lying around.

A Jiffy Lube mechanic, on the other hand, works in a pit of fairly limited size. He changes oil frequently (let’s imagine that a car represents a task on a project). The mechanic can choose to ignore the oil oozings that drip onto the floor. Or, he can take the extra time to clean it up with each oil change.

If the mechanic chooses to ignore the dirty liquid, it begins to build up on the floor. At first, things are a little slippery and a bit of a nuisance. In a short amount of time, perhaps a day or two, the mechanic begins to fall down on the job, taking a bit longer to complete each oil change.

You know where this is going, of course. It gets exponentially worse and worse; soon the mechanic swims against the black filthy tide, and can do one oil change a day if he works hard enough. Finally, the mechanic just drowns, never to change a filter again.

XP or no XP, test-driven development or no test-driven development, programming or something else, all systems have entropy. All systems turn crufty and ultimately become useless. The promise of XP is to allow you to sustain that system for a longer period of time before the costs become too high. This is the hope for any system. Constant, hardcore refactoring at the micro level is essential for this to happen. Don’t let the code drown you!

Book Review: XP Refactored

A copy of the Amazon review of the book Extreme Programming Refactored: The Case Against XP, by Matt Stephens and Doug Rosenberg

Note: This review appeared at Amazon for three months before it was removed, due to a customer complaint about the reference to another reviewer. It was a spotlight review for at least a couple weeks due to the large number of “helpful”/”not helpful” votes it received. The review appears unchanged, except for added links, below.

Additional note (10-Feb-2004): If you’ve arrived at this review from the SoftwareReality web site, you’ll expect a “rant,” “paranoia,” and “anger.” Judge for yourself, but I think the below review is a reflection of what passes for humor in XP Refactored. If my review is a rant (it may well be), then the whole of XP Refactored is one big rant. If my review demonstrates anger (doubtful), then XP Refactored is one massive hate-piece.

** (two stars)
Some good points ruined by vindictiveness/unsupported claims
Reviewer: Jeff Langr from Colorado Springs, CO United States, September 18, 2003

In XP Refactored, the authors do a good job of covering many of the problems with how XP has been applied. But they waste a large chunk of the book on the Chrysler project (he said, she said), goofy songs (do you old geezers know anything other than the Beatles?), and some things that even the XP community has already admitted were problematic and that have been corrected.

The authors point out (pg93) that the C3 project was cancelled in February 2000, and Beck’s book Extreme Programming Explained came out “later the same year.” This is patently false; a search on this site itself proves that the book came out in October 1999. (Which also meant that Beck probably finished writing the book sometime in the summer of 1999.) My immediate reaction was that if the authors are willing to make up dates in order to rip XP, then what else is dishonest in the book?

I have a bigger problem with the guest testimonials than the authors’ own material. It all started with the disclaimer that many of the testimonials were anonymous, due to the contributors’ fears of reprisal. Once I read that, I wasn’t sure what was written by real people and what–er, who–was made up. Is someone as arrogant and cocksure as David Van Der Klauw a real person?

Each testimonial is just one (made-up?) person’s opinion, and is rarely backed up by any meaningful facts. For example, the possibly fictitious Van Der Klauw claims (pg192) that “many things of moderate or greater complexity cannot be test-first coded.” I’d say he’s wrong. Yes, there is a small number of things that cannot be tested, but by gosh, if any significant portion of your system is so complex that you can’t figure out how to test it, you don’t know much about design. In any case, Van Der Klauw isn’t even sure about his own assertions: “I may change my opinion over time.”

The best fact that Mr. Van Der Klauw has to offer is that he has “13 years of software development experience,” and that should be enough for anyone to believe his countless diatribes.

My biggest beef with the authors’ slam against the actual XP practices is related to refactoring, and to their misunderstanding of its value either with or without respect to XP. While refactoring can result in constantly altering the object-level design or even the architecture of the system, that is something that happens only infrequently in practice (and it can work). Instead, refactoring is best viewed as a means of constantly keeping code clean and maintainable through low-level code changes such as reorganizing methods and renaming things. In fact, this is the bulk of what good refactoring involves. Rarely is there any of the thrashing that the authors repeatedly suggest.

At one point (page 212), the authors recommend that you should never touch a piece of code once it works: “leave it well enough alone.” Wow. Everyone out there is just a perfect coder. Not. Anyone who’s ever hired a high-priced consultant (like good old Doug or Matt) to produce code has been burned by overly complex and often overdesigned code that had to be rewritten because it was completely indecipherable and unmaintainable.

The authors repetitively promote up-front design as a panacea. Fortunately they quietly admit that it’s virtually impossible to produce a perfect design up front, something that just about everybody has agreed upon. The question becomes, how much design should you do up front? The more you do up front, the more of a waste of time it is, if indeed it can’t be perfect.

Nowhere does XP preclude you from spending a half day per iteration–or even longer–to discuss design, produce UML diagrams, etc. In fact, that’s what Object Mentor (the consulting firm of Bob Martin, who is frequently quoted in the book) has recommended in their XP courses.

The authors also recommend writing all requirements down in detail at the start of the project (as in waterfall), and getting a signoff. Never mind the folly of trying to finalize everything up front; worse still is that the authors think that it’s ok for programmers to “spend a few months doing nothing” while waiting for the specs.

Unfortunately, I’ve seen contracts cancelled, jobs lost, and companies fold where the customer wanted to change their mind a few months after the “requirements were complete.” Too bad! You signed here on the requirements, our design is perfect, it’s done, and you can’t change it!

The authors do make concessions to the fact that XP has brought a lot of good things to the table, such as the heavy emphasis on test-driven development. Even the authors’ agile approach resembles much of XP. With this in mind, the absolute hatred that comes through in much of the remainder of the book strikes me as just a bunch of personal vendettas. I understand where they are coming from, as some members of the XP community can be arrogant and annoying (just like the authors), and a lot of XP has been overhyped and misunderstood. Still, one shouldn’t charge people this kind of money in order to air their dirty laundry.

By the way, this book is nowhere near serious, as another well-known reviewer states below. XP Refactored frequently uses words like “moronic” and “asinine” and thus smacks of being an abusive ad hominem attack, not a serious critique.

I was prepared to give the book 3 stars based on some of the value contained within, but there’s just too much vitriol and defensiveness in the book to make it an enjoyable read. Docked a star to 2.

What Can Go Wrong With XP

Extreme programming (XP), on the surface, is a hot, trendy way of developing software. Its name appeals to the reckless and annoys the cautious (“anything extreme can’t be good!”). A recent article in Wired magazine attests to its popularity amongst the cool set, who believe that XP is a silver bullet. The article also talks about some of XP’s severest detractors, who view XP as dangerous hype. The truth lies somewhere between. XP alone will not save your project. Applied properly, XP will allow your team to become very successful at deploying quality software. Applied poorly, XP will allow your team to use a process as an excuse for delivering–or not delivering–low quality software.

Applying XP

XP is a highly disciplined process that requires constant care. It consists of a dozen or so core disciplines that must all be followed in order for XP to work. The practices bolster one another; each exists not for one reason but for many. For example, test-driven development and constant code refactoring are critical to ensuring that the system retains high quality. High code quality is required to sustain iterative, incremental development. And without constant peer review, many developers would quickly lapse from these critical practices. XP provides this review through the discipline of pair programming.

Yet many shops shun pairing, viewing it as unnecessary for developers to pair all the time. Not pairing is not necessarily a bad thing, but in lieu of pairing, some other form of peer review must take place. (Fagan inspections are an OK second choice.) Without the peer review, code quality quickly degrades, and you now may have a huge liability that no one is aware of. The lesson is that if you choose not to do an XP practice, you must understand what you have just taken out. You must then find another way to provide what the practice brought to the table.

Good software development will always require these things:

  • planning
  • requirements gathering and analysis
  • design/coding
  • testing
  • deployment
  • documentation
  • review
  • adherence to standards

Contrary to popular misconception, XP says that you should be doing all of these things, all of the time.

Common Traps

The most common mistakes I’ve seen in the application of XP include: not creating a dedicated customer team, not doing fixed iterations, misunderstanding test-driven development, not refactoring continuously, not doing design, and not pairing properly.

Not Creating a Dedicated Customer Team

If you read earlier literature on XP, you will be told that there is one customer for an XP team. The practice was called “On-site Customer.” Since then, Beck and others have clarified this practice and renamed it “Whole Team.” The whole team aspect is that we (developers and non-developers) are all in this together.

The customer side of the fence has been expanded to include all people responsible for understanding what needs to be built by the developers. In most organizations this includes the following roles.

Customer / Customer Proxy: In most cases, the true customer (the person buying and using the system) is not available. Product companies have a marketing representative who understands what the market is looking for. The customer or proxy is the person who understands what functionality needs to be in the system being built.

Subject Matter Experts: Used on an as-needed basis.

Human Factors Experts: Developers are good at building GUIs or web applications that look like they were built by… developers. The user experience of a system should be designed and presented to the development team as details for the stories presented at iteration planning.

Testers: Most customers have neither the background nor the time to define acceptance tests (ATs), which are the responsibility of the customer team. Testers work closely with the customer to build specific test cases that will be presented to the development team.

Project Managers: While a well-functioning XP team has little need for hands-on project management, project managers in a larger organization play a key role in coordinating efforts between multiple projects. They can also be responsible for reporting project status to management.

Business/Systems Analysts: Customers, given free rein, will throw story after story at the development team, each representing the addition of new functionality. The development team will respond well and rapidly produce a large amount of functionality. Unfortunately, other facets of the requirements will often be missed: usability, reliability, performance, and supportability (the URPS in FURPS).

In an XP project, it is debatable whether these types of requirements should be explicitly presented as stories or accommodated by functionality stories. I’ve seen the disastrous results of not ensuring that these requirements were met, so I recommend not simply trusting that your development team will meet them.

The business or systems analyst in an XP team becomes the person who understands what it takes to build a complete, robust system. The analyst works closely with the customer and/or testers to define acceptance tests, and knows to introduce tests representing the URPS requirements.

Note that these are roles, not necessarily full-time positions. Members of the development team will often be able to provide these skills. Just remember that the related work comes at a cost that must be factored into iteration planning.

Not Doing Fixed Iterations

Part of the value of short-cycle iterative development is the ability to gather feedback on many things, including rate of development, success of development, happiness of the customer, and effectiveness of the process. Without fixed-length iterations (i.e. in an environment where iterations are over when the current batch of stories is completed), there is no basis for consistent measurement of the data produced in a given iteration. This means there is also no basis for comparison of work done between any two iterations.

Most importantly, not having fixed iterations means that estimates will move closer to the meaningless side of the fence. Estimating anything with any real accuracy is extremely difficult. The best way to improve estimates is to make them more frequently, basing them on past work patterns and experiences, and to make them at a smaller, more fixed-size granularity.

Putting short deadlines on developers is painful at first. Without fixed deadlines, though, customers and management are always being told, “we just need another day.” Another day usually turns into another few days, another few weeks. Courage is needed by all parties involved. Yes, it is painful to not finish work by a deadline. But after a few iterations of education, developers learn the right amount of work to target for the next two weeks. I’ll take a few failures in the small over a failed project any day.

Misunderstanding Test-Driven Development

Small steps.
Small steps.
Small steps.

I can’t repeat this enough. Most developers who learn TDD from a book take far larger steps than intended. TDD is about tiny steps. A full test-code-refactor cycle takes, on average, a few minutes. If you don’t see a green bar from passing tests within ten minutes, you’re taking steps that are too large.

When you are stuck on a problem with your code, stand up and take a short break. Then sit down, toss the most recent test and associated code, and start over again with even smaller steps. If you are doing TDD properly, this means you’ve spent up to ten minutes struggling; I guarantee that you’ll save gobs more struggling time by making a fresh attempt.

Tests are specs. If you do test-driven development correctly, there is no code in the system that can’t be traced back to a spec (test). That means that you don’t write any code without a test. Never mind “test everything that can possibly break.” Test everything. It’s much harder to get into trouble.

As an example of a naive misunderstanding of TDD, I’ve seen tests that look like this:

public void testFunction() {
   SomeClass thing = new SomeClass();
   boolean result = thing.doSomething();
   assertTrue(result);
}

In the production class, the method doSomething is long, does lots of things, and ends by returning a boolean result.

class SomeClass {
   boolean doSomething() {
      boolean result = true;
      // lots and lots of functionality
      // in lots and lots of lines of code
      // ...
      return result;
   }
}

Of course, the test passes just fine. Never mind that it tests none of the functionality in the doSomething method. The most accurate thing you could do with this code is reduce the doSomething method to a single statement:

boolean doSomething() {
   return true;
}

That’s all the test says it does. Ship it! 😉

Test everything until you can’t stand it. Test getters and setters, constructors, and exceptions. Everything in the code should have a reason for being, and that reason should be documented in the tests.
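For example, a quick hypothetical sketch of documenting exception behavior in a JUnit 4 test (Account and its overdraft rule are invented):

// Expects the production code to throw; the test fails if it doesn't.
@Test(expected = InsufficientFundsException.class)
public void throwsWhenWithdrawalExceedsBalance() {
   Account account = new Account(50);
   account.withdraw(100);
}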

Not Refactoring Continuously

Understanding how to approach refactoring and do it properly is probably the most misunderstood aspect of XP. Not refactoring well is also the quickest way to destroy a codebase, XP or not.

Refactoring is best viewed as the practice of being constantly vigilant about your code. It fits best as part of the test-driven development cycle: write a test (spec), write the code, make the code right. Don’t let another minute go by without ensuring you haven’t introduced junk into the system. Done in this fashion, refactoring just becomes an engrained part of your every-minute every-day way of building software.

As soon as you aren’t on top of it, however, code turns to mush, even with a great design.

XP purists will insist that constant, extreme refactoring allows your design to emerge constantly, meaning that the cost of change remains low. This is absolutely true, as long as your system always retains the optimal design for the functionality implemented to date. Getting there and staying there is a daunting task, and it requires everyone to be refactoring zealots.

Many shops experimenting with XP, however, understand neither what it means to be a refactoring zealot nor what optimal design is. Kent Beck’s concept of simple design can be used as a set of rules: the code passes all its tests, reveals intent, contains no duplication, and uses the fewest possible elements. The rules are straightforward; adhering to them takes lots of peer pressure.

Not Doing Design

I’ve heard far too many people saying that “you don’t do design in XP.” Hogwash (and I’m being nice).

As with everything else in XP, the “extreme” means that you do design all the time. You are designing at all sorts of levels. Until you get very good at following all the XP disciplines, the iteration planning meeting should be your primary avenue for design.

Iteration planning consists of task breakdowns. The best way to do task breakdowns is to start by sketching out a design. UML models should come into play here. Class diagrams, sequence diagrams, state models, and whatever other models are needed should be drawn out and debated. The level of detail on these models should be low–the goal is to solve the problem of general direction, not specify all the minute details.

For a typical two-week iteration, a planning session should last no longer than half a day (how much new stuff that needs to be designed can possibly fit into two weeks?). But if you’re just getting started, and the design skills on the team are lacking, there is no reason you can’t spend more time in the iteration planning meeting. The goal should be that you spend less time next iteration.

Design should also be viewed as a minute-to-minute part of development. With each task tackled, a pair should sketch out and agree on general direction. With each new unit test, the pair should review the code produced and determine if the design can be improved. This is the refactoring step. Ideally you’re not seeing large refactorings that significantly alter the design at this point, but things like commonality being factored out into a Template Method design pattern will happen frequently.

At any given time, all members of the development team should have a clear picture of the design of the system. Given today’s tools, there is no reason that a detailed design snapshot can’t be produced at a moment’s notice. In fact, I highly recommend taking a snapshot of the existing system into the iteration planning meeting to be used as a basis for design discussions.

In XP, design is still up-front; it’s just that the increments are much smaller, and less of it is written down. In a classic, more waterfall-oriented process, more design is finalized and written down up front. The downside in XP is that there is some rework based on changing requirements. The downside in the classic process is that there is no such thing as a perfect design, and following an idealistic model results in overdesigned and inflexible systems. There are always new requirements, and a process that teaches how to be prepared to accommodate them is better than one that does not.

Not Pairing Properly

Part of pairing is to ensure knowledge sharing across the team, not just amongst two people. Ensure that your pairs switch frequently–at least once per day if not more often. Otherwise you will get pockets of ability and knowledge. Corners of your system will turn into junk, and you will have dependencies on a small number of knowledgeable people.

Developers will resist switching pairs frequently. It is a context switch, and context switches are expensive. If you are in the middle of a task, and the person you were working with leaves only to be replaced by someone else, you must take the time to get that person up to speed. Frequent pair switching forces you to get good at context switching, which forces you to improve your code maintainability. If your code is poorly written and difficult to understand, the newly arrived pair will waste far more time getting up to speed. Ideally, the new pair should balk and insist upon more code clarity before proceeding.

The ability to quickly understand code and associated unit tests is what makes software maintenance costs cheap and consistent. Maintainable code is what allows you to be able to sustain the quality of your system over time.
