The Compulsive Coder, Episode 7: Please AAA

Q. What behavior does the following JUnit test describe? (In other words, what’s an appropriate name for the test?)

        MockInitialContext mic = MockInitialContextFactory.getContext();
        mic.map.put(ClassicConstants.JNDI_CONTEXT_NAME, "tata");

        LoggerFactory.getLogger(ContextDetachingSCLTest.class);

        ContextJNDISelector selector = (ContextJNDISelector) ContextSelectorStaticBinder.getSingleton().getContextSelector();
        Context context = selector.getLoggerContext();
        assertEquals("tata", context.getName());
        System.out.println(selector.getContextNames());
        assertEquals(2, selector.getCount());

Maybe it helps, just a bit, if I tell you that the test name is testCreateContext.

Why must I spend time reading through each line of a test like this to understand what it’s really trying to prove?

The test designer might have made the central point of the test obvious, by visually chunking the test into its Arrange-Act-Assert (AAA) parts:

        MockInitialContext mic = MockInitialContextFactory.getContext();
        mic.map.put(ClassicConstants.JNDI_CONTEXT_NAME, "tata");
        LoggerFactory.getLogger(ContextDetachingSCLTest.class);
        ContextJNDISelector selector = (ContextJNDISelector) ContextSelectorStaticBinder.getSingleton().getContextSelector();

        Context context = selector.getLoggerContext();

        assertEquals("tata", context.getName());
        System.out.println(selector.getContextNames());
        assertEquals(2, selector.getCount());

Well, I think that’s right at least, given what we know.

(Yes, the word “test” in the name duplicates the requisite @Test annotation. And yes, there are other problems with the test, e.g. printing anything to the console. But it’s excruciatingly simple to first ensure the structure of a test is clear.)

Standards are only standards if they are consistently adhered to.



C++11 Via TFL (Test-Focused Learning): Uniform Initialization

I’ve been working with the new features of C++11 for many months as I write a book on TDD in C++. The updates to the language make for a far more satisfying experience, particularly in terms of helping me write clean code and tests.

I haven’t written any Java code for over half a year, and I don’t miss it one bit (I do miss the IDEs a bit, although I’m enjoying working in vim again, particularly with a few new tips picked up from Use Vim Like a Pro and Practical Vim). New language features such as lambdas and type inference represent a leap-frogging that shines a little shame on Oracle’s efforts.

Image source: Naval History & Heritage Command

Over a series of upcoming blog entries, I will be demonstrating many of the new language features in C++ via test-focused learning (TFL). This entry: uniform initialization, the new scheme for universally consistent initialization that also simplifies the effort to initialize collections, arrays, and POD types.

One goal of this blog series is to see how well the tests can communicate for themselves. Prepare for a lot of test code (presume the tests all pass, unless otherwise noted) and little blather. Please feel free to critique by posting comments; there’s always room for improvement around the clarity of tests (particularly regarding naming strategy). Note that TFL and TDD have slightly different goals; accordingly, I’ve relaxed some of the TDD conventions I might otherwise follow (such as one assert per test).

The Basics

Note: Your compiler may not be fully C++11-compliant. The examples shown here were built (and tested) with gcc 4.7.2 under Ubuntu. The unit testing tool is Google Mock (which supports the Hamcrest-like matchers used here, and includes Google Test).

TEST(BraceInitialization, SupportsNumericTypes) {
   int x{42};
   ASSERT_THAT(x, Eq(42));

   double y{12.2};
   ASSERT_THAT(y, DoubleEq(12.2));
}

TEST(BraceInitialization, SupportsStrings) {
   string s{"Jeff"};
   ASSERT_THAT(s, Eq("Jeff"));
}

TEST(BraceInitialization, SupportsCollectionTypes) {
   vector<string> names {"alpha", "beta", "gamma" };
   ASSERT_THAT(names, ElementsAre("alpha", "beta", "gamma"));
}

TEST(BraceInitialization, SupportsArrays) {
   int xs[] {1, 1, 2, 3, 5, 8};
   ASSERT_THAT(xs, ElementsAre(1, 1, 2, 3, 5, 8));
}

Those tests are simple enough. Maps are supported too:

TEST(BraceInitialization, SupportsMaps) {
   map<string,unsigned int> heights {
      {"Jeff", 176}, {"Mark", 185}
   };

   ASSERT_THAT(heights["Jeff"], Eq(176));
   ASSERT_THAT(heights["Mark"], Eq(185));
}

Explicit initialization of collections isn’t nearly as prevalent in production code as it is in tests. I’m tackling uniform initialization first because I’m so much happier with my resulting tests. The ability to create an initialized collection in a single line is far more expressive than the cluttered, old-school way.

TEST(OldSchoolCollectionInitialization, SignificantlyCluttersTests) {
   vector<string> names;

   names.push_back("alpha");
   names.push_back("beta");
   names.push_back("gamma");

   ASSERT_THAT(names, ElementsAre("alpha", "beta", "gamma"));
}

No Redundant Type Specification!

Uniform initialization eliminates the need to redundantly specify type information when you need to pass lists.

TEST(BraceInitialization, CanBeUsedOnConstructionInitializationList) {
   struct ReportCard {
      string grades[5];
      ReportCard() : grades{"A", "B", "C", "D", "F"} {}
   } card;

   ASSERT_THAT(card.grades, ElementsAre("A", "B", "C", "D", "F"));
}
TEST(BraceInitialization, CanBeUsedForReturnValues) {
   struct ReportCard {
      vector<string> gradesForAllClasses() {
         string science{"A"};
         string math{"B"};
         string english{"B"};
         string history{"A"};
         return {science, math, english, history};
      }
   } card;

   ASSERT_THAT(card.gradesForAllClasses(), ElementsAre("A", "B", "B", "A"));
}
TEST(BraceInitialization, CanBeUsedForArguments) {
   struct ReportCard {
      vector<string> subjects_;

      void addSubjects(vector<string> subjects) {
         subjects_ = subjects;
      }
   } card;

   card.addSubjects({"social studies", "art"});

   ASSERT_THAT(card.subjects_, ElementsAre("social studies", "art"));
}

Direct Class Member Initialization

Joyfully (it’s about time), C++ supports directly initializing at the member level:

TEST(BraceInitialization, CanBeUsedToDirectlyInitializeMemberVariables) {
   struct ReportCard {
      string grades[5] {"A", "B", "C", "D", "F"};
   } card;

   ASSERT_THAT(card.grades, ElementsAre("A", "B", "C", "D", "F"));
}

Class member initialization essentially translates to the corresponding mem-init. Be careful if you have both:

TEST(MemInit, OverridesMemberVariableInitialization) {
   struct ReportCard {
      string schoolName{"Trailblazer Elementary"};
      ReportCard() : schoolName{"Chipeta Elementary"} {}
   } card;

   ASSERT_THAT(card.schoolName, Eq("Chipeta Elementary"));
}

Temporary Type Name

TEST(BraceInitialization, EliminatesNeedToSpecifyTempTypeName) {
   struct StudentScore {
      StudentScore(string name, int score) 
         : name_(name), score_(score) {}
      string name_;
      int score_;
   };
   struct ReportCard {
      vector<StudentScore> scores_;
      void AddStudentScore(StudentScore score) {
         scores_.push_back(score);
      }
   } card;

   // old school: card.AddStudentScore(StudentScore("Jane", 93));
   card.AddStudentScore({"Jane", 93}); 

   auto studentScore = card.scores_[0];
   ASSERT_THAT(studentScore.name_, Eq("Jane"));
   ASSERT_THAT(studentScore.score_, Eq(93));
}

Be careful that use of this feature does not diminish readability.
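
To illustrate (this test is my own sketch, not code from the original post): when a call takes several brace-initialized temporaries, a reader can no longer tell what each group of values represents without consulting the parameter list.

TEST(BraceInitialization, CanObscureMeaningWhenOverused) {
   struct StudentScore {
      string name_;
      int score_;
   };
   struct Term {
      int startYear_;
      int endYear_;
   };
   struct ReportCard {
      StudentScore score_;
      Term term_;
      void Add(StudentScore score, Term term) {
         score_ = score;
         term_ = term;
      }
   } card;

   // Which braces hold the score and which hold the term? The call compiles,
   // but a reader has to consult the signature of Add() to find out.
   card.Add({"Jane", 93}, {2012, 2013});

   ASSERT_THAT(card.score_.name_, Eq("Jane"));
   ASSERT_THAT(card.term_.startYear_, Eq(2012));
}

When the braces start to pile up like this, naming a local variable for each temporary, or spelling out its type, is often the kinder choice.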

Defaults

TEST(BraceInitialization, WillDefaultUnspecifiedElements) {
   int x{};
   ASSERT_THAT(x, Eq(0));

   double y{};
   ASSERT_THAT(y, Eq(0.0));  

   bool z{};
   ASSERT_THAT(z, Eq(false));

   string s{};
   ASSERT_THAT(s, Eq(""));
}
TEST(BraceInitialization, WillDefaultUnspecifiedArrayElements) {
   int x[3]{};
   ASSERT_THAT(x, ElementsAre(0, 0, 0));

   int y[3]{100, 101};
   ASSERT_THAT(y, ElementsAre(100, 101, 0));
}
TEST(BraceInitialization, UsesDefaultConstructorToDeriveDefaultValue) {
   struct ReportCard {
      string school_;
      ReportCard() : school_("Trailblazer") {}
      ReportCard(string school) : school_(school) {}
   };

   ReportCard card{};

   ASSERT_THAT(card.school_, Eq("Trailblazer"));
}
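
The introduction mentioned POD types; here’s one more test (a sketch of my own, using a made-up Score aggregate) suggesting that brace initialization also defaults unspecified members of a plain aggregate:

TEST(BraceInitialization, WillDefaultUnspecifiedPodMembers) {
   struct Score {
      int points;
      int possible;
   };

   Score zeroed{};
   ASSERT_THAT(zeroed.points, Eq(0));
   ASSERT_THAT(zeroed.possible, Eq(0));

   Score partial{40};   // members not listed are value-initialized
   ASSERT_THAT(partial.points, Eq(40));
   ASSERT_THAT(partial.possible, Eq(0));
}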

Odds & Ends

TEST(BraceInitialization, CanIncludeEqualsSign) {
   int i = {99};
   ASSERT_THAT(i, Eq(99));
}

… but why bother?

It’s always nice when a new language feature makes it a little harder to make the dumb mistakes that we all tend to make from time to time (and sometimes, such dumb mistakes are the most devastating).

TEST(BraceInitialization, AvoidsNarrowingConversionProblem) {
   int badPi = 3.1415927;
   ASSERT_THAT(badPi, Eq(3));

   int pi{3.1415927}; // emits warning by default
//   ASSERT_THAT(pi, Eq(3.1415927));
}

Running the AvoidsNarrowingConversionProblem test results in the following warning:

warning: narrowing conversion of ‘3.1415926999999999e+0’ from ‘double’ to ‘int’ inside { } [-Wnarrowing]

Recommendation: use the gcc compiler switch:

-Werror=narrowing

…which will instead cause compilation to fail.
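
For example, a hypothetical full invocation (assuming the gcc 4.7 toolchain used for these examples; the file name is made up):

g++ -std=c++0x -Werror=narrowing -c BraceInitializationTest.cpp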

Use With Auto

TEST(BraceInitialization, IsProbablyNotWhatYouWantWhenUsingAuto) {
   auto x{9};
   ASSERT_THAT(x, A<const initializer_list<int>>());
   // in other words, the following assignment passes compilation. Thus x is *not* an int.
   const initializer_list<int> y = x;
}
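
If you actually want auto to deduce an int, use the assignment form rather than braces. A small follow-up sketch (my addition; it assumes <type_traits> in addition to the headers used elsewhere):

TEST(BraceInitialization, IsNotInvolvedWhenAutoUsesAssignmentForm) {
   auto x = 9;
   // proves at compile time that x is an int, not an initializer_list
   static_assert(std::is_same<decltype(x), int>::value, "x should deduce to int");
   ASSERT_THAT(x, Eq(9));
}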

The Most Vexing Parse?

It’s C++. That means there are always tricky bits to avoid.

TEST(BraceInitialization, AvoidsTheMostVexingParse) {
   struct IsbnService {
      IsbnService() {}
      string address_{"http://example.com"};
   };

   struct Library {
      IsbnService service_;
      Library(IsbnService service) : service_{service} {}
      string Lookup(const string& isbn) { return "book name"; }
   };

   Library library(IsbnService()); // declares a function(!)
//   auto name = library.Lookup("123"); // does not compile

   Library libraryWithBraceInit{IsbnService()};
   auto name = libraryWithBraceInit.Lookup("123"); 

   ASSERT_THAT(name, Eq("book name"));
}

All the old forms of initialization in C++ will still work. Your best bet, though, is to take advantage of uniform initialization and use it at every opportunity. (I’m still habituating, so you’ll see occasional old-school initialization in my code.)

Legacy Quadrants for Increasing Confidence Coverage

Your legacy system presents you with a choice: Put a stake in the ground and attempt to do something about the daily pain it causes, or let it get incrementally and certainly worse. Often its challenges seem insurmountable–small improvements appear like ripples in the ocean. Perhaps an investment to improve the software product is not worth the return in value, given its future.

Chances are that most of the time, though, you’ll be stuck with your legacy system for some time to come, and it makes sense to put a foot down to stem the degradation. Working Effectively With Legacy Code provides you with primarily tactical guidance for preventing entropy, focusing mostly on individual code-level challenges. The Mikado Method provides you with primarily strategic guidance, focusing mostly on managing large-scale refactorings across a code base (legacy or not).

In terms of where to put your effort, the best strategy is always to begin with the areas of current impact–what needs to change for the feature you’re currently tackling? But if you can afford additional effort to add characterization tests–i.e. increase “confidence coverage”–consider this quadrant chart, which pits rate of change against defect count for a given class.

[Figure: Legacy Quadrants, plotting rate of change against defect count]

How might you determine what classes belong in which quadrants? Tim Ottinger devised a heat map while at one shop, a simple Python tool that cross-referenced JIRA tickets with trouble tickets reported in Git. (I got to help build it!) You might consider filtering the defects to ignore those with low severity.
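
Conceptually, the classification boils down to something like the following sketch (the thresholds and names here are illustrative assumptions of mine, not details from the actual heat map tool):

enum class Quadrant { TrustyOldEngine, CanOfWorms, Mystery, GarbageDisposal };

// changes = commits touching the class; defects = trouble tickets traced back to it.
Quadrant classify(unsigned int changes, unsigned int defects) {
   const unsigned int highChange{10};   // made-up threshold for "high rate of change"
   const unsigned int highDefects{5};   // made-up threshold for "many defects"
   if (changes < highChange)
      return defects < highDefects ? Quadrant::TrustyOldEngine : Quadrant::CanOfWorms;
   return defects < highDefects ? Quadrant::Mystery : Quadrant::GarbageDisposal;
}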

The Trusty Old Engine. “If it ain’t broke, don’t fix it!” Classes that work and aren’t changing should be the last thing you worry about. Sometimes code in these areas is tempting: it may exhibit obvious duplication, for example, that a refactoring weenie might view as a quick-win opportunity. Don’t waste your time–there are bigger fish to fry.

The Can of Worms. You should have very few classes in this quadrant–having lots of stable classes that exhibit many defects suggests a reset on how your team does triage. Fixing these troubled classes might seem like an opportunity to make your customer happy. Open the code up, however, and you’re likely to find fear-inspiring code. You might view this quadrant as a sandbox for ramping up on legacy code techniques: Classes here will provide you the time to experiment without having to worry as much about contention issues with the rest of the team.

The Mystery. Getting all the tiny bits correct in the face of rapid change is a significant challenge. A single, low-defect class with lots of change could suggest at least a couple of things: The changes are generally trivial in nature, or the class rampantly violates the single-responsibility principle (for example, a large controller class that doesn’t simply delegate but also does work). With respect to the latter, perhaps you’ve been fortunate so far, or perhaps you have good test coverage.

It might be worth a small amount of effort to prevent any future problems in such an SRP-violating class. The nice thing is that splintering off new classes can be a fairly quick and risk-free effort, one that should begin to isolate areas of risk. You might then be able to code a few key characterization tests without extreme difficulty.

The Garbage Disposal. Usually there are at least a few classes in your system that are very large and fraught with problems, and they change all the time. This is the obvious place to improve your confidence coverage.

Good design, which starts with the application of the SRP, solves many of the problems that the garbage disposal represents. Small, single-purpose classes provide more opportunities for reuse, make performance profiling easier, expose defects more readily, and can dramatically improve comprehension of the code, to name a few benefits. A garbage disposal is a dangerous place–with lots of hands in one big sinkhole, contention is frequent and merges can be painful. Design that adheres to the SRP thus also makes continuous integration easier.

Unit Tests Are Still a Waste of Time

Per a recent Dr. Dobb’s article, the only reason to write unit tests is as a “convenience … to track down bugs faster.” If that’s the only value proposition, then I must agree with the author that writing unit tests is a “luxury” that only infrequently makes sense.

You’d be far better off investing the time in improving your integration tests or (much better) bolstering the tests that demonstrate customer acceptance criteria. They run much more slowly, but provide more value than writing unit tests after you code: They not only help prevent you from creating defects in the system, they can also be a key negotiation point, and they can act as living documentation on the behaviors your system supports.

Fortunately, writing tests first can create myriad other benefits that outweigh their cost: reduced system size, tests that improve developer understanding of system behaviors, more decoupled/cohesive designs, cleaner code, and the ability to change code safely and rapidly as it needs to be changed. Yes, there’s some churn along the way, but it’s usually mild (and milder as the quality of design improves). These benefits are why I’ve done TDD for a dozen years now, having witnessed some fantastic successes along the way.

Is TDD Faster than TAD?

I do TDD mostly to support the ability to keep my code and design clean, with high confidence that I’m not breaking something in the process. Most developers who do not practice TDD refactor inadequately, because of fear rooted in the very real likelihood of breaking the software. I can’t blame ’em for the fear. I’ve helped developers in environments where defects related to downtime had the potential to cost the company millions of dollars per minute.

“Inadequate? How dare you!” Yes, inadequate. The best TADers (test-after developers) claim around 70%-75% coverage. The problem is that there’s usually some good, chewy, gristly meat in that 25% to 30% that isn’t covered, and is thus not easily or safely maintained. Costs go up. The system degrades in quality by definition.

In contrast, I have very high confidence that I can move code about when doing TDD. I can also improve the structure and expressiveness of a system, making it far easier to maintain.

Duplication is sooo easy to introduce in a system. It’s harder to spot, and even harder to fix if you don’t have fast tests providing high levels of coverage. I’ve seen one real case where introducing close-to-comprehensive unit tests on a system resulted in shrinkage down to about 1/3 its original size over a reasonably short period of time. And with most systems I’ve encountered, I can scan through the source code and quickly spot rampant bits of unnecessary duplication.

Good code structure and expressiveness are also lacking in most systems. If you’ve been a developer for more than a few weeks, it’s almost a certainty that you’ve spent way too much time slogging through a multi-hundred- or multi-thousand-line function, trying to understand how it works and how to fix it without breaking something else. In a well-structured (cohesive) system, the time to pinpoint and fix things can be reduced by almost an order of magnitude.

TADers simply don’t eliminate duplication and clean up the code to this extent. It’s not a goal of TAD.

Which would you rather maintain? The TAD system, or a system that’s half the size and double the expressiveness?

There are many other reasons TDD allows me to go faster than TAD. The converse of my “why TAD sucks” reasons should hint at many of them.

Responding to a Comment on “Tradeoffs”

As mentioned two posts ago, it’s not possible to see new comments made on my old blog. Here’s a comment I received on Tradeoffs.

The basic thrust of the “Tradeoffs” blog post is that with code & fix, after development thinks they’re done and the software gets run through a test team, we typically hit a cycle where QA encounters no end of problems that must be fixed.

“I think you’re comparing a project that uses TDD to a project that writes no unit tests until, ‘Done,’ right?

“Why are you ignoring the projects that write unit tests after each unit is written?

“This is faster than both the approaches you describe. It’s faster than TDD because you write far fewer tests that you don’t need and faster than your other (waterfall?) approach because you find problems sooner.”

/Schmoo

Schmoo asks a good question–writing unit tests shortly after a “unit” is written could conceivably be a way of minimizing the code & fix cycle.

I’ve seen it tried, however. In reality, it helps a little, not a lot and not nearly as much as TDD. The code & fix cycle is still there, a bit shorter than it would have been otherwise.

Now, add good automated acceptance tests into the mix, and I believe Schmoo’s premise is closer to true. But unit tests? No. The primary reason is that most test-after-development (TAD) unit tests are insufficient with respect to coverage, particularly in the more complex areas. Why? Check out my earlier blog post, The Realities of Test-After Development.

As to whether TDD generates more tests than you need: that depends on what your goals are. I’ve found that TDD, which requires you to write tests for just about everything, results in much faster development than TAD, even as TAD produces less coverage. More on these topics in a future blog post.

Tradeoffs

[Graph: tradeoffs between TDD and code ’n’ fix]

This graph, intended to show the differences in outcome between TDD and not, is a sketch of my observations combined with information extrapolated from other people’s anecdotes. One datapoint is backed by third party research: Studies show that it takes about 15% longer to produce an initial solution using TDD. Hence I show in the graph the increased amount of time under TDD to get to “done.”

The tradeoff mentioned in the title is that TDD takes a little longer to get to “done” than code ‘n’ fix. It requires incremental creation of code that is sometimes replaced with incrementally better solutions, a process that often results in a smaller overall amount of code.

When doing TDD, the time spent to go from “done” to “done done” is minimal. When doing code ‘n’ fix, this time is an unknown. If you’re a perfect coder, it is zero time! With some sadness, I must report that I’ve never encountered any perfect coders, and I know that I’m not one. Instead, my experience has shown that it almost always takes longer for the code ‘n’ fixers to get to “done done” than what they optimistically predict.

You’ll note that I’ve depicted the overall amount of code in both graphs to be about the same. In a couple cases now, I’ve seen a team take a TDD mentality, apply legacy test-after techniques, and subsequently refactor a moderate-sized system. In both cases they drove the amount of production code down to less than half the original size, while at the same time regularly adding functionality. But test code is usually as large, if not a bit larger, than the production code. In both of these “legacy salvage” cases, the total amount of code ended up being more or less a wash. Of course, TDD provides a large bonus–it produces tests that verify, document, and allow further refactoring.

Again, this graph is just a rough representation of what I’ve observed. A research study might be useful for those people who insist on them.

Unit Testing Maturity Scale

  • Level 0 – Not Performed: Tests? We don’t need no stinkin’ tests!
  • Level 1 – Performed Informally: Test-after development (TAD). Sporadic coverage, individual-level process adherence.
  • Level 2 – TDD:
    • TDD Level 0: Chaotic. Some developers do TDD. Others don’t care if the tests break.
    • TDD Level A: Performed Informally. The team agrees that unit tests must pass in the build. Developers pick and choose what should be test-driven.
    • TDD Level B: Performed Consistently. All team members build unit tests. Some unit tests are not focused enough (tendency toward integration tests). Tests do not “document” very well. Sporadic production code refactoring enabled as a benefit of having tests.
    • TDD Level C: Breakthrough. Team members use each TDD cycle to refactor production code and tests, but constrain to single classes (test+target). Most tests clearly document class capabilities.
    • TDD Level D: Optimizing. Standards around test organization, naming, and structure are evident from the code. BDD in heavy use. Team actively looks for reuse across tests, and continually refactors “mother” and other test utility classes.

This is a work in progress! Please help me iteratively and incrementally improve it.

Reviewing TAD

I’ve recently participated in a number of after-the-fact code reviews. Developers are supposed to be writing tests first, and are supposed to ask for help when they’re not sure how to do so. Here’s what it looks like:

C = one unit of time spent coding
T = one unit of time spent testing
I = one unit of time spent in an inspection or review meeting
R = one unit of time spent doing rework based on results of review
x = one unit of time spent doing other work

CCCC xxxx TTTTT xxx IIIII RRR II

We’re obviously in TAD (test-after development) mode here. Pay attention to the right-hand side, which roughly indicates: When could this code actually ship?

Note that there are multiple Rs because many people are involved in the review meeting. Also note that there is a little more time spent on testing, and yet the coverage never ends up good enough (rarely more than 75%). And finally, note the rework and re-inspection that must take place.

What the same model looks like with TDD:

TCTCTCTC xxxx IIII xxx RR I

So: a little less time spent building the right code in the first place, which also means a little less time in rework. It also means a little less time in review meetings, since the tests can help effectively describe whether or not the developer captured the requirements correctly.

With pairing and TDD:

TTCCTTCCTTCC xxxx xxx

I’ve doubled up the Ts and Cs because there are now two people involved. I’m not even going to claim that pairing developers will produce a better solution faster, even though I’ve seen this to be the case. Note that this is now the shortest line by far. (It will even accommodate some amount of formal review–much smaller now because it’s already been actively reviewed at least once and preferably twice by the pairs building it–and still be shorter than the original process.)

I’m also not accounting for “ramp-up” time wasted. The intervening xxx’s in the inspection-based flow mean that minds have moved on to other things. When it comes time to review code, or rework it, developers must get their heads back around something that they might have done days ago. Particularly without good tests, this can waste considerable time.

I’m in an environment now where test-first is only beginning to take root. Every day I see ample evidence why test-after is simply a bad, bad idea.

I Know What You Did

It takes about two seconds to determine whether or not someone used TDD to derive a set of unit tests. While it’s possible for someone to write equally high-quality unit tests using TAD (test-after development), it just doesn’t happen. Invariably I see many of the same problems, most significant of which is that the TAD tests don’t cover nearly enough of the cases.

Most of the time, developers doing TAD simply refuse to follow the single-responsibility principle (SRP) at the class or method level. (Despite bizarre claims by Jeff Atwood and Joel Spolsky, asking people to push in the direction of Uncle Bob’s five principles of class design is a reasonable and sensible thing to do.) They build large classes and large methods–not small, cohesive, potentially reusable units–and then complain about the difficulty of testing them.

My simple answer for now is, “How can you expect to write unit tests when you don’t have any units?”
