Agile Java Rewrite?

I’ve had the idea to write a book on refactoring for some time, and even produced a proposal a couple months ago. The focus would be “refactoring in the context of TDD,” aka continual design. I only got halfway through a sample chapter, which the publisher requests. (Writing the sample also reassures me it’s the book I want to and can write.) This past week I heard that a Big Name is currently working on a refactoring book rewrite. Hmm, maybe this project is not something that I should be doing.

Agile Java has been in print since 2005. It still sells copies–not a lot–and I also still receive thanks for the book from many folks who found it a great way to learn (TDD + OO + Java from the ground up). I feel a bit guilty that people still use a 12+-year-old book featuring a version of Java end-of-lifed for about 8 years! Time for a rewrite, as some have requested?

A dozen years later, what would change?

  • New approach based on new versions of Java: The book was written to coincide with Java 5 and its dramatic language changes, which also allowed a simpler, more-OO focus than would previously have been possible. With Java 9 on the horizon (July 2017, maybe?), a new version would introduce lambdas fairly early, so that new developers can focus on the value of more-functional approaches to solving problems.
  • Potential new “Additional Lesson” chapters focusing on full-stack considerations (HTTP and JPA, for example)… or maybe this whole notion of covering peripheral technologies gets dropped. Yeah, let’s do that, and keep this to a somewhat-reasonable length.
  • Updated for the latest version of JUnit (5).
  • Use of fluent assertions (probably Hamcrest with mention of other frameworks).
  • Emphasis on one behavior per test, as well as the value of tests-as-documentation.

Otherwise, the tone and general flow would remain the same. Hopefully not the page length: Few folks want to read 750-page books anymore.

I’m the sort to find the stuff I wrote a while back to be lacking, so I might be tempted to rewrite a few things…

C++11: Using Lambdas to Support a Times-Repeat Loop

Explicit for loops that execute n times usually need to know what n is. Sometimes they don’t, in which case the for-loop structure seems mildly tedious. It’s certainly idiomatic, and you’ve seen ’em all a million times:

for (auto i = 0; i < 5; i++) {
   // code to repeat here
}

Still, having seen them all a million times, you’ve encountered many either accidental or purposeful variations:

for (auto i = 1; i < 5; i++) {
   // code to repeat here
}
for (auto i = 0; i <= 5; i++) {
   // code to repeat here
}
for (auto i = 5; i >= 0; i++) {
   // code to repeat here
}
for (auto i = 0; i < 5; i += 2) {
   // code to repeat here
}

The result: A for loop is something you must always pause to inspect. It’s too easy for your brain to miss one of these variants.

I’d much rather be able to simply send a timesRepeat message to an integer, as I can do in Smalltalk (Ruby has a similar message, times):

   5 timesRepeat: [ "code to repeat here" ]

In C++, overloading the multiplication operator would allow me to code something like:

   5 * [] { /* code to repeat here */ }

(The [] is the capture clause, which indicates that the following expression is a lambda, and also specifies how variables might be captured. In this case, the empty capture clause [] indicates that no variables are captured.)

A lambda converts implicitly to a function object (std::function) with a matching signature, therefore I can support the times-repeat expression with the following:

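// returns void: nothing useful to return, and an int function without a return is undefined behavior
void operator*(int number, function<void (void)> func) {
   for (int i = 0; i < number; i++) func();
}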

In other words, simply translate the lambda into the corresponding for loop, iterating number times, and calling func() each time, where number represents the receiver (5 in the example usage above).

A simple test demonstrates that the construct works:

unsigned int totalGlobal;

TEST(Lambdas, CanSupportASimplifiedTimesRepeatLoop) {
   5 * [] { totalGlobal++; };

   ASSERT_THAT(totalGlobal, Eq(5));
}

(Use of the global variable totalGlobal allows writing the lambda expression without requiring variable capture.)
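
If overloading operator* for this feels too clever, the same idea can be sketched as a named function template. This is an illustrative alternative, not the code exercised by the test above: timesRepeat and countCalls are hypothetical names, and the template parameter accepts any callable without going through std::function.

```cpp
#include <cassert>

// Hypothetical helper: calls action() exactly count times.
template <typename Action>
void timesRepeat(int count, Action action) {
   assert(count >= 0);  // a negative repeat count is treated as a caller error
   for (int i = 0; i < count; i++) action();
}

// Example use: reports how many times the lambda was invoked.
int countCalls(int count) {
   int calls{0};
   timesRepeat(count, [&] { calls++; });  // [&] captures calls by reference
   return calls;
}
```

The trade-off is readability at the call site: timesRepeat(5, ...) is wordier than 5 * [] {...}, but it does not ask readers to reinterpret the multiplication operator.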

The resulting expression is succinct and, even better than idiomatic, it’s obvious. I don’t think I’d go as far as to say that you should replace all for loops, just ones where access to the index is not required.

You could, of course. Here’s the result, with a couple of small tweaks to the first solution.

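// returns void: nothing useful to return, and an int function without a return is undefined behavior
void operator*(int number, function<void (int)> func) {
   for (int i = 0; i < number; i++) func(i);
}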

TEST(Lambdas, CanUnnecessarilyReplaceAForLoop) {
   unsigned int total{0};

   4 * [&] (int i) { total += i; };

   ASSERT_THAT(total, Eq(0 + 1 + 2 + 3));
}

The operator overload now simply passes the loop index i to the function being executed.

In the test, I now choose to use a local variable to capture the total. The capture clause of [&] indicates that any variable accessed by the lambda function is captured by reference. The function now also includes a parameter list, specifying in this case the index parameter of i.
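
To make the effect of the capture clause concrete, here is a small sketch contrasting [=] (capture by value) with [&] (capture by reference). The function names are made up for illustration and do not appear in the tests above.

```cpp
#include <cassert>

// The lambda copies total when it is created, so a later change is not seen.
int readThroughCopy() {
   int total{1};
   auto read = [=] { return total; };
   total = 42;
   return read();
}

// The lambda refers to the original total, so it sees the later change.
int readThroughReference() {
   int total{1};
   auto read = [&] { return total; };
   total = 42;
   return read();
}

void captureDemo() {
   assert(readThroughCopy() == 1);
   assert(readThroughReference() == 42);
}
```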

I doubt I’d promote use of the second form (lambda with indexer) in production code. The first (times repeat) seems succinct and appropriate.

C++ Via TFL: The Range-Based For Loop

C++ catches up with constructs common in Java and C# with the new range-based for loop.

Image source: normanack, Flickr

A simple example should suffice to demonstrate its common use.

TEST(ARangeBasedForLoop, CanIterateOverAVector) {
   vector<int> collectedNumbers;
   vector<int> numbers{1, 2, 3};

   for (int each: numbers)
      collectedNumbers.push_back(each);

   ASSERT_THAT(collectedNumbers, Eq(vector<int> { 1, 2, 3}));
}

It’s always nice to know the appropriate names for the syntactical parts of things. Here’s an overly clever test that shows what things are called in the range-based for loop.

TEST(ARangeBasedForLoop, HasAppropriateNamesForItsSyntacticalParts) {
   typedef int type_specifier_seq;
   vector<int> expression{1, 2, 3};
   vector<int> collectedNumbers;
   auto statement = [&] (int each) -> void { collectedNumbers.push_back(each); };

   for (type_specifier_seq simple_declarator: expression) statement(simple_declarator);

   ASSERT_THAT(collectedNumbers, Eq(vector<int> { 1, 2, 3}));
}

The range-based for loop supports iterating over a number of common things. Here are a few.

TEST(ARangeBasedForLoop, CanIterateOverAnArray) {
   vector<int> collectedNumbers;
   int numbers[]{1, 2, 3};

   for (int each: numbers)
      collectedNumbers.push_back(each);

   ASSERT_THAT(collectedNumbers, Eq(vector<int> { 1, 2, 3}));
}

TEST(ARangeBasedForLoop, CanIterateOverAMap) {
   map<int, string> dictionary{{1, "uno"}, {2, "dos"}};
   map<int, string> collectedDictionary;

   for (pair<int, string> pair: dictionary)
      collectedDictionary[pair.first] = pair.second;

   ASSERT_THAT(collectedDictionary, Eq(map<int, string>{{1, "uno"}, {2, "dos"}}));
}

TEST(ARangeBasedForLoop, CanIterateOverAString) {
   vector<char> collectedChars;

   for (char each: "abc")
      collectedChars.push_back(each);

   ASSERT_THAT(collectedChars, Eq(vector<char>{ 'a', 'b', 'c', 0 }));
}
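
The trailing 0 in that expectation comes from the string literal itself, whose type is an array of four chars including the terminator. As a sketch (charsOf and charsOfLiteral are hypothetical helpers, not part of the tests here), iterating over a std::string instead visits only the three visible characters:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects every element the loop visits in a std::string.
std::vector<char> charsOf(const std::string& text) {
   std::vector<char> collected;
   for (char each : text) collected.push_back(each);
   return collected;
}

// Collects every element the loop visits in the literal "abc" (a const char[4]).
std::vector<char> charsOfLiteral() {
   std::vector<char> collected;
   for (char each : "abc") collected.push_back(each);
   return collected;
}

void stringIterationDemo() {
   assert(charsOf("abc").size() == 3);       // no terminator visited
   assert(charsOfLiteral().size() == 4);     // terminator included
   assert(charsOfLiteral().back() == '\0');
}
```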

I’ve shown explicit type information in the prior examples. You should usually prefer auto, instead.

TEST(ARangeBasedForLoop, ShouldPreferAnInferencingTypeSpecifier) {
   vector<int> collectedNumbers;

   for (auto each: {1, 2, 3})
      collectedNumbers.push_back(each);

   ASSERT_THAT(collectedNumbers, Eq(vector<int>{ 1, 2, 3}));
}

If you want to update each element as you iterate…

TEST(ARangeBasedForLoop, CanAccessEachElementByReference) {
   vector<string> strings{"a", "b"};

   for (auto& each: strings)
      each += each;

   ASSERT_THAT(strings, Eq(vector<string>{ "aa", "bb" }));
}

Or, if you want to avoid copying each element but also want to disallow modifying it, iterate by const reference.

TEST(ARangeBasedForLoop, CanAccessEachElementByConstReference) {
   vector<string> strings{"a", "b"};
   vector<string> collectedStrings;

   for (const auto& each: strings) {
      // each += each; // fails compilation
      collectedStrings.push_back(each);
   }

   ASSERT_THAT(strings, Eq(vector<string>{ "a", "b" }));
}

The range-based for loop can iterate over anything that implements begin() and end(), each returning an iterator object: something that implements operator!=, operator*, and operator++. Here is a simple class that supports iteration across a range of numbers (also supporting the ability to skip elements).

class IntSequence {
public:
   IntSequence(int start, int stop, int by=1): start_{start}, stop_{stop}, by_{by} {}

   class iterator {
   public:
      iterator(int value, int by) : current_(value), by_(by) {}

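      // Not a true inequality: stays true until current_ passes the stop value,
      // so a step that jumps over stop_ still ends the loop (assumes by_ > 0).
      bool operator!=(const iterator& rhs) const {
         return current_ <= rhs.current_;
      }

      int& operator*() { return current_; }

      // pre-increment returns a reference to this iterator, per convention
      iterator& operator++() {
         current_ += by_;
         return *this;
      }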
   private:
      int current_;
      int by_;
   };
   iterator begin() { return iterator(start_, by_); }
   iterator end() { return iterator(stop_, by_); }

private:
   int start_;
   int stop_;
   int by_;
};

And here’s an example demonstrating use:

TEST(ARangeBasedForLoop, CanIterateOverAnythingImplementingBeginAndEnd) {
   vector collectedNumbers;
   int start{3};
   int stop{10};
   int by{2};
   IntSequence sequence(start, stop, by);

   for (auto each: sequence)
      collectedNumbers.push_back(each);

   ASSERT_THAT(collectedNumbers, Eq(vector{3, 5, 7, 9}));
}

(This example could be construed as contrived: Why not simply use a regular ol’ for loop? Most of the time, that idiom is simple and probably preferred, but you might find value in the ability to pass around a sequence concept to other functions, or to serialize it, or otherwise use it where having an object abstraction might simplify code.)
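
For reference, the begin()/end() requirement follows from how the loop is expanded. The sketch below shows roughly what a range-based for loop over a vector stands for. It simplifies the actual language rule, and the function names are invented for the example.

```cpp
#include <cassert>
#include <vector>

// The range-based form.
int sumWithRangeFor(const std::vector<int>& numbers) {
   int total{0};
   for (auto each : numbers) total += each;
   return total;
}

// Roughly the loop the range-based form stands for: fetch begin() and end()
// once, compare with operator!=, read with operator*, advance with operator++.
int sumWithExplicitIterators(const std::vector<int>& numbers) {
   int total{0};
   for (auto it = numbers.begin(), last = numbers.end(); it != last; ++it) {
      auto each = *it;
      total += each;
   }
   return total;
}

void expansionDemo() {
   std::vector<int> numbers{3, 5, 7, 9};
   assert(sumWithRangeFor(numbers) == sumWithExplicitIterators(numbers));
}
```

Only those three iterator operations are used, which is why a small class like IntSequence can take part without being a full standard iterator.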

Note: Code built using gcc 4.7.2 under Ubuntu.

C++11: Sum Across a Collection of Objects Using a Lambda or a Range-Based For Loop

I’m always disappointed when my Google search doesn’t turn up useful results on the first page. Often, an answer to a C++ question takes me to a StackOverflow page, or to a cplusplus page, where I usually find what I want. This time I didn’t find what I wanted on the first page (the StackOverflow link wasn’t quite it), hence this blog post.


Image source: jfgormet, Flickr

I’ve just started a series on test-focused learning (TFL) for the new C++11 features. I’m jumping the gun a little here since I’ve not yet covered lambdas (or auto, or the range-based for, or rvalue references), but I wanted searchers to have (what I think is) a better answer on the first page.

The story: Iterate a vector of objects, adding into a sum the value returned by a member function on each object, then return the sum from a function.

Here’s the declaration for the Item class:

class Item {
   public:
      Item(int cost) : cost_{cost} {}
      int Cost() const { return cost_; }
   private:
      int cost_;
};

Here’s the assertion (I coded this test-first, of course):

ASSERT_THAT(
   TotalCost({Item(5), Item(10), Item(15)}), 
   Eq(5 + 10 + 15));

You can get this test to pass in three statements using the range-based for loop.

int TotalCost(vector<Item>&& items) {
   int total{0};
   for (auto item: items) total += item.Cost();
   return total;
}

That’s clean and simple to understand, with explanation of the syntax barely needed.

Here’s the implementation using accumulate and a lambda.

int TotalCost(vector<Item>&& items) {
   return accumulate(items.begin(), items.end(), 0, 
     [] (int total, Item item) { return total + item.Cost(); });
}

That’s it. If you’re not familiar with accumulate (I wasn’t, hence my Google search), it takes a range, an initial value, and a function. If you’re not familiar with using lambdas to declare functions:

  • [] declares that the lambda requires no capture of other variables
  • between () is the parameter list for the arguments that accumulate passes to the function
  • between the {} is the function’s implementation
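
If it helps to see what accumulate does with those pieces, the sketch below puts a hand-written fold next to the accumulate call. PricedItem is a hypothetical stand-in for Item with a public field, and the items are taken by const reference here, an assumption that differs from the rvalue-reference signature used above.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

struct PricedItem { int cost; };  // hypothetical stand-in for Item

// What accumulate amounts to: start from the initial value and repeatedly
// replace the running total with function(total, element).
int totalCostWithLoop(const std::vector<PricedItem>& items) {
   int total{0};
   for (const auto& item : items) total = total + item.cost;
   return total;
}

int totalCostWithAccumulate(const std::vector<PricedItem>& items) {
   return std::accumulate(items.begin(), items.end(), 0,
      [] (int total, const PricedItem& item) { return total + item.cost; });
}

void foldDemo() {
   std::vector<PricedItem> items{PricedItem{5}, PricedItem{10}, PricedItem{15}};
   assert(totalCostWithLoop(items) == totalCostWithAccumulate(items));
}
```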

Which do you prefer (or neither), and why?

C++11 Via TFL (Test-Focused Learning): Uniform Initialization

I’ve been working with the new features of C++11 for many months as I write a book on TDD in C++. The updates to the language make for a far-more satisfying experience, particularly in terms of helping me write clean code and tests.

I haven’t written any Java code for over a half-year, and I don’t miss it one bit (I do miss the IDEs a bit, although I’m enjoying working in vim again, particularly with a few new tips picked up from Use Vim Like a Pro and Practical Vim). New language features such as lambdas and type inferencing represent a leap-frogging that shines a little shame on Oracle’s efforts.

Image source: Naval History & Heritage Command

Over a series of upcoming blog entries, I will be demonstrating many of the new language features in C++ via test-focused learning (TFL). This entry: uniform initialization, the new scheme for universally-consistent initialization that also simplifies the effort to initialize collections, arrays, and POD types.

One goal of this blog series is to see how well the tests can communicate for themselves. Prepare for a lot of test code (presume the tests all pass, unless otherwise noted) and little blather. Please feel free to critique by posting comments; there’s always room for improvement around the clarity of tests (particularly regarding naming strategy). Note that TFL and TDD have slightly different goals; accordingly, I’ve relaxed some of the TDD conventions I might otherwise follow (such as one assert per test).

The Basics

Note: Your compiler may not be fully C++11-compliant. The examples shown here were built (and tested) with gcc 4.7.2 on Ubuntu. The unit testing tool is Google Mock (which supports the Hamcrest-like matchers used here, and includes Google Test).

TEST(BraceInitialization, SupportsNumericTypes) {
   int x{42};
   ASSERT_THAT(x, Eq(42));

   double y{12.2};
   ASSERT_THAT(y, DoubleEq(12.2));
}

TEST(BraceInitialization, SupportsStrings) {
   string s{"Jeff"};
   ASSERT_THAT(s, Eq("Jeff"));
}

TEST(BraceInitialization, SupportsCollectionTypes) {
   vector<string> names {"alpha", "beta", "gamma" };
   ASSERT_THAT(names, ElementsAre("alpha", "beta", "gamma"));
}

TEST(BraceInitialization, SupportsArrays) {
   int xs[] {1, 1, 2, 3, 5, 8};
   ASSERT_THAT(xs, ElementsAre(1, 1, 2, 3, 5, 8));
}

Those tests are simple enough. Maps are supported too:

TEST(BraceInitialization, SupportsMaps) {
   map<string,unsigned int> heights {
      {"Jeff", 176}, {"Mark", 185}
   };

   ASSERT_THAT(heights["Jeff"], Eq(176));
   ASSERT_THAT(heights["Mark"], Eq(185));
}

Explicit initialization of collections isn’t nearly as prevalent in production code as it is in tests. I’m tackling uniform initialization first because I’m so much happier with my resulting tests. The ability to create an initialized collection in a single line is far more expressive than the cluttered, old-school way.

TEST(OldSchoolCollectionInitialization, SignificantlyCluttersTests) {
   vector<string> names;

   names.push_back("alpha");
   names.push_back("beta");
   names.push_back("gamma");

   ASSERT_THAT(names, ElementsAre("alpha", "beta", "gamma"));
}

No Redundant Type Specification!

Uniform initialization eliminates the need to redundantly specify type information when you need to pass lists.

TEST(BraceInitialization, CanBeUsedOnConstructionInitializationList) {
   struct ReportCard {
      string grades[5];
      ReportCard() : grades{"A", "B", "C", "D", "F"} {}
   } card;

   ASSERT_THAT(card.grades, ElementsAre("A", "B", "C", "D", "F"));
}

TEST(BraceInitialization, CanBeUsedForReturnValues) {
   struct ReportCard {
      vector<string> gradesForAllClasses() {
         string science{"A"};
         string math{"B"};
         string english{"B"};
         string history{"A"};
         return {science, math, english, history};
      }
   } card;

   ASSERT_THAT(card.gradesForAllClasses(), ElementsAre("A", "B", "B", "A"));
}

TEST(BraceInitialization, CanBeUsedForArguments) {
   struct ReportCard {
      vector<string> subjects_;

      void addSubjects(vector<string> subjects) {
         subjects_ = subjects;
      }
   } card;

   card.addSubjects({"social studies", "art"});

   ASSERT_THAT(card.subjects_, ElementsAre("social studies", "art"));
}

Direct Class Member Initialization

Joyfully (it’s about time), C++ supports directly initializing at the member level:

TEST(BraceInitialization, CanBeUsedToDirectlyInitializeMemberVariables) {
   struct ReportCard {
      string grades[5] {"A", "B", "C", "D", "F"};
   } card;

   ASSERT_THAT(card.grades, ElementsAre("A", "B", "C", "D", "F"));
}

Class member initialization essentially translates to the corresponding mem-init. Be careful if you have both: the constructor’s mem-init wins, and the member-level initializer is ignored for that constructor.

TEST(MemInit, OverridesMemberVariableInitialization) {
   struct ReportCard {
      string schoolName{"Trailblazer Elementary"};
      ReportCard() : schoolName{"Chipeta Elementary"} {}
   } card;

   ASSERT_THAT(card.schoolName, Eq("Chipeta Elementary"));
}
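
A short sketch of the rule, using a hypothetical SchoolRecord type that is not part of the tests here: a constructor that says nothing about a member gets the member-level initializer, while a constructor with its own mem-init for that member replaces it.

```cpp
#include <cassert>
#include <string>

struct SchoolRecord {  // hypothetical type for illustration
   std::string name{"Trailblazer Elementary"};  // member-level initializers
   int grade{1};

   // No mem-init: both members take their member-level values.
   SchoolRecord() {}

   // mem-init for name only: name comes from the argument, grade stays 1.
   explicit SchoolRecord(const std::string& newName) : name{newName} {}
};

void memberInitDemo() {
   assert(SchoolRecord().name == "Trailblazer Elementary");
   assert(SchoolRecord("Chipeta Elementary").name == "Chipeta Elementary");
   assert(SchoolRecord("Chipeta Elementary").grade == 1);
}
```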

Temporary Type Name

TEST(BraceInitialization, EliminatesNeedToSpecifyTempTypeName) {
   struct StudentScore {
      StudentScore(string name, int score) 
         : name_(name), score_(score) {}
      string name_;
      int score_;
   };
   struct ReportCard {
      vector<StudentScore> scores_;
      void AddStudentScore(StudentScore score) {
         scores_.push_back(score);
      }
   } card;

   // old school: card.AddStudentScore(StudentScore("Jane", 93));
   card.AddStudentScore({"Jane", 93}); 

   auto studentScore = card.scores_[0];
   ASSERT_THAT(studentScore.name_, Eq("Jane"));
   ASSERT_THAT(studentScore.score_, Eq(93));
}

Be careful that use of this feature does not diminish readability.

Defaults

TEST(BraceInitialization, WillDefaultUnspecifiedElements) {
   int x{};
   ASSERT_THAT(x, Eq(0));

   double y{};
   ASSERT_THAT(y, Eq(0.0));  

   bool z{};
   ASSERT_THAT(z, Eq(false));

   string s{};
   ASSERT_THAT(s, Eq(""));
}

TEST(BraceInitialization, WillDefaultUnspecifiedArrayElements) {
   int x[3]{};
   ASSERT_THAT(x, ElementsAre(0, 0, 0));

   int y[3]{100, 101};
   ASSERT_THAT(y, ElementsAre(100, 101, 0));
}

TEST(BraceInitialization, UsesDefaultConstructorToDeriveDefaultValue) {
   struct ReportCard {
      string school_;
      ReportCard() : school_("Trailblazer") {}
      ReportCard(string school) : school_(school) {}
   };

   ReportCard card{};

   ASSERT_THAT(card.school_, Eq("Trailblazer"));
}

Odds & Ends

TEST(BraceInitialization, CanIncludeEqualsSign) {
   int i = {99};
   ASSERT_THAT(i, Eq(99));
}

… but why bother?

It’s always nice when a new language feature makes it a little harder to make the dumb mistakes that we all tend to make from time to time (and sometimes, such dumb mistakes are the most devastating).

TEST(BraceInitialization, AvoidsNarrowingConversionProblem) {
   int badPi = 3.1415927;
   ASSERT_THAT(badPi, Eq(3));

   int pi{3.1415927}; // emits warning by default
//   ASSERT_THAT(pi, Eq(3.1415927));
}

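Compiling the AvoidsNarrowingConversionProblem test with gcc 4.7.2 results in the following warning (the C++11 standard itself treats a narrowing conversion inside braces as ill-formed):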

warning: narrowing conversion of ‘3.1415926999999999e+0’ from ‘double’ to ‘int’ inside { } [-Wnarrowing]

Recommendation: use the gcc compiler switch:

-Werror=narrowing

…which will instead cause compilation to fail.
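
When the truncation is intentional, one option is to spell it out with static_cast so the braces no longer see a double. A minimal sketch, assuming the value fits in an int (truncateToInt is a hypothetical helper, not something from the tests above):

```cpp
#include <cassert>
#include <limits>

// Converts explicitly, then brace-initializes from an int, so no narrowing
// conversion happens inside the braces. Truncates toward zero.
int truncateToInt(double value) {
   assert(value >= std::numeric_limits<int>::min());  // assumption: value fits in int
   assert(value <= std::numeric_limits<int>::max());
   int whole{static_cast<int>(value)};
   return whole;
}
```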

Use With Auto

TEST(BraceInitialization, IsProbablyNotWhatYouWantWhenUsingAuto) {
   auto x{9};
   ASSERT_THAT(x, A<const initializer_list<int>>());
   // In other words, the following assignment passes compilation. Thus x is *not* an int.
   // (This is the C++11 rule; under the later C++17 rule, auto x{9} deduces int.)
   const initializer_list<int> y = x;
}
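
A sketch of two neighboring forms of auto: a plain = initializer deduces the value’s type, and = {...} deduces an initializer_list. The function name is hypothetical.

```cpp
#include <cassert>
#include <initializer_list>
#include <type_traits>

int elementCountOfBracedAuto() {
   auto plain = 9;        // deduces int
   auto list = {9, 10};   // deduces std::initializer_list<int>

   static_assert(std::is_same<decltype(plain), int>::value,
      "plain is an int");
   static_assert(std::is_same<decltype(list), std::initializer_list<int>>::value,
      "list is an initializer_list<int>");

   assert(plain == 9);
   return static_cast<int>(list.size());
}
```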

The Most Vexing Parse?

It’s C++. That means there are always tricky bits to avoid.

TEST(BraceInitialization, AvoidsTheMostVexingParse) {
   struct IsbnService {
      IsbnService() {}
      string address_{"http://example.com"};
   };

   struct Library {
      IsbnService service_;
      Library(IsbnService service) : service_{service} {}
      string Lookup(const string& isbn) { return "book name"; }
   };

   Library library(IsbnService()); // declares a function(!)
//   auto name = library.Lookup("123"); // does not compile

   Library libraryWithBraceInit{IsbnService()};
   auto name = libraryWithBraceInit.Lookup("123"); 

   ASSERT_THAT(name, Eq("book name"));
}

All the old forms of initialization in C++ will still work. Your best bet, though, is to take advantage of uniform initialization and use it at every opportunity. (I’m still habituating, so you’ll see occasional old-school initialization in my code.)
