Langr Software Solutions | TDD Katas / Exercises: Name Normalizer (3 / 5+)

The third in a series of blog posts in which I describe TDD katas & exercises that I’ve used for training purposes.

Name Normalizer

The name normalizer transforms a name from its typical western form (for example, “Henry David Thoreau”, where the surname appears last) into surname-comma-first form (for example, “Thoreau, Henry D.”), presumably to facilitate sorting a list of names by last name. Here’s an ordered list of tests to drive an incremental derivation of the name normalizer:

Returns empty string given an empty string or null
Returns a single-word name (mononym) straight-up (e.g. “Plato”)
Swaps first and last names (“Haruki Murakami” => “Murakami, Haruki”)
Trims leading & trailing whitespace
Initializes the middle name (“Langr, Jeffrey J.”)
Does not initialize a single-letter middle name (“Truman, Harry S”… the only one I can think of)
Initializes each of multiple middle names (“Louis-Dreyfus, Julia S. E.”)
Appends suffixes to end (“King, Martin L., Jr.”)
Throws when name contains two commas (e.g. “Thurston, Howell, III”)
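
To make the list concrete, here is a minimal sketch in Python of where the nine tests above might lead. It is one possible shape, not the solution from the repository: the function name normalize and the underscore-prefixed helpers are my own assumptions, as is the choice of ValueError for the two-comma case. The sample names come from the tests above; the full inputs for them are assumed.

```python
def normalize(name):
    """Return name in surname-comma-first form, e.g. 'Thoreau, Henry D.'."""
    if name is None:
        return ""
    base, suffix = _split_suffix(name.strip())
    parts = base.split()
    if len(parts) < 2:
        # Empty input or a mononym: there is nothing to swap.
        return "".join(parts) + suffix
    first, *middles, last = parts
    return f"{last}, {first}{_middle_initials(middles)}{suffix}"


def _split_suffix(name):
    """Split 'Martin Luther King, Jr.' into ('Martin Luther King', ', Jr.')."""
    if name.count(",") > 1:
        # Assumption: ValueError stands in for whatever "throws" means
        # in your language of choice.
        raise ValueError(f"too many commas in name: {name!r}")
    base, comma, suffix = name.partition(",")
    return base, (f", {suffix.strip()}" if comma else "")


def _middle_initials(middles):
    """Render each middle name as ' X.', keeping the leading space."""
    return "".join(f" {_initial(middle)}" for middle in middles)


def _initial(middle):
    """Abbreviate a middle name; a single-letter name gets no period."""
    return middle if len(middle) == 1 else f"{middle[0]}."
```

Each helper reads as one of the rules in the list, which is the kind of composed, declarative code the exercise is meant to encourage. For example, normalize("Harry S Truman") goes through the same path as any other three-word name, and only _initial knows about the single-letter exception.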

Additional potential features:

Salutations (Mr., Mrs., Dr., etc.)
Remove periods from UK salutations (use some sort of property setting?) when the last letter of the salutation is the same as the last letter of the abbreviated version (e.g. “Mister” => “Mr”); do not remove the period otherwise (“Captain” => “Capt.”)
Instead of throwing on two commas, assume that the first comma sets off a list of suffixes, each separated by commas
Alphabetize by certain name prefixes: le, du, di, del, des (e.g. “Le Pew, Pepe”)
Alphabetize by de when surname is one syllable; otherwise alphabetize by last name (e.g. “De Claire, Jaime”; “Maupassant, Guy de”). This feature can drive the introduction of test doubles (i.e. a service that returns the number of syllables given a surname).
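
The syllable rule in the last item can be sketched in Python with a hand-rolled stub standing in for the syllable service. Everything named here is hypothetical: sort_form, StubSyllableCounter, and its count method are invented for illustration, and the sketch handles only the “de” rule, not the rest of the normalizer.

```python
class StubSyllableCounter:
    """Test double: returns canned syllable counts instead of calling a service."""

    def __init__(self, counts):
        self._counts = counts

    def count(self, surname):
        return self._counts[surname]


def sort_form(name, syllable_counter):
    """Put the name in sort order, applying only the 'de' rule."""
    words = name.split()
    if len(words) < 2:
        return name.strip()
    if len(words) >= 3 and words[-2].lower() == "de":
        given = " ".join(words[:-2])
        surname = words[-1]
        if syllable_counter.count(surname) == 1:
            # One-syllable surname: alphabetize under the prefix.
            return f"De {surname}, {given}"
        # Otherwise alphabetize under the surname; the prefix trails.
        return f"{surname}, {given} de"
    given = " ".join(words[:-1])
    return f"{words[-1]}, {given}"


# The test controls the collaborator's answers, so no real service is needed.
stub = StubSyllableCounter({"Claire": 1, "Maupassant": 3})
print(sort_form("Jaime de Claire", stub))
print(sort_form("Guy de Maupassant", stub))
```

Passing the counter in as a parameter is what makes the stub possible; a production implementation would hand in an object with the same count method that talks to the real service.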

This is one of those exercises that could probably keep you busy for a full morning if you really wanted; there are plenty more interesting rules about names throughout the world.

Duration: 30-45 minutes

Core themes:

Learning the R-G-R rhythm
The value of refactoring toward composed & cohesive functions; clarity
Confidence / safety
Incremental growth of a solution

This has become my go-to first TDD exercise; I have students do it using “TDD paint by numbers,” i.e. they are provided with the tests already written (they then uncomment the tests and get each one passing, one by one). TDD paint-by-numbers allows you to launch very quickly into the first exercise, with minimal need for up-front discussion or explanation. In particular, you can avoid any discussion about the testing or assertion framework.
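
As a rough illustration of what a paint-by-numbers starter file might look like, here is a sketch using Python's unittest module. It is not copied from the repository; the test names, the normalize function, and its hard-coded starting body are all assumptions made for the example.

```python
import unittest


def normalize(name):
    # Assumed starting point: just enough code to pass the first test.
    return ""


class NameNormalizerTest(unittest.TestCase):
    def test_returns_empty_string_given_empty_string_or_none(self):
        self.assertEqual("", normalize(""))
        self.assertEqual("", normalize(None))

    # Uncomment the next test, watch it fail, make it pass, refactor, repeat.
    #
    # def test_returns_single_word_name(self):
    #     self.assertEqual("Plato", normalize("Plato"))
    #
    # def test_swaps_first_and_last_names(self):
    #     self.assertEqual("Murakami, Haruki", normalize("Haruki Murakami"))
    #
    # def test_trims_leading_and_trailing_whitespace(self):
    #     self.assertEqual("Murakami, Haruki", normalize("  Haruki Murakami  "))
```

Saved as a file, this can be run with python -m unittest; students only ever touch the comment markers and the body of normalize.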

Often I will show students a horrible implementation of the name normalizer before discussing their exercise, then ask them what sorts of problems it exhibits. I ask how safe they would feel if they had to add a new feature.

Recently I have been starting this exercise with support for the first ~3 tests already coded. First, this allows me to set the stage for what I hope their code looks like (highly declarative). Second, since they aren’t coding the first test, I can defer the typical angst about providing a hard-coded first implementation. (That discussion comes up and is addressed in a second exercise.) Third, it suggests that TDD isn’t just for “from-scratch” things. Fourth, it allows me to reiterate the point about the safety of making incremental additions to existing code with good tests.

Additional behaviors for which students may want tests (depending on their implementation or confidence level) can include appending suffixes to mononyms or stripping spaces from mononyms.

If I’ve shown students the horrible implementation first, talked a bit about declarative coding or programming by intention, and then stayed on top of them as they do their exercises, it’s possible they will produce reasonably refactored code for this exercise. (But you know how people are…)

Without those measures, the solutions are generally a big mess. I’ve sometimes gone the route of letting them produce a mess, then re-running the exercise as a demo or in a mob, in which case I press the issue about appropriate refactoring.

My GitHub page contains a repository for Name Normalizer, in which you can find some starter tests in various programming languages. Others have already contributed; please feel free to do so. If you poke around at the branches, you will find some sample solutions as well (not all languages come with solutions–feel free, too, to provide one).

** For pairing TDD novices. Impacts to duration can include: