Eliminating duplication is rule #2 of Beck’s Simple Design.
Here’s a starter list of the costs that duplication creates:
- increased effort to change all occurrences in the system. Think about 5 lines of logic duplicated 30 times throughout your codebase, all of which now need to change.
- effort to find all occurrences that need to change. This is often compounded by small changes (e.g. renaming) made over time to the duplicated code.
- risk of not finding all occurrences in the system. Imagine you need to change the 5 lines of logic; you locate 29 occurrences but miss the 30th. You’ve now shipped a defect.
- risk of making an incorrect change in one of those places. Thirty occurrences means thirty chances to screw up… or more. Tedium induces sleepiness.
- increased effort to understand variances. Is that small change to one of the occurrences intentional or accidental?
- increased effort to test each duplicate occurrence in its context. And more time to maintain all these additional tests.
- increase in impact of a defect in the duplicated logic. “Not only is module A failing, but so are modules B through Z and AA through DD.”
- increased effort to understand the codebase overall. More code, more time, less fun.
- reduced potential to re-use code. Often the 5 duplicated lines sit smack-dab in the middle of other methods, blocking any form of algorithmic substitution.
- self-perpetuation. A codebase and culture that doesn’t promote code sharing just makes it that much harder to do anything about it.
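As a small sketch of what eliminating this kind of duplication looks like (the function names and the normalization logic here are hypothetical, just standing in for those "5 duplicated lines"): the shared logic moves into one helper, so every cost above shrinks to a single place to change, test, and understand.

```python
def normalize_price(raw):
    """The once-duplicated 5 lines, now living in exactly one place.

    Any fix (a new currency symbol, a different rounding rule) happens
    here, once, instead of in 30 scattered copies.
    """
    value = raw.strip().replace("$", "").replace(",", "")
    price = float(value)
    if price < 0:
        raise ValueError(f"negative price: {price}")
    return round(price, 2)


def invoice_total(raw_prices):
    # Caller 1: no pasted normalization block, just a call.
    return sum(normalize_price(p) for p in raw_prices)


def max_line_item(raw_prices):
    # Caller 2: same shared helper, so the two can't silently drift apart.
    return max(normalize_price(p) for p in raw_prices)
```

Both callers now share one fix point, and an intentional variance would show up as an explicit parameter to the helper rather than an easy-to-miss tweak buried in one of the copies.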
No doubt some of you can name further costs associated with duplicate code in a system. Please do (in the comments).