Architecture Corruption

2016-01-03

Wondering why your project takes longer and longer time to compile and build? Wondering why fewer and fewer people know about how the whole system works? Wondering why many responsible developers giving up refactoring messy code in the system and accepting it as “it works, do not change”? Ever wondering why and how technical debts accumulate under your nose all the time?

More specifically, why on earth we have to open 79 projects in Visual Studio or Xcode to compile an application? How many times does this ring a bell to us and we had to accept it as “this is how it works”? How many hours and money we have already wasted by compiling projects that have not been changed in last 3 months, or even worse, have not been touched in last 2 years, multiple times a day, both locally and on CI?

To understand and explain these problems, we have to start from the very beginning of any project.

Phase 1, everyone is happy (more or less).

This is the beginning, like the big bang of the universe. Things are simple. There is no supermassive black holes or huge giant stars and your product has a small but promising feature set.

Luckily you have the best developers on earth, not only the product was developed quickly but also the code base and the dev process is taken care of by your talented and responsible developers:

Project is well designed and decoupled to a few modules
Components are having clear separation of concerns
Clear and unified API between components
Tests are there
Continuous Integration and maybe Continuous Delivery to some degree.

IDE opens the project lightening fast. Local builds from command line finish on you half way to the coffee machine and CI builds complete before you find the first interesting link on Hacker News.

There is no reason for not being happy, so it is extremely fun working on this project and the productivity is like Chuck Norris on steroids.

Phase 2. so long, the good golden days

Your product is a great hit! The customer base is exploding.

To keep customers happy and paying, more feature sets need to be added.

You need to expand the team, recruiting the best developers you can find in the country.

New joiners with the help of existing senior, talented, responsible developers, quickly learned how the existing project works as it is well designed and simple and new joiners are also talented.

Code base starts to explode as well like customer base, attributing to more developers finishing features like Chuck Norris on steroids.

While you are hosting a meetup showing off the cool design you use in the product and explaining how awesome it is working in this company, attracking more talented and cool developers to join your company, somethings are quietly happening to your project:

With so many features, to avoid code duplication and reuse, more and more common functionalities are extracted to modules
A few components start to have overlapping responsibility
Some APIs start to look different than others
A few components start to become tightly coupled and even worse, circular depending on each other.
Fewer tests were written and many new functionalities have zero coverage
Builds are slower that before, they usually take a cup of coffee to finish.
Developers don’t like this, but to satisfy the growing customers and to nuke the shameless copycats that are stealing customers, they had to trade off.

One of the tradeoff people hates are longer and longer build time. Responsible developers don’t want to play sword while compiling and task switching kills productivity.

Luckily, the team noticed some technical debts are quietly accumulating, and while producing new features, they start to add more tests, refactor bad smell code, refactor build scripts and tweak build locally and on CI. Well, these debt removing is just not as fast as developing features like Chuck Norris on steroids and not as interesting as feature developing either.

Anyway, with efforts, the increasing build time seems to be controlled and the build time is stable at no more than a cup of coffee. Some bad code is refactored and it is a little bit clearer when looking at the code base. You are not super happy, but still satisfied. After a few months, some developers start to miss the old Chuck Norris on steroids days, since now the productivity is just like Chuck Norris, or like Chuck Norris without martial arts and weapon.

Phase 3. The return of the Tar Pit

The original product is still a great a success and with the huge customer base, you discovered that another product(s) can keep them paying and not getting bored, and attracting new customers.

Not only expanding the original product team, the company creates several new teams for the new products.

You are recruiting as many talented and responsible developers on earth as you can find.
Common modules in the original product are extracted to be a separated internal foundation project so that new products can use it directly.

To jumpstart the new teams, many senior developers in the original team are moved the new teams to teach and lead the new joiners.

Senior developers left of the original team are split even further, some of them are moved to new extracted “Foundation” project.

A few senior developers missed the old Chuck Norris on steroids days so much that they moved their feet to another company.

There are not so many senior developers left on any of the teams, due to the split up and quitting.

The Foundation project is not provided as compiled binaries, but source code instead. The motivation is to ease and encourage contribution from product teams.

The reality is, except the original product team, most of the new product teams make contributions rarely.

All products want to provide new features that need new support from Foundation project, hence more modules are added to the foundation by the foundation team or as contributions.

With the explodes of products and features, the foundation project grows rapidly and people start to think to split the foundation project even further.

Now to open a product source code within an IDE, you have to open 79 modules/projects, including 60 projects from the Foundation project.

A coffee break is far from enough to finish a local build and you need five of them. CI build won’t finish even after reading 5 interesting articles on Hacker News including all comments.

Everyone is not happy and recruiters have to explain to new joiners the rumors of not having test environments and unit tests.

APIs in the foundation project is a mess now since the speed of refactoring and teaching new joiners is much slower than bad code creation.

Modules in the Foundation project have complex dependency and many of them are circular.

To address these issue tremendous efforts have been done:

Set up a CI team to focus on the CI infrastructure.
CI team bought a lot of most beefy servers for CI
CI team implements distribute builds.
Code reviews are implemented
Upgrade developers computer to latest models as often as possible

But most of the time, developers had to trade off the clean code with features requests. After a few month, the situation is like a Tar Pit. A few more developers both senior or new decides to seek happiness and legendary Chuck Morris somewhere else. Not so many developers are left knowing the whole picture and can explain everything in the 79 projects in the IDE.

Many existing developers more than once want to reduce the technical debt but decided and being suggested not to do so because of tightly coupled modules and complex dependencies and unpredictable result.

It takes a new joiner longer and longer time to understand the complex architecture and how everything works together.

And the productivity? Teams still release products like ninja turtles but without ninja.

Some important features are postponed and even canceled because the risk and cost of implementing them do not justify the business case, which due to high technical debts.

What went wrong?

People like simplicity. But it is extremely hard for teams to keep simple after the project grows rapidly for two years.

Producing software is not like labor intensive industry: the more people added the more produced output. As it has been pointed out 40 years ago by the book “Mythical Man-Month”, the more people/teams added, it produces not only more features but also chaos. In the short term, this chaos can be solved and digested in many forms by teams, but in the long term, as old developers leave and new joiners join, with the natural forgetting process, feature combinational explosion, the code base will inevitably out of control and as long as the projects are not finished, the architecture gets more and more corrupted day by day.

Any way to avoid this?

It seems like there is no silver bullet. But for the example above, there are definitely ways to ease the situation.

The root cause of most issues in Phase 2 and 3, is that the code base gets bigger and bigger, which is natural for a rapidly growing product or product families. It is easier to ignore issues in a large code base than in a smaller code base.

So to avoid the Tar Pit, the key is to control the size of the project to a certain range.

It is not the easiest thing on earth. There will always be pressures for delivering new features and new products. But there are still ways to limit the code base:

First way: focusing on the relevant code base

Refactor modules to binary isolated projects. It means that components that have fewer modifications in last few month get extracted to a totally separated project.
Provide binary release instead of source for the binary isolated projects
The final product depends on the binary formats of the infrequently changed modules.

The first immediate benefit from this approach is much shorter build time. Now you only need to compile the relevant code of your product and link other projects in binary format.

Second way: Isolated processes or so-called micro services

Separate different functionalities to isolated processes in the product.

It might be hard to understand for the isolated process concept, but here is an example:
The traditional way of implementing a music player is to provide it as a monolithic desktop application. Imagine an extremely successful music or video streaming company, it will face same challenge explained in this article.

So instead of growing the monolithic desktop application, the app is redesigned to a wrapper that showing independent micro functionalities in different windows, mostly web based and html5 enabled.

By this way, they limit the size of the code base of each micro-service/functionality and by doing this, it is forced to always have a clear separation of concern between services and it will be extremely hard to introduce highly coupled modules.

Bad for encouraging contribution?

The first way, distributing un-relevant code as binary libs when doing a contribution, it is not as easy as having the project you are going to contribute to in those 79 projects in your IDE at the same time. But it doesn’t make contribution impossible. Think about the Java Ecosystem with binary dependency management tools like Ivy, Maven, Gradle, it never stops people contributing internally inside a company or outside to open source libraries like Spring, JUnit, Jetty, etc. By using binary management systems, it never discouraged people from contributing to libraries.

So that question here is, how do you see the easiness of doing a contribution versus the benefits of keeping the relative code base small and avoiding the Tar Pit sooner or later you will face? Which way gains more for the teams and company?

Microservices for native applications

And the second way, since it separates functionalities to micro services running as isolated processes on servers, it will be very hard for native applications like games. But still, it is a very interesting and genius way to attack the problem.