SCALING

Most scaling problems are diagnosed after the team breaks. I designed the solution before hiring the first new engineer.

The brief was straightforward: take a legacy system, modernize the architecture, and absorb a large wave of new hires. The company was in post-COVID expansion mode, budgets were open, and the pressure was to grow fast.

I had seen this movie before. I knew how it ended when you skipped the planning.

The context

The team was five people working on a legacy monolith. The business needed more features, faster. The answer from above was to hire, and hire quickly. Some of the incoming engineers were earmarked for new teams that did not yet exist. In the meantime, they needed somewhere to go, and my team was one of the destinations.

We were also in the middle of an M&A integration. A scale-up had recently been fully absorbed into a larger corporate group, which meant two engineering cultures, multiple systems to integrate, and organizational complexity layered on top of the technical complexity. It was not a clean environment in which to triple a team.

The task assigned to me was specific: take the legacy monolith, migrate it toward a modern architecture, and scale the team to handle the expanded scope. The architectural direction, moving away from the monolith, had already been decided. What had not been decided was how to do it while the team was growing and the system was still in production.

The pressure

The instinct in a fast-growth phase is to add people and trust that velocity will follow. It does not. A monolithic codebase with five people is already a coordination challenge. With fifteen, it becomes a collision course. Everyone is working in the same system, stepping on each other's changes, creating merge conflicts, blocking each other's deployments. The team gets bigger and slower at the same time, which is the worst possible outcome when the business is expecting more output, not less.

I had seen this pattern enough times to know that adding people to a monolith without changing the architecture first does not scale the team. It scales the coordination problem.
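A rough way to put numbers on this is to count the pairwise communication paths in a team where anyone may need to coordinate with anyone else. This is an illustration, not a measurement from this team. The function name and the 7/8 squad split are assumptions made for the sketch.

```python
def pairwise_channels(team_size: int) -> int:
    """Number of distinct pairs in a team of the given size: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# One shared codebase: everyone can collide with everyone.
single_team_before = pairwise_channels(5)
single_team_after = pairwise_channels(15)

# Assumed split of fifteen people into two squads of 7 and 8.
# Cross-squad coordination is not counted here; the assumption is that
# it goes through a small number of explicit interfaces instead.
two_squads = pairwise_channels(7) + pairwise_channels(8)

print(single_team_before, single_team_after, two_squads)
```

Under these assumptions, tripling the headcount takes the pair count from 10 to 105, while the split brings the internal pairs back down to 49. The model is crude, but it shows why headcount and coordination cost do not grow at the same rate.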

The question was not whether to split the system. That was already decided. The question was when, and how to do it in parallel with a hiring wave that was not going to wait.

The diagnosis

The core insight was simple: the architecture and the team structure had to evolve together, not sequentially. You cannot split a team cleanly if the system they work on is still a single coupled unit. And you cannot break a monolith into independent services if the team has no clear boundary to organize around.

Conway's Law applies here in a direct way. The shape of the software tends to mirror the shape of the organization that builds it. If you want two autonomous teams, you need two autonomous systems. If you design the system boundaries first, you can design the team boundaries to match, and the two reinforce each other instead of fighting each other.

The other thing I knew was that timing matters enormously. The right moment to split a team is when it reaches a size that makes coordination visibly harder, but before it reaches a size that makes the split itself chaotic. Too early and you fragment knowledge before it has been distributed. Too late and you are trying to reorganize a team that is already dysfunctional.

The approach

The architectural split and the organizational split were planned in parallel from the start. The monolith was divided into bounded domains, each with clear ownership and explicit API contracts defining how they communicated. The contracts were the key: they allowed teams to evolve their own systems independently without requiring constant synchronization.
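To make the idea of a contract concrete, here is a minimal sketch of what one could look like in code. The domains (ordering and billing), the message fields, and every name below are hypothetical and chosen only for illustration; they are not the actual domains or interfaces of the system described here.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class InvoiceRequest:
    """Message shape both squads agree on (hypothetical fields)."""
    order_id: str
    amount_cents: int


@dataclass(frozen=True)
class InvoiceResult:
    invoice_id: str
    accepted: bool


class BillingContract(Protocol):
    """The only surface the ordering domain is allowed to depend on."""

    def create_invoice(self, request: InvoiceRequest) -> InvoiceResult: ...


class InMemoryBilling:
    """One possible implementation, owned entirely by the billing squad."""

    def __init__(self) -> None:
        self._count = 0

    def create_invoice(self, request: InvoiceRequest) -> InvoiceResult:
        self._count += 1
        return InvoiceResult(
            invoice_id=f"inv-{self._count}",
            accepted=request.amount_cents > 0,
        )


def place_order(billing: BillingContract, order_id: str, amount_cents: int) -> InvoiceResult:
    """Ordering-side code: it knows the contract, not the implementation."""
    return billing.create_invoice(InvoiceRequest(order_id, amount_cents))


result = place_order(InMemoryBilling(), "o-1", 500)
print(result)
```

The design point is that the billing squad can replace InMemoryBilling with anything that satisfies BillingContract without the ordering squad changing a line, which is the kind of independence the contracts were meant to provide.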

The team growth followed the architecture. As new engineers joined, they were assigned to specific domains rather than to the monolith as a whole. This gave them a defined scope from day one, which made onboarding faster and prevented the diffuse confusion that comes from dropping someone into a large shared codebase without clear ownership.

The split into two squads happened through mitosis, not surgery. The original team grew to a point where it made sense to divide, then divided into two units of roughly equal size, each with a clear domain mandate and an experienced core. New hires continued to join the expanded units, not the original group.

Communication between the two teams was handled primarily through the API contracts and asynchronous channels: written documentation, architecture decision records, and defined interfaces. The goal was to make alignment a property of the system rather than a property of the calendar. The teams did not need to be in the same meetings every day because the contracts told them what the other side expected.

Onboarding planning was treated as a first-class concern: who joins when, which domain they start in, and who is responsible for bringing them up to speed. None of this was left to chance or to the general hiring timeline.

The result

The team went from five to fifteen engineers across two autonomous squads. The monolith was decomposed into independent services aligned to team boundaries. Delivery was maintained throughout the growth period, with no significant regression in velocity during the transition.

The architectural blockers that had been inherent in the monolith (merge conflicts, deployment dependencies, shared-state problems) disappeared as the split progressed. The two teams could work at full speed in their own domains without waiting on each other.

The structure built here became the foundation that the company carried into the following years, including through the headcount reduction described in the crisis management case. A well-designed team topology does not just help during growth. It also determines how resilient the organization is when conditions change.

What I took from this

Scaling a team is an architectural problem before it is a people problem.

The common failure mode is to hire first and reorganize later, under the assumption that more people means more output and that the structure can be sorted out once the team is bigger. That assumption is wrong. The structure determines whether the new people are an asset or a source of friction, and by the time the friction is visible, it is much harder to fix.

The parallel evolution of architecture and organization is not a complicated idea, but it requires committing to a plan before you have all the information. You are designing for a team size you have not reached yet, and an architecture that does not fully exist yet. That requires judgment about where the natural boundaries are and discipline to hold to them as the pressure to just add people builds.

The other thing that matters is the contracts. Autonomous teams only work if the interfaces between them are explicit and stable. Without that, autonomy becomes isolation, and isolation becomes divergence. The contract is what allows two teams to work independently without losing coherence.
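One way to keep a contract stable in practice is to check each proposed change against the previous version before it ships. The sketch below is a simplified illustration, not the tooling used on this project. It assumes a contract can be described as a mapping from field name to type name, and it treats added fields as safe, which only holds when the new fields are optional. All names are hypothetical.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List fields that were removed or retyped between two contract versions."""
    problems = []
    for name, type_name in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name] != type_name:
            problems.append(f"retyped field: {name} ({type_name} -> {new[name]})")
    return problems


v1 = {"order_id": "str", "amount_cents": "int"}
v2 = {"order_id": "str", "amount_cents": "int", "currency": "str"}  # additive change
v3 = {"order_id": "str", "amount": "float"}  # drops amount_cents

print(breaking_changes(v1, v2))
print(breaking_changes(v1, v3))
```

A check like this can run in the build of the team that owns the contract, so a breaking change is caught before the other team finds out in production.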