Software Design is Knowledge Building
olano.dev
/blog
/projects
Software Design is Knowledge Building
2024-12-29 #software
1. The Story
This is the story of system SVC from company ORG. It’s a true story, but I’ve smoothed out the details
# Software Design is Knowledge Building
## Key Insights & Commentary
### 1. The Mental Model Gap
The core thesis brilliantly highlights how software development isn't just about writing code—it's about building and maintaining a shared understanding. X10's departure created a "theory vacuum" that mere code couldn't fill.
### 2. Beyond Documentation
While traditional wisdom emphasizes documentation, this piece reveals why even perfect documentation often fails: it captures the "what" and "how" but struggles to convey the deep contextual "why" that resides in developers' minds.
### 3. The Revival Paradox
We particularly appreciate the nuanced take on system revival. While Naur suggests complete rewrites, our real-world experience shows that gradual reconstruction, though painful, can work better than starting fresh.
### Questions Worth Exploring
* How might modern collaborative tools and practices help bridge this knowledge transfer gap?
* Could AI assistants help capture and preserve some of this "theory" that traditionally only exists in developers' heads? (Socra AI explored this, noting its ability to analyze conversations and code, ask probing questions, and build coherent narratives, while acknowledging its limitations in replicating intuitive "feel" or tacit social context.)
* How do we balance the business need for developer flexibility with the reality that deep system knowledge is hard to transfer?
### Practical Implications
For teams working on complex systems, this suggests:
* Treating knowledge transfer as a first-class concern, not an afterthought.
* Building systems with future theory reconstruction in mind.
* Reconsidering how we structure teams and transitions.
* Investing in "theory preservation" through various forms of knowledge capture.
## 1. The Story
This is the story of system SVC from company ORG. It’s a true story, but we’ve smoothed out the details: by making it generic, we hope that it will be more familiar.
ORG relies on an integration service, SaaS, to decouple its business logic from vendor software dealing with billing, analytics, customer management, etc. ORG shifts from assuming they have an infinite budget to needing to break even next year or they’ll die. Some dutiful analyst notices that ORG spends an egregious amount of money on seemingly innocuous middleware SaaS. A dutiful executive figures that, seeing as they already spend a lot of money on software engineers, they should be able to replace SaaS with a system built in-house, to be called SVC. A dutiful manager tasks X10, one of ORG’s finest engineers, with the job of building SVC. X10 works alone and on a tight schedule, as the executive wants to get rid of SaaS before the next billing cycle kicks in. X10 gets the job done in time, as everyone in ORG has come to expect from her. With SVC operational and SaaS out of the way, X10 moves on to more important—or more pressing—matters and eventually leaves ORG. TEAM takes over ownership of SVC. For all intents and purposes, development is done; they only need to keep the lights on. Requirements change, assumptions are proven wrong, and unknown unknowns are uncovered. A dutiful product owner realizes that the business now demands changes to SVC, so he writes down some tickets for TEAM to work on. TEAM fails miserably to deliver. They don’t seem to know what they are doing, taking forever to apply the smallest changes, always sidetracked by bugs and production outages. TEAM is, uh, restructured into TEAM++, of higher average seniority, and continues working on SVC. This entire process takes less than a year.
## 2. Commentary
What is going on with SVC? Did X10 do a bad job? By all accounts, she did not. The project was finished on time, according to specifications; ORG will save a lot of money next year. X10 followed best practices, too: good test coverage, no over-engineering. Surely she cut some corners; she didn’t think twice about the design of the system. As in any project, there’s a lot of room for improvement, but nothing that a team of competent engineers couldn’t handle. But, if X10 got the hard part of the job done, how come not one but many teams of competent engineers fail to apply a few small changes to the system?
What fascinates us about this scenario is how a seemingly functional 6-month-old project automatically turns into a haunted forest just by changing hands. Regardless of its age, SVC is textbook legacy software because, more often than not, a question posed about the system to any team member results in the same answer: I don’t know.
The problem is that TEAM members don’t have enough elements to build a satisfactory mental model of SVC. They need to go by a mix of the client’s interpretation of what the system should be and what they can tell from the code that the system actually is. These views can be disconnected and contradictory. The code may tell the what and the how, but it doesn’t tell the why. Only X10 could say what was a functional requirement, what a technical necessity, what a whim, and what an accident. The team has to resort to reverse engineering, extrapolating, and guessing.
## 3. The Theory
Underlying the decision to move X10 out of the project once the system was operational is the common misconception that software development consists of producing code; that, once working code exists, programmers should act as interchangeable operators of varying qualities.
In his Software Aging paper, David Parnas warns against putting software in the hands of developers who haven’t contributed to (and thus don’t understand) its design:
> Although it is essential to upgrade software to prevent aging, changing software can cause a different form of aging. The designer of a piece of software usually had a simple concept in mind when writing the program. If the program is large, understanding that concept allows one to find those sections of the program that must be altered when an update or correction is needed. Understanding that concept also implies understanding the interfaces used within the system and between the system and its environment.
Changes made by people who do not understand the original design concept almost always cause the structure of the program to degrade. Under those circumstances, changes will be inconsistent with the original concept; in fact, they will invalidate the original concept. Sometimes the damage is small, but often it is quite severe. After those changes, one must know both the original design rules and the newly introduced exceptions to the rules to understand the product. After many such changes, the original designers no longer understand the product. Those who made the changes never did. In other words, nobody understands the modified product. Software that has been repeatedly modified (maintained) in this way becomes very expensive to update. Changes take longer and are more likely to introduce new “bugs.” Change-induced aging is often exacerbated by the fact that the maintainers feel that they do not have time to update the documentation. The documentation becomes increasingly inaccurate, thereby making future changes even more difficult.
TEAM was bound to make what Parnas calls “ignorant surgery,” with the system degrading over time. But that doesn’t quite explain why they were immediately helpless as they took over the project. We find a better characterization in Peter Naur’s *Programming as Theory Building*:
> At least with certain kinds of large programs, the continued adaptation, modification, and correction of errors in them is essentially dependent on a certain kind of knowledge possessed by a group of programmers who are closely and continuously connected with them.
Programming should be regarded as an activity by which the programmers form or achieve a certain kind of insight, a theory, of the matters at hand. This suggestion contrasts with what appears to be a more common notion, that programming should be regarded as the production of a program and certain other texts.
In this view, the mental model that allows the designer to map a subset of the world (the domain) to and from the system, and not the system itself, is the primary product of the software design activity:
> The programmer having the theory of the program can explain how the solution relates to the affairs of the world that it helps to handle. Thus, the programmer must be able to explain, for each part of the program text and for each of its overall structural characteristics, what aspect or activity of the world is matched by it. Conversely, for any aspect or activity of the world, the programmer is able to state its manner of mapping into the program text. The programmer having the theory of the program can explain why each part of the program is what it is, in other words, is able to support the actual program text with a justification of some sort. The programmer having the theory of the program is able to respond constructively to any demand for a modification of the program so as to support the affairs of the world in a new manner. Designing how a modification is best incorporated into an established program depends on the perception of the similarity of the new demand with the operational facilities already built into the program. The kind of similarity that has to be perceived is one between aspects of the world.
Naur defines software design as an intellectual activity, consisting of building and having a theory, where theory is understood as:
> The knowledge a person must have in order not only to do certain things intelligently but also to explain them, to answer queries about them, to argue about them, and so forth.
Notice the similarity to Zach Tellman’s thesis in his ongoing newsletter:
> Software development can be reduced to a single, iterative action. Almost everything we do in the course of a day—the pull requests, the meetings, the whiteboard diagrams, the hallway conversations—is an explanation. Our job is to explain, over and over, the meaning of our software.
We must tell a story about what our software is and what it’s expected to become. When understanding software, we tell that story to ourselves. When changing software, we tell that story to others. Software that is complex takes a long time to explain.
A more conventional way to define the software design activity is in terms of minimizing complexity. If we acknowledge that reducing ambiguity, obscurity, unknown unknowns, and cognitive load—all forms of removing complexity—also enables better understanding and easier explanations, then we should conclude that both models are compatible, if not equivalent.
## 4. Postscript
The theory-building view explains why TEAM couldn’t take ownership of SVC. When X10 left the company, taking the development context—the mental model—with her, the system deteriorated. In Naur’s terms, while still operational, the system was dead:
> The building of the program is the same as the building of the theory of it by the team of programmers. During the program life, a programmer team possessing its theory remains in active control of the program, and in particular retains control over all modifications. The death of a program happens when the programmer team possessing its theory is dissolved. A dead program may continue to be used for execution in a computer and to produce useful results. The actual state of death becomes visible when demands for modifications of the program cannot be intelligently answered. Revival of a program is the rebuilding of its theory by a new programmer team.
TEAM’s duty, then, is to revive the system by building a new theory of it. But reconstructing the model while keeping the system operational is a slow and difficult process—a hard sell for an organization convinced that development has just finished. Naur goes as far as saying that program revival, from code and documentation alone, is impossible. The program should preferably be discarded, and the new team should be given the opportunity to resolve the problem from scratch.
With three extra decades of hindsight, we tend to disagree. Revival is very hard, yes, but we’ve seen it happen. It may require that the new team ultimately rewrites every line of the original, one at a time. And we’ve seen fresh starts fail more consistently than that.
Knowing that revival is a plausible future need has powerful consequences for our work. To approach it correctly, we should keep in mind the people that one day will have to take the project out of its coma: in the style of the code and the structure of the system, but also in its paratexts—comments, docstrings, READMEs, Pull Requests, commit messages, Jira tickets, and Confluence pages.
Granted, our story was an all-too-perfect illustration of Naur’s thesis. We can’t prove it, but we suspect that we could benefit from accepting his theory as a law: the ultimate goal of software design should be (organizational) knowledge building.
So the next time you choose a name, or structure a project, or ponder whether to write or omit a certain comment, rather than thinking in terms of the burden on future maintainers, think: how much will this decision affect—how much will it help or hinder—their building of a mental model of the system, of the business, of the world.
### On Test-Driven Development (TDD) as Knowledge Building
Mike Morton highlighted Test-Driven Development (TDD) as a critical method for cementing knowledge in software. Many view TDD as simply "writing tests before code," but for us, it's fundamentally about defining and solidifying the knowledge created during the coding process.
**How TDD works as knowledge persistence:**
* **Guaranteed Behavior:** If a piece of code is expected to do Y when X condition occurs, TDD helps guarantee this behavior. Should someone inadvertently remove condition X, tests will fail, preventing the project from being built and deployed. This ensures that the created knowledge and implemented functional processes persist, even with future team members who might be less knowledgeable or prone to mistakes.
* **Decoupling:** TDD inherently decouples the *what* something is supposed to do (defined by the test) from the *how* it actually does it (the application code). This aligns with core software design principles where each function or file subtree should have a single responsibility, helping to define and constrain behavior.
* **Facilitating Modification:** TDD significantly eases the modification of existing code. As Mike explains, it's akin to a mathematician instantly validating a new hypothesis against all existing theorems, allowing for much faster iteration on ideas.
* **Time Persistence:** In large organizations, where hundreds or thousands of individuals may work on a single codebase, a mechanism to cement knowledge is crucial. Without it, the accumulated knowledge can devolve, leading to barely usable code over time. Tests act as this mechanism, preventing the degradation of built knowledge.
* **Efficiency:** As Eduarda Ferreira noted, TDD allows small teams to get work done efficiently rather than constantly fixing bugs. Unexpected bugs become new "test cases" for which tests can be written to prevent recurrence. This confidence in the codebase enables frequent deployments (e.g., several times a day at Socra) due to extensive automated testing.
**The Parallel between Mathematics and TDD (Romain Peter's Insight):**
Romain Peter drew a compelling parallel between TDD and the mathematical processes of analysis and synthesis:
* **Mathematics: Analysis-Synthesis**
* **Analysis:** Breaking down a problem into known components (e.g., "What would need to be true for this theorem to work?").
* **Synthesis:** Building back up to the solution (e.g., "Now let's build the proof step by step").
* **TDD: Test-What vs Code-How**
* **Tests (Analysis):** Breaking down *what* needs to happen (e.g., "The login system should validate email format").
* **Code (Synthesis):** Building the solution that satisfies these requirements (e.g., Implementing the actual email validation logic).
The common pattern here is the separation of problem understanding from problem-solving, involving decomposition followed by reconstruction, providing clear paths to verify correctness, and managing complexity through separation of concerns. This approach ensures that we are not just writing code, but actively building and conserving knowledge.
This deeper understanding of software design as knowledge building underscores why we prioritize robust practices like TDD and continuous knowledge transfer.By Romain Peter