Book Notes: A Philosophy of Software Design
John Ousterhout is in good company when he claims that the fundamental problem in building software is managing complexity. For example, Ben Moseley et al.’s seminal paper on programming argues that complexity is the biggest bane of large-scale software systems; it is what grinds large systems to a halt. Enter John’s book, A Philosophy of Software Design, which he says has two purposes:
- Understand the nature of complexity and how it arises.
- Provide the reader with guidelines to prevent the system from becoming unnecessarily complex.
In this blog post I jot down interesting ideas that I took away from the book; please note that these are not exhaustive nor are they a substitute for reading the book.
John defines software complexity as follows: Anything related to the structure of a software system that makes it difficult to understand and modify the system. This could be a class/function that was written in an extremely convoluted manner, or on a macro scale, the system interacts with a couple dozen services and it’s ambiguous what any service actually does. He gives a rough mathematical definition of complexity:
C = SUM_OF(cp * tp)
The equation above can be read as: the total complexity of a system (C) is the sum, over all of its parts, of how complex each part is (cp) multiplied by the amount of time a developer has to spend dealing with that part (tp).
It follows that if a system has a few parts that are very complex (high cp) but rarely change (low tp), those parts may not add much to the overall complexity of the system.
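To make the weighting concrete, here is a minimal sketch of the formula. The part names and numbers are made up for illustration, and treating tp as a fraction of developer time is my assumption rather than something these notes pin down.

```typescript
// Hypothetical description of one part of a system.
interface Part {
  name: string;
  complexity: number;   // cp: how complex the part is (arbitrary units)
  timeFraction: number; // tp: share of developer time spent on the part
}

// One term of the sum: cp * tp.
function contribution(part: Part): number {
  return part.complexity * part.timeFraction;
}

// C = SUM_OF(cp * tp) over all parts.
function totalComplexity(parts: Part[]): number {
  return parts.reduce((sum, part) => sum + contribution(part), 0);
}

// Made-up numbers: a gnarly module nobody touches vs. a simpler one in constant flux.
const legacyParser: Part = { name: "legacyParser", complexity: 9, timeFraction: 0.05 };
const checkoutFlow: Part = { name: "checkoutFlow", complexity: 4, timeFraction: 0.6 };

// The simpler part contributes more because developers spend far more time in it.
const simplerPartDominates = contribution(checkoutFlow) > contribution(legacyParser); // true
```

With these numbers the rarely touched parser adds far less to C than the less complex but heavily worked-on checkout code.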
The symptoms of complexity are:
- Change Amplification: A seemingly simple change requires modifying the code in multiple places.
- Cognitive Load: How much a developer must know about the entire system in order to complete a task. A higher cognitive load means that the developer needs to know a lot about the system, which increases the likelihood of bugs because he may have missed something important.
- Unknown unknowns: According to John, this is the worst one; the developer never has a complete idea of what he needs to know about the system in order to complete a task, so he writes some code and prays that he hasn’t broken something.
At a high level, there are two main causes for the symptoms outlined above:
1. Dependencies: On a micro level, a function could have many dependencies because it takes a dozen different parameters; on a mezzo level, a small class could have two dozen public properties that obfuscate what the class actually does; and on a macro level, you can imagine the system interacting with a dozen different microservices. In each of these instances the dependencies lead to a higher cognitive load and probably to unknown unknowns.
2. Obscurity: This occurs when important information is hidden from the developer, for example when a variable is improperly named or a very convoluted piece of code has no (or even worse, misleading) documentation.
Tactical Programming: Writing code such that the task gets done as quickly as possible; this mindset involves very little (if any) thinking about the program and how the code being written fits into its overall architecture. If the code is working, it’s good enough.
Strategic Programming: Working code isn’t good enough; the most important thing is the long-term structure of the system, and the primary goal must be to produce a great design. Strategic programming requires an investment mindset. John suggests spending 10-20% of development time thinking about the code architecture and finding ways to improve the quality of the code. This amount of time is small enough not to slow development down much while being large enough to accrue benefits.
John claims that a common issue in software development is having too many classes (a condition he calls classitis), which leads to high cognitive load because the developer needs to toggle between various classes to make sense of a given piece of functionality. He argues that modules should be deep: a deep module is one that provides a lot of functionality behind a very simple interface.
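A small sketch of the shallow/deep contrast. Both functions are hypothetical examples of mine, not taken from the book, and a real deep module would usually hide far more than this.

```typescript
// Shallow: the signature is about as complicated as the body it hides,
// so callers gain almost nothing by going through it.
function toLower(text: string): string {
  return text.toLowerCase();
}

// Deeper: the same one-argument interface, but several decisions
// (casing, which characters survive, separator handling) stay hidden.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into one dash
    .replace(/^-+|-+$/g, "");    // trim dashes left at either end
}

const slug = slugify("A Philosophy of Software Design!"); // "a-philosophy-of-software-design"
```

If the slug rules change later, only the body of slugify changes; callers keep passing a title and getting a slug back.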
Information Hiding refers to ensuring that a module’s design decisions aren’t reflected in its interface. This is achieved by exposing only the information that the rest of the system needs. Information hiding is contrasted with information leakage, which occurs when a design decision is reflected in multiple places in the system. Information can leak every time you add a piece of information to a module’s interface or make a method public.
One very common source of information leakage is Temporal Decomposition, a phenomenon that occurs when the structure of a system reflects the order in which operations occur. For example, consider a system that reads a file, modifies it, and then writes it back. It’s intuitive to create three different classes to handle this process (1. FileReader, 2. FileModifier, 3. FileWriter). However, now both FileReader and FileWriter have knowledge of the file format, which results in information leakage.
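One way to avoid that leak is to let a single class own the format. The sketch below assumes a made-up format (one key=value pair per line) and a hypothetical ConfigFile class; it works on strings and leaves the actual disk I/O out so the example stays self-contained.

```typescript
// Knowledge of the "key=value per line" format lives only in this class.
class ConfigFile {
  private entries: { [key: string]: string } = {};

  // Reading: the only code that knows how to parse the format.
  static parse(text: string): ConfigFile {
    const file = new ConfigFile();
    for (const line of text.split("\n")) {
      const separator = line.indexOf("=");
      if (separator === -1) continue; // skip blank or malformed lines
      file.entries[line.slice(0, separator)] = line.slice(separator + 1);
    }
    return file;
  }

  // Modifying: callers work with keys and values, never with raw lines.
  set(key: string, value: string): void {
    this.entries[key] = value;
  }

  // Writing: the only code that knows how to emit the format.
  serialize(): string {
    return Object.keys(this.entries)
      .map((key) => key + "=" + this.entries[key])
      .join("\n");
  }
}

const config = ConfigFile.parse("host=localhost\nport=80");
config.set("port", "8080");
const updated = config.serialize(); // "host=localhost\nport=8080"
```

If the format later switches to, say, JSON, only parse and serialize change, and they sit next to each other in one class.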
When building modules, a good guideline is to make them somewhat general purpose: the functionality of the class should reflect your current needs, but its interface should be general enough to support other uses. This lets developers use the interface throughout the system without having to learn its intricacies, thereby reducing cognitive load.
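Here is a rough illustration of the idea using a hypothetical text buffer: the class exposes two general operations, and the one feature I need today (backspace) is written on top of them by the caller. All names here are invented for the example.

```typescript
// General-purpose interface: only insert and delete, no UI concepts.
class TextBuffer {
  private content = "";

  insert(position: number, text: string): void {
    this.content = this.content.slice(0, position) + text + this.content.slice(position);
  }

  // Removes the characters in the half-open range [start, end).
  delete(start: number, end: number): void {
    this.content = this.content.slice(0, start) + this.content.slice(end);
  }

  toString(): string {
    return this.content;
  }
}

// Special-purpose behavior lives in the (hypothetical) editor code, not in the buffer.
// Returns the new cursor position.
function backspace(buffer: TextBuffer, cursor: number): number {
  if (cursor === 0) return 0; // nothing to delete at the start of the text
  buffer.delete(cursor - 1, cursor);
  return cursor - 1;
}

const buffer = new TextBuffer();
buffer.insert(0, "helllo");
const cursor = backspace(buffer, 4); // removes the extra "l"; buffer now holds "hello"
```

A forward-delete key or a cut operation could reuse the same two methods without the buffer's interface growing.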
Software systems are composed in layers, and in a well-designed system the topmost layers use the functionality provided by the lower layers. In these systems every layer provides an abstraction that is different from the layers above and below it. In a poorly designed system there are lots of adjacent layers with shared abstractions, meaning that the developer has to learn about multiple classes in order to fully understand a piece of functionality, which increases cognitive load and change amplification. Two symptoms of adjacent layers with shared abstractions are:
1. Pass Through Methods: A pass-through method simply calls another method in another class. For example:
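interface TextClass { toUpperCase(): string; } // assumed shape of the text class
// Pass-through: forwards the call without adding any abstraction.
function CapitalizeText(text: TextClass): string {
  return text.toUpperCase();
}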
In this case both the text class and the CapitalizeText method provide the exact same abstraction; methods like these add very little functionality to the system, but having enough of them makes the system increasingly complex.
2. Pass Through Variables: These are variables that are passed from class A to class B just so that they can be used in class C; they can be a symptom of leaky abstractions. One should think about merging the classes if possible.
A common conundrum when designing classes is whether a given piece of functionality should be separated into a new class or added to an existing one. As discussed earlier, the issue with creating a class around a small piece of functionality is that you create shallow modules, leading to a high cognitive load. Some guidelines for making the call:
1. If two pieces of code share information, consider bringing them together in the same class.
2. If the interface will become simpler (which will occur if two classes each provide part of the solution), combine the two classes.
3. If you find that you’re duplicating code, consider merging the classes.
4. Separate general-purpose and special-purpose code; if a class contains functionality that is used by several different modules, it should only have that functionality. Special-purpose code (code that’s only suited for a unique scenario or a single class) should live in a different module.
5. If a developer is unable to understand one method without understanding another method, the methods are conjoined; consider merging the two methods.
The next four chapters talk about the benefits of writing comments. The main reason for writing comments is to capture information that was in the mind of the designer but can’t be expressed by the code. Writing comments has two main benefits:
- Decrease cognitive load by communicating the intent of a method/class (especially if it’s a deep module) so that the developer won’t have to read/analyze code that’s not useful for them.
- Reduce unknown unknowns by clearly articulating the structure of the system and pointing out hidden dependencies and obscure pieces of code.
The guiding principle for writing good comments is that comments should describe things that are not obvious from the code. A developer should be able to read the comments for a module and be able to understand the abstractions offered by the module. Some tips to write good comments are:
- Pick conventions: For example, every class will have comments at the very top outlining what the class does. This makes it more likely that you’ll actually write comments because you have a clear framework for how to write them.
- Do not repeat the code: Avoid writing comments that merely restate the next line of code; they increase the cognitive load on the developer and are a hindrance.
- Comments augment code by providing information at different levels: It is helpful to group them into two categories. Low-level comments augment the code by adding precision; for instance, you write low-level comments when you’re describing how the parameters are used by a method. In contrast, high-level comments augment the code by enhancing intuition for how the code works; when you provide an overview of how a module/method is executed, you are writing a high-level comment. When writing comments, try not to mix high-level and low-level comments, as the documentation may end up confusing the reader.
- Write comments before you write the code: Very often comments are the last part of the application development cycle; once the code is working and has been tested, developers will write a half-hearted ode to the code. After all, the code is already working, so why bother with the comments? Oh, and not to forget, The Code Is The Documentation. This approach results in documentation that is of poor quality at best. A much better approach is to think of writing comments as part of the application design process; writing comments should be a precursor to writing code.
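As an illustration of the tips above, here is a small class where the comments could have been written before any of the method bodies. The RecentReadings class is a made-up example of mine; the point is the split between the high-level class comment and the low-level comments on the field and parameter.

```typescript
/**
 * High-level: tracks the most recent sensor readings and reports their average.
 * Callers never see how readings are stored; once the window is full,
 * the oldest reading is dropped to make room for the newest.
 */
class RecentReadings {
  // Low-level: readings in arrival order, oldest first. Length never exceeds capacity.
  private readings: number[] = [];
  private capacity: number;

  /** @param capacity Maximum number of readings kept; assumed to be at least 1. */
  constructor(capacity: number) {
    this.capacity = capacity;
  }

  /** Records a reading, evicting the oldest one if the window is full. */
  add(value: number): void {
    this.readings.push(value);
    if (this.readings.length > this.capacity) {
      this.readings.shift();
    }
  }

  /** Returns the mean of the stored readings, or 0 when there are none. */
  average(): number {
    if (this.readings.length === 0) return 0;
    const total = this.readings.reduce((sum, reading) => sum + reading, 0);
    return total / this.readings.length;
  }
}

const window2 = new RecentReadings(2);
window2.add(1);
window2.add(2);
window2.add(3); // evicts the reading 1
const mean = window2.average(); // 2.5
```

None of the comments restate a line of code; each one says something a reader could not get from the signature alone, such as what happens when the window is full or what average returns for an empty window.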
For me this book had three important takeaways:
- Modules should be deep; do not be afraid of large functions and classes. I read Clean Code by Robert Martin early on in my career and it provided me with useful heuristics for designing software: classes should be small, and functions should be small(er) and should not have more than three parameters. I took this advice to heart, and although I still think it’s a good rule, there have been instances where I’m developing a complicated piece of functionality and I break it into five small functions like so:
function complexOperation() {
  const resA = funcA();
  const resB = funcB(resA);
  const resC = funcC(resA, resB);
  const resD = funcD(resB);
  funcE();
}
What ends up happening is that I have to toggle back and forth between functions to fully grasp the logic. Although the smaller functions were easier to reason about in isolation (because they were small and the function name served as an apt label), it was a little more work to understand them in the context of the broader function because the smaller functions were, in fact, related to one another. This is an example of conjoined methods (discussed above in the chapter 9 notes), and the advice here, to merge methods if you can’t fully understand what one does without understanding the other, is pretty solid.
- Thinking of complexity before it arises: I’d often be reactive when thinking about complexity; I could tell whether a module was complex, but the book constantly underscores how systems become complex: it’s often death by a thousand cuts. Since reading the book I’ve noticed that I think more about how a given piece of functionality adds to the system’s complexity as a whole.
- (No) Comments: The chapters on writing comments are as insightful as they are amusing and are worth reading if only for their entertainment value. Just like flossing, I know the act of writing comments is good for me but I’ve not been very consistent at it. The advice given by John here is very practical, especially that of writing comments before writing the code. I’ve noticed that writing comments has actually become interesting, and that I write better comments now that I no longer see them as drudge work but as a way of producing high-quality code.