
A Philosophy of Software Design

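A Philosophy of Software Design is a book by Stanford professor John Ousterhout about how to reduce software complexity. Complexity inevitably grows as software is iterated on; it raises the learning cost for anyone reading the code and increases the chance that later developers introduce bugs. Simple design slows the growth of complexity, so design has to be practiced throughout the whole life cycle of the software. Design skill is not innate; it can be learned.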

The Nature of Complexity

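It is important for developers to be able to recognize complexity. So what is software complexity? It is anything about the structure of a software system that makes it hard to read, understand, or modify. The author of a piece of code usually cannot judge its complexity accurately, while other readers see it more clearly, so the reader’s judgment is the one that counts.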

Symptoms of Complexity

Cause of Complexity

dependencies -> change amplification and cognitive load

obscurity -> unknown unknowns and cognitive load

Complexity is incremental

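Complexity is incremental and cumulative. Each change adds only a little complexity, so developers tend to shrug it off, but keeping it in check takes a “zero tolerance” attitude toward every addition.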

Working code isn’t enough – Strategic vs. Tactical Programming

Strategic approach -> better design -> slower at first -> requires an investment mindset -> proactive investment in good documentation -> the real hero -> take a little extra time to fix a design problem when you discover it

Tactical approach -> ship as quickly as possible (new features or bug fixes) -> complexity accumulates -> refactoring is needed

How much to invest?

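A big, one-time, up-front global design is not effective -> the ideal design emerges bit by bit as the system is developed -> invest 10%-20% of development time in thinking about better design -> the investment pays off later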

Startups and investment

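Startups lean toward short-term tactical programming and plan to refactor later. But once the code has turned into spaghetti, refactoring is very hard.

The quality of the engineers matters as well: a few excellent engineers who can design will be more productive than a large number of mediocre ones, and a clean system design also makes it easier to attract excellent engineers to the team later on.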

Modules should be deep

What is modular design

One of the most important techniques for managing software complexity is to design systems so that developers only need to face a small fraction of the overall complexity at any given time. Because the modules are relatively independent, several people can develop them in parallel.

Modular Design

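In modular design, a software system is decomposed into a collection of modules that are relatively independent. Modules can take many forms, such as classes, subsystems, or services. However, modules call one another and need to know something about each other, so absolute independence does not exist; there will always be some dependencies.

Where to start when managing dependencies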

The best modules are those whose interfaces are much simpler than their implementations.

What is an interface

Abstractions

An abstraction is a simplified view of an entity, which omits unimportant details.

The key to designing abstractions is to understand what is important, and to look for designs that minimize the amount of information that is important.

Deep modules

The best modules are those that provide powerful functionality yet have simple interfaces. I use the term deep to describe such modules.

Unfortunately, the value of deep classes is not widely appreciated today. The conventional wisdom in programming is that classes should be small, not deep.

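A complaint about the design of Java I/O: buffering is not the default but has to be requested explicitly every time, which makes it easy for programmers to forget. I/O without buffering is slow, and there is probably hardly any use case that calls for it.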

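// Open the file as a raw byte stream.
FileInputStream fileStream =
        new FileInputStream(fileName);
// Buffering is not the default; it has to be added explicitly.
BufferedInputStream bufferedStream =
        new BufferedInputStream(fileStream);
// Deserialize objects from the buffered stream.
ObjectInputStream objectStream =
        new ObjectInputStream(bufferedStream);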
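One way to read the complaint above is that buffering belongs inside the module rather than in every caller. The following is only a minimal sketch, not something from the book or the JDK: `ObjectFiles` and its two methods are hypothetical names, and the class simply wraps the standard `java.io` streams shown above so that callers get buffering without asking for it.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper: hides the stream layering behind two simple methods.
final class ObjectFiles {
    private ObjectFiles() {}

    // Writes one serializable object to the file, buffered by default.
    static void write(String fileName, Object value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(fileName)))) {
            out.writeObject(value);
        }
    }

    // Reads one object back from the file, buffered by default.
    static Object read(String fileName) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(fileName)))) {
            return in.readObject();
        }
    }
}
```

The interface is two methods, while the layering of three stream classes stays in the implementation, which is the sense in which such a helper would be "deeper" than the raw streams.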

How to make modules deep

Information Hiding (and leakage)

Information Hiding

The knowledge (for example, data structures and algorithms) is embedded in the module’s implementation but doesn’t appear in its interface.

Benefits of information hiding

Hiding variables and methods in a class by declaring them private isn’t the same thing as information hiding

Information Leakage

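Conversely, a design decision that is reflected in multiple modules is information leakage. It creates a dependency between them: any change to that decision requires changes to all of the modules involved.

For example, two modules both operate on the same file, one reading it and one writing it. Even though the interfaces do not state the dependency, it exists, and this kind of dependency is hard to notice.

What should you do when several modules share information?

A common cause of information leakage

One common cause of information leakage is a design style called temporal decomposition, in which the structure of the system follows the order in which operations occur over time. For example, handling a file is designed as three steps and, without much thought, split into three classes: one to read the file, one to write it, and one to save it.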

When designing modules, focus on the knowledge that’s needed to perform each task, not the order in which tasks occur.
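As a rough illustration of organizing by knowledge rather than by time order, the sketch below keeps everything that knows the file format in a single class, so the reading and writing steps do not each carry a copy of it. The `key=value` line format and the name `KeyValueFormat` are assumptions made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class: the only place that knows the "key=value" line format.
final class KeyValueFormat {
    private KeyValueFormat() {}

    // Turns file text into a map; lines without a key before '=' are skipped.
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                result.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return result;
    }

    // Turns a map back into file text using the same format.
    static String format(Map<String, String> values) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : values.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

If the format changes, say to a different separator, only this class has to change; a reader class and a writer class split by time order would both need editing.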

Overexposure

If the API for a commonly used feature forces users to learn about other features that are rarely used, this increases the cognitive load on users who don’t need the rarely used features.

For example, buffering in Java I/O.

Make classes somewhat general-purpose

The phrase “somewhat general-purpose” means that the module’s functionality should reflect your current needs, but its interface should not. Instead, the interface should be general enough to support multiple uses.
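To make the distinction concrete, here is a small sketch in the spirit of the text editor example. The names `TextBuffer` and `EditorKeys` and the use of plain integer offsets are assumptions for illustration only. The text class offers general `insert` and `delete` operations, and the user-interface code builds a special-purpose operation such as backspace on top of them.

```java
// Hypothetical text class with a general-purpose interface.
final class TextBuffer {
    private final StringBuilder chars = new StringBuilder();

    // Inserts text so that its first character lands at the given offset.
    void insert(int offset, String text) {
        chars.insert(offset, text);
    }

    // Deletes the characters in the range [start, end).
    void delete(int start, int end) {
        chars.delete(start, end);
    }

    int length() {
        return chars.length();
    }

    @Override
    public String toString() {
        return chars.toString();
    }
}

// UI code expresses its special-purpose operations with the general ones.
final class EditorKeys {
    private EditorKeys() {}

    // Backspace: removes the character just before the cursor, if any,
    // and returns the new cursor offset.
    static int backspace(TextBuffer text, int cursor) {
        if (cursor == 0) {
            return cursor;
        }
        text.delete(cursor - 1, cursor);
        return cursor - 1;
    }
}
```

The text class knows nothing about keys or cursors, so a new UI feature (a delete key, cut, find-and-replace) would not require a new method on it.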

Example: Building a GUI text editor

Generality leads to better information hiding

How to design general-purpose module

Different layer, different abstraction

Pass-through methods:

A pass-through method is one that does nothing except pass its arguments to another method, usually with the same API as the pass-through method. This typically indicates that there is not a clean division of responsibility between the classes.

Having methods with the same signature is not always bad

One example where it’s useful for a method to call another method with the same signature is a dispatcher. The dispatcher provides useful functionality: it chooses which of several other methods should carry out each task.

Examples:

Decorators 装饰器

The decorator design pattern (also known as a “wrapper”) is one that encourages API duplication across layers. A decorator object takes an existing object and extends its functionality.

Examples:

The motivation for decorators is to separate special-purpose extensions of a class from a more generic core.

Before creating a decorator class, consider alternatives such as the following:

Interface versus implementation

The interface of a class should normally be different from its implementation

Pass-through variables

Pass-through variables add complexity because they force all of the intermediate methods to be aware of their existence, even though the methods have no use for the variables.

Eliminating pass-through variables can be challenging.

Contexts are far from an ideal solution:

Pull Complexity Downwards (into lower-level code)

It is more important for a module to have a simple interface than a simple implementation. Most modules have more users than developers, so it is better for the developers to suffer than the users.
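A common way of pushing complexity upward is to export a configuration parameter instead of working out a good value inside the module. The sketch below shows the opposite choice under made-up assumptions: `RetryPolicy`, the starting guess of 100 ms, the smoothing weights, and the factor of 4 are all hypothetical. The module derives its retry interval from the response times it observes, so its users never have to pick one.

```java
// Hypothetical: the module computes its own retry interval instead of
// asking every user to configure one.
final class RetryPolicy {
    // Smoothed estimate of a typical response time; 100 ms is only a
    // starting guess chosen for this sketch.
    private double averageMillis = 100.0;

    // Called by the module itself after each successful request.
    void recordResponseTime(long millis) {
        averageMillis = 0.8 * averageMillis + 0.2 * millis;
    }

    // Retries after a small multiple of the typical response time.
    long retryIntervalMillis() {
        return Math.round(4 * averageMillis);
    }
}
```

The implementation is slightly more involved than a bare `setRetryInterval` method would be, but that cost is paid once by the module’s developer rather than by every user.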

Developers are often happy to push complexity onto their users, but that does nothing to reduce the complexity of the software:

Better Together or Better Apart?

When deciding whether to combine or separate, the goal is to reduce the complexity of the system as a whole and improve its modularity.

The disadvantages of splitting apart:

Indications that two pieces of code are related (better together):

Separate general-purpose and special-purpose code

In general, the lower layers of a system tend to be more general-purpose and the upper layers more special-purpose.

Example: editor undo mechanism

Some of the student projects implemented the entire undo mechanism as part of the text class. The text class maintained a list of all the undoable changes.

These problems can be solved by extracting the general-purpose core of the undo/redo mechanism and placing it in a separate class

public class History {
    public interface Action {
        public void redo();
        public void undo();
    }
    History() {...}
    void addAction(Action action) {...}
    void addFence() {...}
    void undo() {...}
    void redo() {...}
}

The History class knows nothing about the information stored in the actions or how they implement their undo and redo methods.

History.Actions are special-purpose objects

There are a number of ways to group actions; the History class uses fences
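The bodies of the `History` methods are elided above, so the following is only one possible way to fill them in, written as a separate class to make clear that it is a guess. `FencedHistory`, the use of `null` as a fence marker, and the exact undo and redo semantics (move to the nearest fence in that direction) are assumptions of this sketch, not the book’s implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a general-purpose undo history with fences.
final class FencedHistory {
    interface Action {
        void redo();
        void undo();
    }

    // A null entry marks a fence between groups of actions.
    private final List<Action> entries = new ArrayList<>();
    // Entries before this index are applied; entries from it on are undone.
    private int next = 0;

    void addAction(Action action) {
        append(action);
    }

    void addFence() {
        append(null);
    }

    // Adding anything new discards the entries that were undone.
    private void append(Action entry) {
        entries.subList(next, entries.size()).clear();
        entries.add(entry);
        next = entries.size();
    }

    // Undoes actions backwards until the previous fence (or the start).
    void undo() {
        while (next > 0 && entries.get(next - 1) == null) {
            next--; // skip fences directly behind the current position
        }
        while (next > 0 && entries.get(next - 1) != null) {
            next--;
            entries.get(next).undo();
        }
    }

    // Redoes actions forwards until the next fence (or the end).
    void redo() {
        while (next < entries.size() && entries.get(next) == null) {
            next++; // skip fences directly ahead of the current position
        }
        while (next < entries.size() && entries.get(next) != null) {
            entries.get(next).redo();
            next++;
        }
    }
}
```

As in the original design, the history class never inspects an action; it only decides which actions to call and in what order, which is what keeps it general-purpose.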

Splitting and joining methods

You shouldn’t break up a method unless it makes the overall system simpler. Methods containing hundreds of lines of code are fine if they have a simple signature and are easy to read. Each method should do one thing and do it completely.

Conjoined Methods: If you can’t understand the implementation of one method without also understanding the implementation of another, that’s a red flag.

Define Errors Out Of Existence

Exception handling is one of the worst sources of complexity in software systems. The key overall lesson from this chapter is to reduce the number of places where exceptions must be handled; in many cases the semantics of operations can be modified so that the normal behavior handles all situations and there is no exceptional condition to report. In other words, by changing the specification of a method or interface, the exceptional case can be folded into the normal code path.

Why exceptions add complexity

How to deal with the exceptions

The exception handling code must restore consistency, such as by unwinding any changes made before the exception occurred. Exception handling code creates opportunities for more exceptions. To prevent an unending cascade of exceptions, the developer must eventually find a way to handle exceptions without introducing more exceptions.

Bugs in the exception handling code itself are the most damaging: When exception handling code fails, it’s difficult to debug the problem, since it occurs so infrequently.

Too many exceptions

Tcl contains an unset command that can be used to remove a variable. I defined unset so that it throws an error if the variable doesn’t exist. However, one of the most common uses of unset is to clean up temporary state created by some previous operation.

Classes with lots of exceptions have complex interfaces, and they are shallower than classes with fewer exceptions.

The best way to reduce the complexity damage caused by exception handling is to reduce the number of places where exceptions have to be handled.

Define errors out of existence (redefine the semantics so the error disappears)

I should have changed the definition of unset slightly: rather than deleting a variable, unset should ensure that a variable no longer exists. There is no longer an error case to report.
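The same idea can be sketched in Java. `VariableTable` is a hypothetical class invented for this example; the only library fact it relies on is that `Map.remove` does nothing when the key is absent. Because `unset` is defined as "make sure the variable no longer exists", a missing variable is not an error and callers need no try/catch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variable table whose unset has no error case.
final class VariableTable {
    private final Map<String, String> variables = new HashMap<>();

    void set(String name, String value) {
        variables.put(name, value);
    }

    // Returns the value, or null if the variable does not exist.
    String get(String name) {
        return variables.get(name);
    }

    // Ensures that the variable no longer exists. If it never existed,
    // the postcondition already holds and there is nothing to report.
    void unset(String name) {
        variables.remove(name);
    }
}
```

Cleanup code can then call `unset` on every temporary name without first checking which ones were actually created.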

Example: file deletion in Windows

The Windows operating system does not permit a file to be deleted if it is open in a process

In Unix, if a file is open when it is deleted, Unix does not delete the file immediately. Instead, it marks the file for deletion, then the delete operation returns successfully. The file name has been removed from its directory, so no other processes can open the old file and a new file with the same name can be created, but the existing file data persists. Processes that already have the file open can continue to read it and write it normally. Once the file has been closed by all of the accessing processes, its data is freed.

Example: Java substring method

If either index is outside the range of the string, then substring throws IndexOutOfBoundsException.

The Java substring method would be easier to use if it performed this adjustment automatically, so that it implemented the following API: “returns the characters of the string (if any) with index greater than or equal to beginIndex and less than endIndex.”
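A minimal sketch of that more forgiving API, built on top of the real `String.substring`. The class name `Strings` and the method name `lenientSubstring` are made up for this example; the clamping rules are one reasonable reading of the quoted specification.

```java
// Hypothetical helper implementing the "characters (if any) in range" API.
final class Strings {
    private Strings() {}

    // Returns the characters of s with index >= beginIndex and < endIndex.
    // Indexes outside the string, or a begin past the end, simply yield
    // fewer characters (possibly none) instead of an exception.
    static String lenientSubstring(String s, int beginIndex, int endIndex) {
        int begin = Math.max(0, Math.min(beginIndex, s.length()));
        int end = Math.max(begin, Math.min(endIndex, s.length()));
        return s.substring(begin, end);
    }
}
```

With this definition a caller that wants "up to the first ten characters" can pass 0 and 10 without first comparing 10 against the string’s length.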

Mask exceptions (handle the exception in low-level code so that it stays hidden)

The second technique for reducing the number of places where exceptions must be handled is exception masking. With this approach, an exceptional condition is detected and handled at a low level in the system, so that higher levels of software need not be aware of the condition.

TCP masks packet loss by resending lost packets within its implementation, so all data eventually gets through and clients are unaware of the dropped packets.
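Real TCP retransmission is far more involved, but the shape of masking can be shown at application level with a small retry wrapper. `Retrying` is a hypothetical helper and the fixed attempt count is an assumption of this sketch. Transient failures are absorbed inside the helper, and the caller only sees an exception if every attempt fails.

```java
import java.util.concurrent.Callable;

// Hypothetical low-level helper that masks transient failures by retrying.
final class Retrying {
    private Retrying() {}

    // Runs the operation up to maxAttempts times and returns the first
    // successful result. Only the last failure reaches the caller.
    static <T> T call(Callable<T> operation, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e; // masked unless every attempt fails
            }
        }
        throw last;
    }
}
```

A production version would normally retry only on specific exception types and wait between attempts; both are left out here to keep the sketch short.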

Exception aggregation (let exceptions propagate upward and handle them in one place; the opposite of masking, suited to different situations)

The third technique for reducing complexity related to exceptions is exception aggregation. The idea behind exception aggregation is to handle many exceptions with a single piece of code.

Instead of catching the exceptions in the individual service methods, let them propagate up to the top-level dispatch method for the Web server.

This is the opposite of exception masking: masking usually works best if an exception is handled in a low-level method. For masking, the low-level method is typically a library method used by many other methods, so allowing the exception to propagate would increase the number of places where it is handled.
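The Web server case can be sketched as follows. All names here (`WebDispatcher`, `requireParam`, the string responses) are hypothetical and the server is reduced to a map of handlers. Service methods never catch a missing-parameter error; they let it propagate, and the top-level `dispatch` method is the single place that turns it into an error response.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical top-level dispatcher that aggregates exception handling.
final class WebDispatcher {
    private final Map<String, Function<Map<String, String>, String>> services =
            new HashMap<>();

    void register(String url, Function<Map<String, String>, String> service) {
        services.put(url, service);
    }

    // Helper for service methods: throws instead of returning an error code.
    static String requireParam(Map<String, String> params, String name) {
        String value = params.get(name);
        if (value == null) {
            throw new NoSuchElementException("missing parameter: " + name);
        }
        return value;
    }

    // The only place where a failed request becomes an error response.
    String dispatch(String url, Map<String, String> params) {
        try {
            Function<Map<String, String>, String> service = services.get(url);
            if (service == null) {
                throw new NoSuchElementException("unknown URL: " + url);
            }
            return service.apply(params);
        } catch (RuntimeException e) {
            return "error: " + e.getMessage();
        }
    }
}
```

A service method such as `p -> "hello " + WebDispatcher.requireParam(p, "name")` then contains no error handling at all.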

Drawback: promoting a corrupted object into a server crash increases the cost of recovery considerably. Error promotion may not make sense for errors that happen frequently.

How to judge when to use exception aggregation: one way of thinking about it is that it replaces several special-purpose mechanisms, each tailored for a particular situation, with a single general-purpose mechanism that can handle multiple situations.

Just crash

The fourth technique for reducing complexity related to exception handling is to crash the application. Some errors are difficult or impossible to handle and don’t occur very often. The simplest thing to do in response to these errors is to print diagnostic information and then abort the application.

Example:

Whether or not it is acceptable to crash on a particular error depends on the application. For a replicated storage system, it isn’t appropriate to abort on an I/O error. Instead, the system must use replicated data to recover any information that was lost.

Design special cases out of existence (fewer special cases, fewer if statements)

Special cases can result in code that is riddled with if statements, which make the code hard to understand and lead to bugs. Thus, special cases should be eliminated wherever possible.

The best way to do this is by designing the normal case in a way that automatically handles the special cases without any extra code.
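One illustration, offered as an assumption rather than as the example from the notes, is a text selection. The hypothetical `Selection` class below has no "no selection" flag: the absence of a selection is simply an empty range with `start == end`, so the normal code for copying and deleting handles that case without any `if`.

```java
// Hypothetical selection where "nothing selected" is just an empty range.
final class Selection {
    // The selection covers offsets [start, end); start == end means empty.
    final int start;
    final int end;

    Selection(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Returns the selected characters; an empty selection yields "".
    String copyFrom(String text) {
        return text.substring(start, end);
    }

    // Returns the text with the selected characters removed; an empty
    // selection removes nothing, with no special-case branch.
    String deleteFrom(String text) {
        return text.substring(0, start) + text.substring(end);
    }
}
```

The alternative, a nullable selection or an `exists` flag, would force a check in every method that touches the selection.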

Example:

Design it Twice

Designing software is hard, so it’s unlikely that your first thoughts about how to structure a module or system will produce the best design.

Example: GUI text editor line-oriented -> character-oriented -> string-oriented -> range-oriented

Try to pick approaches that are radically different from each other; you’ll learn more that way. Even if you are certain that there is only one reasonable approach, consider a second design anyway, no matter how bad you think it will be.

Make a list of the pros and cons of each one.

Make a decision.

Write Comments

Good code is self-documenting

However, there is still a significant amount of design information that can’t be represented in code. The informal aspects of an interface, such as a high-level description of what each method does or the meaning of its result, can only be described in comments. If users must read the code of a method in order to use it, then there is no abstraction.

Benefits of well-written comments

The overall idea behind comments is to capture information that was in the mind of the designer but couldn’t be represented in the code.

There is a risk of bugs if the new developer misunderstands the original designer’s intentions. And if it has been more than a few weeks since you last worked in a piece of code, you will have forgotten many of the details of the original design.

Comments Should Describe Things that Aren’t Obvious from the Code

Developers should be able to understand the abstraction provided by a module without reading any code other than its externally visible declarations. (obvious)

Pick conventions

Javadoc for Java, Doxygen for C++, or godoc for Go

consistency

Don’t repeat the code

If the information in a comment is already obvious from the code next to the comment, then the comment isn’t helpful.

A first step towards writing good comments is to use different words in the comment from those in the name of the entity being described.

/*
 * The amount of blank space to leave on the left and
 * right sides of each line of text, in pixels.
 */
private static final int textHorizontalPadding = 4;

Lower-level comments add precision

Some comments provide information at a lower, more detailed, level than the code; these comments add precision by clarifying the exact meaning of the code.

Other comments provide information at a higher, more abstract, level than the code; these comments offer intuition, such as the reasoning behind the code, or a simpler and more abstract way of thinking about the code.

Precise comments can fill in missing details such as:

When documenting a variable, think nouns, not verbs. In other words, focus on what the variable represents, not how it is manipulated.

Higher-level comments enhance intuition

They omit details and help the reader to understand the overall intent and structure of the code.

Higher-level comments are more difficult to write than lower-level comments

Writing them requires being able to ignore the low-level details and think about the system only in terms of its most fundamental characteristics.

Interface documentation

The first step in documenting abstractions is to separate interface comments from implementation comments.

Interface comments provide information that someone needs to know in order to use a class or method; they define the abstraction.

Implementation comments describe how a class or method works internally in order to implement the abstraction.

The interface comment:

Implementation comments: what and why, not how

The main goal of implementation comments is to help readers understand what the code is doing (not how it does it).

In addition to describing what the code is doing, implementation comments are also useful to explain why.
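The split between the two kinds of comment can be seen in a small, made-up example. `Ranges.lowerBound` is a hypothetical method: the Javadoc block is the interface comment and says everything a caller needs, while the comments inside the body are implementation comments that say what the loop maintains and why the midpoint is computed in an unusual way.

```java
// Hypothetical utility used to contrast interface and implementation comments.
final class Ranges {
    private Ranges() {}

    /**
     * Returns the index of the first element of sortedValues that is
     * greater than or equal to key, or sortedValues.length if there is
     * no such element. sortedValues must be in ascending order and is
     * not modified.
     */
    static int lowerBound(int[] sortedValues, int key) {
        int low = 0;
        int high = sortedValues.length;
        // Invariant: every element before low is < key, and every
        // element from high onward is >= key.
        while (low < high) {
            // Unsigned shift rather than (low + high) / 2, so the
            // midpoint stays correct even if low + high overflows int.
            int mid = (low + high) >>> 1;
            if (sortedValues[mid] < key) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }
}
```

Nothing in the interface comment mentions binary search, so the implementation could change without the comment becoming wrong.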

Documents for Cross-module design decisions

Write The Comments First

(Use Comments As Part Of The Design Process) The best time to write comments is at the beginning of the process, as you write the code. Writing the comments first makes documentation part of the design process. Not only does this produce better documentation, but it also produces better designs and it makes the process of writing documentation more enjoyable.

Benefits of writing the comments at the beginning:

Choosing Names

Example: bad names cause bugs

The file system code used the variable name block for two different purposes. In some situations, block referred to a physical block number on disk; in other situations, block referred to a logical block number within a file

Unfortunately, at one point in the code there was a block variable containing a logical block number, but it was accidentally used in a context where a physical block number was needed; as a result, an unrelated block on disk got overwritten with zeroes.

block -> fileBlock and diskBlock
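A tiny sketch of how the two names read in code. `BlockMap` and its array representation are invented for illustration and are not the file system from the story; the point is only that each identifier says which kind of block number it holds.

```java
// Hypothetical mapping from logical file blocks to physical disk blocks.
final class BlockMap {
    // diskBlockOf[fileBlock] is the physical block number on disk that
    // stores the given logical block of the file.
    private final long[] diskBlockOf;

    BlockMap(long[] diskBlockOf) {
        this.diskBlockOf = diskBlockOf.clone();
    }

    // Translates a logical block number within the file into a physical
    // block number on disk.
    long toDiskBlock(int fileBlock) {
        return diskBlockOf[fileBlock];
    }
}
```

With these names, passing a `fileBlock` where a `diskBlock` is expected looks wrong at the call site, which a bare `block` never would.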

Take a bit of extra time to choose great names, which are precise, unambiguous, and intuitive.

Create an image

Names become unwieldy if they contain more than two or three words. Thus, the challenge is to find just a few words that capture the most important aspects of the entity.

Names should be precise

Good names have two properties: precision and consistency.

Vague Name: If a variable or method name is broad enough to refer to many different things, then it doesn’t convey much information to the developer and the underlying entity is more likely to be misused.

It’s fine to use generic names like i and j as loop iteration variables, as long as the loops only span a few lines of code.

Use names consistently

Consistent naming reduces cognitive load

Modifying Existing Code (advice on changing existing code)

The design of a mature system is determined more by changes made during the system’s evolution than by any initial conception.

Stay strategic

If you want to have a system that is easy to maintain and enhance, then “working” isn’t a high enough standard; you have to prioritize design and think strategically.

Unfortunately, when developers go into existing code to make changes such as bug fixes or new features, they don’t usually think strategically. A typical mindset is “what is the smallest possible change I can make that does what I need?”

You must resist the temptation to make a quick fix.

If you’re not making the design better, you are probably making it worse.

When is it reasonable not to refactor and improve the design?

Maintaining comments: keep the comments near the code

It’s easy to forget to update comments when you modify code, which results in comments that are no longer accurate.

The best way to ensure that comments get updated is to position them close to the code they describe, so developers will see them when they change the code.

Example:

Comments belong in the code, not the commit log

Maintaining comments: avoid duplication

If documentation is duplicated, it is more difficult for developers to find and update all of the relevant copies. Instead, find the most obvious single place to put the documentation. In addition, add short comments in the other places that refer to the central location: “See the comment in xyz for an explanation of the code below.”

If information is already documented someplace outside your program, don’t repeat the documentation inside the program; just reference the external documentation.

Maintaining comments: check the diffs before commit

Consistency

Once you have learned how something is done in one place, you can use that knowledge to immediately understand other places that use the same approach. Consistency allows developers to work more quickly with fewer mistakes.

Examples of consistency

Consistency can be applied at many levels in a system:

Ensuring consistency

Code Should be Obvious

The best way to determine the obviousness of code is through code reviews.

Things that make code more obvious

Things that make code less obvious

Object-oriented programming and inheritance

Agile development

One of the risks of agile development is that it can lead to tactical programming. Developing incrementally is generally a good idea, but the increments of development should be abstractions, not features.

Unit tests

Test-driven development

When creating a new class, the developer first writes unit tests for the class, based on its expected behavior. Then the developer works through the tests one at a time, writing enough code for that test to pass. When all of the tests pass, the class is finished.

The problem with test-driven development is that it focuses attention on getting specific features working, rather than finding the best design.

One place where it makes sense to write the tests first is when fixing bugs. Before fixing a bug, write a unit test that fails because of the bug. Then fix the bug and make sure that the unit test now passes.

Design patterns

If a design pattern works well in a particular situation, it will probably be hard for you to come up with a different approach that is better.

The greatest risk with design patterns is over-application.

Getters and setters

Getters and setters are shallow methods

Designing for Performance

The most important idea is still simplicity: not only does simplicity improve a system’s design, but it usually makes systems faster.

How to think about performance

The best approach is something between these extremes, where you use basic knowledge of performance to choose design alternatives that are “naturally efficient” yet also clean and simple.

A few examples of operations that are relatively expensive today:

When efficiency and complexity conflict: if the faster design adds a lot of implementation complexity, or if it results in more complicated interfaces, then it may be better to start off with the simpler approach and optimize later if performance turns out to be a problem.

Measure before modifying

If you start making changes based on intuition, you’ll waste time on things that don’t actually improve performance, and you’ll probably make the system more complicated in the process.

Before making any changes, measure the system’s existing behavior.
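As a starting point for such a measurement, a crude timing helper can be enough to establish a baseline. `Stopwatch` is a hypothetical name, and this sketch ignores JIT warm-up and other noise, so the numbers are only good for comparing a before and an after on the same machine; a profiler or a benchmarking harness such as JMH is the better tool for anything serious.

```java
// Hypothetical helper for rough before/after timing.
final class Stopwatch {
    private Stopwatch() {}

    // Runs the task the given number of times (runs must be positive)
    // and returns the average wall-clock time per run in nanoseconds.
    static long averageNanos(Runnable task, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / runs;
    }
}
```

Recording the result before a change and again after it shows whether the change actually helped, rather than leaving that to intuition.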

Design around the critical path